Binder allows you to create custom computing environments that can be shared and used by many remote users. You can use many open source languages, configure the user interface, and more. A Binder service is powered by BinderHub, an open-source tool hosted on GitHub that runs on Kubernetes, a portable, extensible, open-source platform for managing containerized services. One such deployment lives at Binder Home and is free to use.
The repo2docker CLI looks for configuration files in the repository being built to determine how to build it. In general, repo2docker uses the same configuration files as other software installation tools, rather than creating new custom configuration files. Several repo2docker configuration files can be combined to compose more complex setups.
The binder examples organization on GitHub contains a list of sample repositories for common configurations that repo2docker can build with various configuration files such as Python and R installation in a repository.
A list of supported configuration files (roughly in the order of build priority) can be found below.
The environment file
environment.yml is the standard configuration file used by Conda that lets you install any kind of package, including Python, R, and C/C++ packages. repo2docker does not use your environment.yml to create and activate a new conda environment. Rather, it updates a base conda environment with the packages listed in your environment file. This means that the environment will always have the same default name, not the name specified in your environment file.
dependencies:
  - ipywidgets
  - matplotlib
  - numpy
  - scipy
  - pandas
  - seaborn
  - statsmodels
If you use an environment file, Binder will use a Miniconda distribution to install your packages. However, you may still want to use pip. In this case, do not use a requirements.txt file; instead, add a pip section to environment.yml. This repository is an example of how to construct your configuration to accomplish this.
channels:
  - conda-forge
  - defaults
dependencies:
  - python
  - numpy
  - pip
  - pip:
    - nbgitpuller
    - sphinx-gallery
    - pandas
    - matplotlib
The conda command searches a set of channels. By default, packages are automatically downloaded and updated from the default channel, which may require a commercial (paid) license, as described in the repository terms of service.
The requirements file
requirements.txt specifies a list of Python packages that should be installed in your environment. The requirements.txt example that J1 uses on GitHub shows a typical setup for running the example notebooks with J1 Theme.
For Binder, a requirements file should list all Python libraries that your notebooks depend on, and they will be installed under the hood using:
pip install -r requirements.txt
The base Binder image contains no extra dependencies, so be as explicit as possible in defining the packages that you need. This includes specifying explicit versions wherever possible.
If you do specify strict versions, it is important to do so for all your dependencies, not just direct dependencies. Strictly specifying only some dependencies is a recipe for environments breaking over time.
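As a rough illustration of the advice on strict versions, the sketch below scans requirements text and reports entries that are not pinned with `==`. The helper name `unpinned_requirements` and the sample package versions are made up for this example, and the parsing is deliberately simplified (it treats everything after `#` as a comment and skips pip option lines), so it is not a full requirements parser.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "numpy==1.26.4".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned with '=='."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Illustrative content only; the versions are placeholders.
sample = """\
numpy==1.26.4
pandas>=2.0
matplotlib  # no version at all
"""
print(unpinned_requirements(sample))
```

Running a check like this before committing a requirements file can reveal which dependencies may drift when the image is rebuilt later.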
You can also specify which Python version to install in your built environment with
environment.yml. By default, repo2docker installs default_python (Python 3.7) with your
environment.yml unless you include the version of Python in this file. The package manager conda supports all versions of Python, though repo2docker support is best with Python
| If you include a Python version in a |
The apt file
In contrast to the requirements and environment files, which install Python-specific modules, the file apt.txt is used to install OS packages such as ffmpeg so that they are usable in Jupyter notebooks. The video library is a good example of why OS packages are needed.
The video library ffmpeg comes in two flavors:
the OS packages that install the application binaries of ffmpeg
the Python/Jupyter package for ffmpeg that integrates the video library into Jupyter notebooks
The Python/Jupyter package for ffmpeg is just a wrapper around the OS package that makes the library usable from Jupyter. A prerequisite for using the wrapper is an installation of ffmpeg at the OS level in the resulting Docker image.
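Because the Python wrapper depends on the OS-level binary, a notebook can check for that binary before relying on the wrapper. The sketch below uses the standard library function `shutil.which`. The helper name `find_os_binary` is hypothetical, and the sketch assumes the executable is named `ffmpeg`.

```python
import shutil


def find_os_binary(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_os_binary("ffmpeg")
if path is None:
    # The wrapper package alone is not enough; the OS package is missing.
    print("ffmpeg binary not found; consider listing it in apt.txt")
else:
    print(f"ffmpeg binary found at {path}")
```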
The base OS used for the Docker images created by Binder is Ubuntu, a free Linux distribution. As the file name apt.txt implies, the configuration is used with apt, the command that installs new software packages on Ubuntu.
The apt file is a simple text file that specifies OS packages. The format is simple: use one package to be installed per line.
The apt command is a powerful command-line tool that works with Ubuntu’s Advanced Packaging Tool (APT). It performs functions such as installing new software packages, upgrading existing software packages, and even upgrading the entire Ubuntu system.
If underlying libraries or application packages are needed, the apt file is the right place to install those OS-specific dependencies.
The runtime file
Sometimes you want to specify the version of the runtime, such as the version of Python, but the environment specification format does not let you specify this information. For these cases, the special file runtime.txt can be used.
| runtime.txt is only supported when used with environment specifications that do not already support specifying the runtime. If you are using a |
Configure the user interface
You can build several user interfaces into the resulting Docker image. This is controlled with various configuration files.
Jupyter Lab Interface
JupyterLab is the default interface for repo2docker. The following Binder URL will open the jekyll-one repository and begin a JupyterLab session opening the path
The filepath notebooks above is how JupyterLab directs you to a specific file or folder. To learn more about URLs in JupyterLab and Jupyter Notebook, visit Starting JupyterLab.
Classic Notebook Interface
The classic notebook interface is also available without any extra configuration. You can launch it from within a user session by opening JupyterLab and replacing /lab/ with /tree/ in the JupyterLab URL.
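A minimal sketch of that URL change, assuming the JupyterLab session URL contains a `lab` path segment that can be swapped for `tree`. The function name `to_classic_url`, the host, and the token are hypothetical values used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_classic_url(lab_url):
    """Swap the first 'lab' path segment for 'tree', leaving the rest intact."""
    parts = urlsplit(lab_url)
    segments = parts.path.split("/")
    if "lab" in segments:
        segments[segments.index("lab")] = "tree"
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical session URL, for illustration only.
print(to_classic_url("https://hub.example.org/user/demo/lab?token=abc"))
```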
Track repository data on Binder
The mybinder.org team runs a service that provides repository-level data about all of the binders that run each day. This is called the mybinder.org event analytics archive. You can use it to track how often people click your Binder links and launch your Binder repository, or to aggregate activity across many repositories.
Access the event analytics archive
You can access the event analytics archive at archive.analytics.mybinder.org. For information about the structure of this dataset, and a description of how you can read in the data to analyze it, see the Binder Site Reliability Guide (SRE).
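As a hedged sketch of such an analysis, the code below counts launches per repository. It assumes the archive data is available as JSON Lines text (one JSON object per line) and that each record has `provider` and `spec` fields. Both the format and the field names are assumptions here and should be checked against the guide mentioned above. The sample records are made up.

```python
import json
from collections import Counter


def launches_per_repo(jsonl_text):
    """Count launch events per (provider, spec) pair in JSON Lines text."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        counts[(event.get("provider"), event.get("spec"))] += 1
    return counts


# Made-up records in the assumed shape; real data would come from the archive.
sample = "\n".join([
    json.dumps({"provider": "GitHub", "spec": "org-a/repo-a/main"}),
    json.dumps({"provider": "GitHub", "spec": "org-b/repo-b/main"}),
    json.dumps({"provider": "GitHub", "spec": "org-a/repo-a/main"}),
])
for (provider, spec), n in launches_per_repo(sample).most_common():
    print(provider, spec, n)
```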
Example repository to show off analyses
To give you a little inspiration, check out the binderlyzer binder. This is a Binder that goes through a simple analysis of Binder repositories using the events archive. It shows how to access it, and gives an idea for questions you can ask with this data!
If you do something interesting or fun with the event analytics archive, please let us know! We provide this resource in the hopes that it gives people insight into the activity going on in Binder land, and would love to hear about anything interesting you find.
Binder API Reference
BinderHub connects several services together to provide on-the-fly creation and registry of Docker images. It utilizes the following tools:
A cloud provider such as Google Cloud, Microsoft Azure, Amazon EC2, and others
Kubernetes to manage resources on the cloud
Helm to configure and control Kubernetes
Docker to use containers that standardize computing environments
A BinderHub UI that users can access to specify Git repos they want built
repo2docker to generate Docker images using the URL of a Git repository
A Docker registry (such as gcr.io) that hosts container images
JupyterHub to deploy temporary containers for users
After a user clicks a Binder link, the following chain of events happens:
BinderHub resolves the link to the repository.
BinderHub determines whether a Docker image already exists for the repository at the latest ref (git commit hash, branch, or tag).
If the image doesn’t exist, BinderHub creates a build pod that uses repo2docker to do the following:
Fetch the repository associated with the link
Build a Docker container image containing the environment specified in configuration files in the repository.
Push that image to a Docker registry, and send the registry information to the BinderHub for future reference.
BinderHub sends the Docker image registry to JupyterHub.
JupyterHub creates a Kubernetes pod for the user that serves the built Docker image for the repository.
JupyterHub monitors the user’s pod for activity, and destroys it after a short period of inactivity.
Pods (the smallest compute unit that can be defined, deployed, and managed in Kubernetes) are the rough equivalent of a machine instance (physical or virtual) to a container. Each pod is allocated its own internal IP address, therefore owning its entire port space, and containers within pods can share their local storage and networking.
There’s one API endpoint, which is:
Even though it says build it actually performs launch.
provider_prefix identifies the provider
spec defines the source of the computing environment to be built and served using the given provider.
The provider_prefix can be any of the supported repository providers in BinderHub; see the Repository Providers section for supported inputs.
To use this endpoint, construct an appropriate URL and send a request. You’ll get back an Event Stream. It’s essentially a long-lived HTTP connection with a well-known JSON-based data protocol. It’s one-way communication only (server to client) and is straightforward to implement across multiple languages.
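A minimal sketch of the URL construction step, assuming the endpoint has the shape `/build/<provider_prefix>/<spec>`. That shape is inferred from the description above and is not confirmed here. The function name `build_request_url`, the host, the prefix, and the spec are hypothetical.

```python
from urllib.parse import quote


def build_request_url(base_url, provider_prefix, spec):
    """Join a BinderHub base URL, provider prefix, and spec into one request URL."""
    # Keep "/" unescaped because a spec can contain several path segments.
    prefix = quote(provider_prefix, safe="")
    return f"{base_url.rstrip('/')}/build/{prefix}/{quote(spec, safe='/')}"


# Hypothetical host, prefix, and spec, for illustration only.
print(build_request_url("https://binder.example.org", "gh", "some-user/some-repo/main"))
```

The resulting URL would then be requested with a client that keeps the connection open and reads the response line by line.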
When the request is received, the following happens:
Check if this image exists in our cached image registry. If so, launch it.
If it doesn’t exist in the image registry, we check if a build is currently running. If it is, we attach to it and start streaming logs from it to the user.
If there is no build in progress, we start a build and start streaming logs from it to the user.
If the build succeeds, we contact the JupyterHub API and start launching the server.
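The order of checks above can be sketched as a small decision function. This is an illustration of the described flow only, not BinderHub's actual implementation. The names `Action` and `choose_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    LAUNCH = "launch cached image"
    ATTACH = "attach to running build and stream logs"
    BUILD = "start build and stream logs"


def choose_action(image_in_registry, build_in_progress):
    """Mirror the order of checks described for an incoming request."""
    if image_in_registry:
        return Action.LAUNCH
    if build_in_progress:
        return Action.ATTACH
    return Action.BUILD


for cached, building in [(True, False), (False, True), (False, False)]:
    print(cached, building, "->", choose_action(cached, building).value)
```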
This section catalogs the different events you might receive.
Emitted whenever a build or launch fails. You must close your EventStream when you receive this event.
Emitted after the image has been built, before launching begins. This is emitted at the start if the image has been found in the cache registry, or after the build completes successfully if we had to do a build.
Note that clients shouldn’t rely on the imageName field for anything specific. It should be considered an internal implementation detail.
Emitted when we started a build pod and are waiting for it to start.
Emitted during the actual building process. Direct stream of logs from the build pod from repo2docker, in the same form as logs from a normal docker build.
Emitted when fetching the repository to be built from its source (GitHub, GitLab, wherever).
Emitted when the image is being pushed to the cache registry. This provides structured status info that could be shown in a progress bar. It’s structured similarly to the output of docker push.
Emitted when the repo has been built and we’re waiting for the hub to launch. This could end up succeeding and emitting a ready event or failing and emitting a failed event.
Emitted when your notebook is ready! You get an endpoint URL and a token used to access it. You can access the notebook or its API by using the token in one of the ways the notebook accepts security tokens.
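A client typically consumes these events until it sees either a ready or a failed event. The sketch below assumes each event is a JSON object with a `phase` key and a `message` key, and that the ready event carries `url` and `token` keys. These key names, and the phase names other than ready and failed, are assumptions for illustration. The function name `wait_for_server` is hypothetical.

```python
def wait_for_server(events):
    """Consume event dicts until the stream reports 'ready' or 'failed'."""
    for event in events:
        phase = event.get("phase")
        if phase == "failed":
            # The caller should close the EventStream at this point.
            raise RuntimeError(event.get("message", "build or launch failed"))
        if phase == "ready":
            return event["url"], event["token"]
        # Any other phase is progress information; show it to the user.
        print(f"[{phase}] {event.get('message', '').rstrip()}")
    raise RuntimeError("stream ended before a ready event arrived")


# Made-up events in the assumed shape.
demo = [
    {"phase": "building", "message": "Step 1/5 ...\n"},
    {"phase": "launching", "message": "Waiting for the hub"},
    {"phase": "ready", "url": "https://hub.example.org/user/demo/", "token": "abc123"},
]
print(wait_for_server(demo))
```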
In EventSource, all lines beginning with
: are considered comments. We send a
:heartbeat every 30s to make sure that we can pass through proxies without our request being killed.
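When reading the stream by hand, comment lines such as the heartbeat need to be skipped before decoding payloads. The sketch below is a simplified parser that assumes each event's payload arrives on a single `data:` line containing JSON. Multi-line data fields and other event-stream fields are ignored. The function name `parse_event_stream` and the sample payload are hypothetical.

```python
import json


def parse_event_stream(lines):
    """Yield decoded JSON payloads from event-stream lines, skipping comments."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separators and comments such as ":heartbeat"
        field, _, value = line.partition(":")
        if field == "data":
            yield json.loads(value.lstrip(" "))


# Made-up stream content for illustration.
stream = [
    ":heartbeat\n",
    'data: {"phase": "launching", "message": "hello"}\n',
    "\n",
    ":heartbeat\n",
]
print(list(parse_event_stream(stream)))
```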
Repository Providers (or RepoProviders) are locations where repositories are stored (e.g., GitHub). BinderHub supports a number of providers out of the box, and can be extended to support new providers. The supported providers are listed below.
GitHub is a website for hosting and sharing git repositories.
GitLab offers hosted as well as self-hosted git repositories.
Gists are small collections of files stored on GitHub. They behave like lightweight repositories.
Zenodo is a non-profit provider of scholarly artifacts (such as code repositories) run in partnership with CERN.
FigShare is a company that offers hosting for scholarly artifacts (such as code repositories).
HydroShare is a hydrologic information system for users to share and publish data and models.
Dataverse is open source research data repository software installed all over the world.
A generic repository provider for URLs that point directly to a git repository.
Configuration and Source Code Reference
Find details for all code references on: BinderHub Docs - Reference