Updating packages in Docker containers with external volumes

Working with Docker containers allows for a lot of flexibility in deployment but can pose several challenges. In a recent consulting experience with SamurAI, I faced the need to update several JupyterLab-based Docker containers hosted on a server protected by a strong corporate firewall. Since I couldn’t find any comprehensive documentation for my use case and had to come up with a custom solution, I will briefly share my experience here, hoping that somebody will find it useful.

The basic example

Let’s assume that the base image used to deploy JupyterLab containers is jupyter/datascience-notebook. With this image it is possible to create a container for each user (let’s call them user1, user2, …) with a command like:

docker run -d --name datascience-user1 -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v /path/to/persist/data/user1:/home/jovyan jupyter/datascience-notebook
where -d launches the container in detached mode, -p exposes the needed port (which will need to be different for each container), --name assigns a specific name to the container, -e sets an environment variable (JUPYTER_ENABLE_LAB=yes loads the JupyterLab interface by default instead of the classic Notebook), and -v mounts an external volume (/path/to/persist/data/user1) onto the container’s home directory (/home/jovyan) so that the user’s data persists when the container is deleted.
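Once the container is up, the login token to hand to the user can be retrieved from the container’s logs. A minimal sketch (the exact log format may vary between image versions):

docker logs datascience-user1 2>&1 | grep "token="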
Besides the token, the logs also show that the packages are stored in the /opt/conda directory. Now let’s assume that we have deployed several containers with the same image. What happens if at some point we want to add or update a package? Doing so would require deleting all the existing containers and re-deploying new ones from an updated image. Can we do better?

Store common packages into external volumes

Let’s deploy a “service” container with a command similar to the one shown above, just without the mounted volume for the home directory:

docker run -d --name datascience-service -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes jupyter/datascience-notebook
We have observed that the packages used by Jupyter are stored under /opt/conda. We can copy this directory, which hosts the packages common to all the containers, from the service container to a local folder on the server:

docker cp datascience-service:/opt/conda /path/to/persist/data/opt/conda
At this point we can deploy the user containers making use of the newly created folder, attached as an external volume:

docker rm -f datascience-user1 # (only if the container already exists)

docker run -d --name datascience-user1 -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v /path/to/persist/data/user1:/home/jovyan -v /path/to/persist/data/opt/conda:/opt/conda jupyter/datascience-notebook
The new datascience-user1 container now has persistent storage not only for the user’s data but also for the conda packages, which can be shared among all the users. Why is this useful?
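As a quick sanity check (not strictly part of the deployment), the mounts of a user container can be listed to confirm that both the home directory and the shared conda folder are attached:

docker inspect -f '{{ json .Mounts }}' datascience-user1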

Updating packages for many containers

Let’s assume we want to add a package that does not ship with the jupyter/datascience-notebook Docker image, for instance hdbscan. Let’s note two things:
  • The containers deployed with the -v /path/to/persist/data/opt/conda:/opt/conda option now source their conda packages from the same local directory on the server. This means that by updating the packages there, all the containers will see the new ones.
  • Since the new packages live outside the containers, no re-deployment is needed; a simple restart is sufficient!
How?

Updating one container to update them all

Let’s recreate the service container (removing the previous one first, if it is still running) with the shared volume and the :z option, which on SELinux-enabled hosts relabels the shared content so that the container can modify the external files:

docker run -d --name datascience-service -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v /path/to/persist/data/opt/conda:/opt/conda:z jupyter/datascience-notebook
and let’s access it with an interactive shell:

docker exec -it datascience-service bash
At this point we can install or update the desired package, for instance:

conda install -c conda-forge hdbscan
This will install the new package under /path/to/persist/data/opt/conda. To make hdbscan visible to all the other containers, a simple restart is sufficient:

docker restart $(docker ps | grep datascience-user | awk '{print $1}')
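After the restart, a quick way to confirm that the new package is visible from a user container is to import it directly (a minimal check, assuming python is on the PATH inside the image, as it is in the Jupyter images):

docker exec datascience-user1 python -c "import hdbscan; print('hdbscan OK')"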

What if the server is protected by a strong corporate firewall?

If the server can’t access the repositories hosting the desired packages, a few additional steps are needed. First, ship (for instance via scp) to a computer connected to the internet:
  • the image used to deploy the containers, in this case jupyter/datascience-notebook
  • (a tarball of) the external folder used to contain the common packages, in this case /path/to/persist/data/opt/conda
Then, on that computer, deploy a container mapping the internal folder /opt/conda to the (untarred) persistent folder /new/path/to/persist/data/opt/conda. At this point the desired package(s) can be installed from inside that container, exactly as shown above. Finally, a tarball of /new/path/to/persist/data/opt/conda is shipped back to the server, replacing /path/to/persist/data/opt/conda (make a backup first!). As described before, a restart of the containers will make the new package(s) visible.
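A minimal sketch of the whole round trip could look like the following; the hostnames (online-host, firewalled-server) and the paths are placeholders, and the exact commands may need to be adapted to your setup:

# On the firewalled server: export the image and the shared conda folder
docker save jupyter/datascience-notebook | gzip > datascience-notebook.tar.gz
tar -czf conda.tar.gz -C /path/to/persist/data/opt conda
scp datascience-notebook.tar.gz conda.tar.gz user@online-host:/tmp/

# On the internet-connected computer: load the image, untar the folder, install the package
docker load < /tmp/datascience-notebook.tar.gz
mkdir -p /new/path/to/persist/data/opt
tar -xzf /tmp/conda.tar.gz -C /new/path/to/persist/data/opt
docker run -d --name datascience-offline -v /new/path/to/persist/data/opt/conda:/opt/conda:z jupyter/datascience-notebook
docker exec datascience-offline conda install -y -c conda-forge hdbscan
tar -czf /tmp/conda-updated.tar.gz -C /new/path/to/persist/data/opt conda
scp /tmp/conda-updated.tar.gz user@firewalled-server:/tmp/

# Back on the firewalled server: back up the old folder, swap in the updated one, restart the user containers
mv /path/to/persist/data/opt/conda /path/to/persist/data/opt/conda.bak
tar -xzf /tmp/conda-updated.tar.gz -C /path/to/persist/data/opt
docker restart $(docker ps | grep datascience-user | awk '{print $1}')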