Up and Running With Fast.ai and Docker
Last Monday marked the start of the latest Fast.ai course: Cutting Edge Deep Learning For Coders. If you have an interest in data science and haven't heard of Fast.ai, you should check them out. Fast.ai is a community started by Jeremy Howard and Rachel Thomas in 2016. It now includes an impressive set of courses and a machine learning library of the same name. What sets them apart is their practical, no-nonsense approach to solving data science problems by example. In this post, I'm sharing two Fast.ai docker files which provide a data science environment based on the Fast.ai library, as well as some tips for getting up and running with Docker quickly.
Docker provides a software layer that sits above the operating system to support containerization. Virtual machines have been around for years, but Docker containers are more lightweight. With the development of Nvidia-Docker, GPU support is baked into the Docker environment.
Three reasons you might want to use docker for data science are:
Plug and Play: Once you've installed the Nvidia-Docker server on your host machine, you run a docker image to create a container. There are thousands of docker images pre-built by companies like Nvidia with software like CUDA already installed. Once you have the image you need, you're ready to go.
Easy Configuration: Your docker image is built from a docker file, which is a script describing each step of the build. Adding or removing software is as easy as modifying the docker file and rebuilding the image. A docker file can also import an existing image and extend it using the FROM command (see the Docker Files section below for details).
Containerization: Because the docker container separates your operational software environment from the host operating system, you get the benefits of containerization. There are lots of benefits to containerization, but the one I really like is the capability to manage potential software conflicts or dependency issues at the container level without permanently changing the operating system. If things get ugly you can erase the docker image as if it never existed.
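As a sketch of the FROM pattern, here is a minimal docker file that inherits from one of Nvidia's pre-built CUDA images and layers one extra package on top (the base image tag and the package are illustrative choices, not the exact contents of the Fast.ai docker files):

```dockerfile
# Start from an existing Nvidia image with CUDA 9.0 and cuDNN preinstalled
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# Layer your own additions on top (wget is just an example)
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
```

Everything in the base image comes along for free; rebuilding after editing the RUN line only re-executes the changed layer.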
Docker does have some drawbacks. Firstly, Docker containers need root privileges to run, which may pose security issues in some corporate settings.
Secondly, Docker doesn't run natively on Windows or macOS. To use Docker on these platforms you need to introduce an additional layer of virtualization, such as Docker for Windows or Docker for Mac. According to this article, performance is improving, but it still lags bare-metal Linux installations. More importantly, nvidia-docker is still not supported on Windows and Mac. Given the importance of a GPU for deep learning, Docker probably doesn't make sense if you want to use Windows or Mac. The docker files in this post have been developed for Linux systems with a GPU supported by nvidia-docker.
The homepage for Nvidia-Docker provides a useful starting point for installing the software on your host machine. There are also many tutorials online covering the various flavors of Linux and other operating systems. Nvidia-Docker is an additional software package that supplements the core Docker installation. Its job is to interface with the Nvidia drivers on the host, which control the GPU hardware. The Nvidia driver version on the host machine determines the version of CUDA you can run in the container. Once you know the driver version you have on the host, you can check the compatible CUDA version here.
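To find your driver version, nvidia-smi on the host is the quickest route (this requires a machine with Nvidia drivers installed, so it won't run everywhere):

```shell
# Print the Nvidia driver version on the host
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

Compare the number it prints against Nvidia's minimum-driver table to pick between the CUDA 8 and CUDA 9 docker files below.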
The repo contains two docker files:
fastai.latest.cuda8
fastai.latest.cuda9
You can access the repo here. The docker files support nvidia-docker with versions 8 and 9 of CUDA respectively. Both images inherit from an Ubuntu 16.04 image with CUDA preinstalled. The differences between the two files are:
The CUDA version installed in the OS: 8 and 9 respectively.
The python package used to interface with CUDA: cuda80 and cuda90 respectively. The Fastai python environment is created from the environment.yml file included in the Fastai github repository. To support CUDA 8, we replace the cuda90 python package listed in environment.yml with the cuda80 package.
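One way to make that swap is with a one-line sed. The snippet below uses a tiny stand-in environment.yml so it is self-contained; the real file in the Fastai repo lists many more dependencies:

```shell
# Stand-in for the Fastai environment.yml (the real file is much longer)
printf 'dependencies:\n  - cuda90\n  - pytorch\n' > environment.yml

# Replace the cuda90 package with cuda80 for CUDA 8 hosts
sed 's/cuda90/cuda80/' environment.yml > environment.cuda8.yml
cat environment.cuda8.yml
```

The same substitution against the real file is what separates the two docker files.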
When run, the docker container automatically starts the Jupyter Notebook server with a default password: fastai.
If you need to change the password, refer to the section of the docker file titled Start Up:
run the code below in a Jupyter Notebook to generate a new password key: `from notebook.auth import passwd; passwd()`
use the key generated above to update the NotebookApp.password attribute in the docker file: `NotebookApp.password='sha1:a60ff295d0b9:506732d050d4f50bfac9b6d6f37ea6b86348f4ed'`
rebuild the docker image (refer to the Docker Quickstart Commands section below)
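If you're curious what passwd() actually produces, it is a salted sha1 hash in the form algorithm:salt:hash. Below is a rough standard-library sketch of the same scheme; the real implementation lives in notebook.auth, so treat this as illustration rather than a drop-in replacement:

```python
import hashlib
import random

def sha1_passwd(passphrase, salt_len=12):
    """Sketch of the salted-hash scheme used by notebook.auth.passwd()."""
    # Random hex salt, salt_len characters long (zero-padded)
    salt = ('%0' + str(salt_len) + 'x') % random.getrandbits(4 * salt_len)
    # Hash the passphrase with the salt appended
    digest = hashlib.sha1((passphrase + salt).encode('utf-8')).hexdigest()
    return 'sha1:%s:%s' % (salt, digest)

print(sha1_passwd('fastai'))
```

Because the salt is random, the key differs on every call even for the same passphrase, which is why you paste the generated key (not the passphrase) into the docker file.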
You’ll find a full command line reference at docker.com. Don’t be overwhelmed by the number of commands. Chances are you’ll only need to remember a handful of them.
docker build: Builds an image from a docker file.
Example
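For instance, to build an image from the CUDA 9 docker file (the image tag fastai is my choice; any name works):

```shell
# Build an image tagged "fastai" from the named docker file in the current directory
docker build -t fastai -f fastai.latest.cuda9 .
```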
docker run: Creates a container from an image and starts it (pulling the image first if it isn't available locally).
Example
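For instance, assuming you built an image tagged fastai, the following starts it in the background with the Jupyter port exposed (container name and tag are my choices):

```shell
# Start a container named "fastai" with GPU support, mapping Jupyter's port 8888
nvidia-docker run -d --name fastai -p 8888:8888 fastai
```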
docker exec: Runs a command in a running docker container.
Example
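For instance, to open an interactive shell inside a running container named fastai (name assumed from the run example above):

```shell
# Attach an interactive bash session to the running container
docker exec -it fastai /bin/bash
```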
docker images: List docker images
docker ps: List running docker containers
docker system df: Show docker disk usage
docker system prune: Purges any stopped containers, the build cache and dangling images. A dangling image occurs when you rebuild an image without assigning a new name. The old version is kept and continues to take up disk space.
In the course of experimentation you will probably discover the need for additional python packages or software tools. Here are two simple examples of how you can modify the docker file to include new software.
Using Conda
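A hedged sketch of the conda route, added as a step in the docker file (the environment name fastai and the seaborn package are illustrative assumptions):

```dockerfile
# Install an extra package into the fastai conda environment
RUN /bin/bash -c "source activate fastai && conda install -y seaborn"
```

Rebuild the image after editing, and only the layers from this line onward are re-executed.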
Install From Source
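For software that isn't packaged for conda, you can clone and install it from source inside the docker file. The repository URL and install path below are placeholders, not a real project:

```dockerfile
# Clone a project and install it from source (URL and path are placeholders)
RUN git clone https://github.com/example/some-package.git /opt/some-package && \
    cd /opt/some-package && \
    pip install -e .
```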
To get started, there are 5 steps you need to follow:
Install Nvidia-Docker on your host machine.
Download the docker file you need
Build the docker file
Run the docker image
In your web browser, navigate to localhost:8888 to access the Jupyter environment