Up and Running With Fast.ai and Docker


Last updated 6 years ago

Last Monday marked the start of the latest series of Fast.ai courses: Cutting Edge Deep Learning For Coders. If you have an interest in data science and haven’t heard of Fast.ai, you should check them out. Fast.ai is a community started by Jeremy Howard and Rachel Thomas in 2016. It now includes an impressive set of courses and a machine learning library by the same name. What sets them apart is their practical, no-nonsense approach to solving data science problems by example. In this post, I’m sharing two Fast.ai docker files which provide a data science environment based on the Fast.ai library, as well as some tips for getting up and running with docker quickly.

Why Docker?

Docker provides a software layer that sits above the operating system to support containerization. Virtual machines have been around for years but docker is more lightweight. With the development of Nvidia-Docker, GPU support is baked into the docker environment.

Three reasons you might want to use docker for data science are:

Plug and Play: Once you’ve installed Nvidia-Docker on your host machine, you run a docker image to create a container. There are thousands of docker images pre-built by companies like Nvidia with software like CUDA already installed. Once you have the image you need, you’re ready to go.

Easy Configuration: Your docker image is based on a docker file, which is a script for building the image. Adding or removing software is as easy as modifying the docker file and rebuilding the image. Within the docker file you can also start from an existing image and add to it using the FROM instruction (see the Docker Files section below for details).

Containerization: Because the docker container separates your operational software environment from the host operating system, you get the benefits of containerization. There are lots of benefits to containerization, but the one I really like is the capability to manage potential software conflicts or dependency issues at the container level without permanently changing the operating system. If things get ugly you can erase the docker image as if it never existed.

Caveats

Firstly, Docker containers need root privileges to run, which may pose security issues in some corporate settings.

Installation on Host

Docker Files

The repo contains two docker files:

  1. fastai.latest.cuda8

  2. fastai.latest.cuda9

They differ in two ways:

  1. The CUDA version installed on the OS, 8 and 9 respectively:

    FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
    FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
  2. The python package used to interface with CUDA, cuda80 and cuda90 respectively. The Fastai python environment is created from the environment.yml file included in the Fastai github repository. To support cuda 8, we replace the cuda90 python package provided in environment.yml with the cuda80 package:

#           FASTAI

# clone fastai repo
RUN git clone https://github.com/fastai/fastai.git /usr/local/fastai

# replace cuda90 package with cuda80 for host machines supporting cuda8
RUN sed -i -e 's/cuda90/cuda80/' /usr/local/fastai/environment.yml
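To make the effect of the sed line concrete, here is a minimal Python sketch of the same substitution. The function name and the sample text are hypothetical; the real environment.yml lists many more packages and may format its entries differently. Like sed without the g flag, it replaces only the first match on each line.

```python
import re


def swap_cuda_package(yaml_text, old="cuda90", new="cuda80"):
    """Replace the first occurrence of `old` on each line, like sed 's/old/new/'."""
    pattern = re.compile(re.escape(old))
    lines = yaml_text.splitlines(keepends=True)
    return "".join(pattern.sub(new, line, count=1) for line in lines)


# Hypothetical fragment, used only to illustrate the substitution.
sample = "dependencies:\n  - cuda90\n  - numpy\n"
print(swap_cuda_package(sample))
```

Running the substitution at image build time means the same upstream environment.yml can serve both docker files.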

When run, the docker container automatically starts the Jupyter Notebook server with a default password: fastai.

If you need to change the password, refer to the section of the docker file titled Start Up:

  1. Run the code below in a Jupyter Notebook to generate a new password key:

    from notebook.auth import passwd; passwd()
  2. Use the key generated above to update the NotebookApp.password attribute in the docker file:

    NotebookApp.password='sha1:a60ff295d0b9:506732d050d4f50bfac9b6d6f37ea6b86348f4ed'
  3. Rebuild the docker image (refer to the Docker Quickstart Commands section below).

#           START UP

# start jupyter server specifying password: fastai
# in jupyter notebook run the following to generate custom password key and update --NotebookApp.password=
# from notebook.auth import passwd; passwd()

CMD /bin/bash -c "source activate fastai && jupyter notebook --allow-root --no-browser --NotebookApp.password='sha1:a60ff295d0b9:506732d050d4f50bfac9b6d6f37ea6b86348f4ed'"
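The password key above has the form algorithm:salt:digest. The sketch below shows, using only the standard library, how a key in that layout could be produced and checked. It assumes the digest is SHA-1 over the passphrase followed by the salt, which is my understanding of the older notebook package rather than something stated in the docker file. The function names are hypothetical, and for a real key you should still call notebook.auth.passwd() as described above.

```python
import hashlib
import secrets


def hash_password(passphrase, salt=None):
    """Return 'sha1:<salt>:<digest>' (assumed layout: sha1(passphrase + salt))."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters, like the key above
    digest = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_password(hashed, passphrase):
    """Check a passphrase against a key in the assumed 'sha1:salt:digest' layout."""
    algorithm, salt, digest = hashed.split(":", 2)
    if algorithm != "sha1":
        return False
    candidate = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return secrets.compare_digest(candidate, digest)


key = hash_password("my-new-password")
print(key)
print(check_password(key, "my-new-password"))
```

Because the salt is random, generating a key twice for the same password gives two different strings; either one works as the NotebookApp.password value.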

Docker Quickstart Commands

Example: docker build

# Arguments
# ---------
# -f path to local docker file 
# -t tag(name) of new image
# . passes the current directory as the build context

cd /home/nick/docker/ml/ml-gpu/; 
docker build -f /home/nick/docker/ml/ml-gpu/fastai.latest.cuda9 -t ireland/fastai.cuda9:latest . 

Example: docker run

# Here we run a container without passing in additional commands. 
# The fast.ai containers start a Jupyter Notebook server that runs in the background

# Arguments
# ---------
# --rm automatically remove the container when it exits
# -d detach from the terminal
# --name assign a name to the new container
# -p map port 8888 on the host to 8888 in the container
# -v map a host path into the container: /home/nick to /home, and /home/nick/data to /data
# ireland/fastai.cuda9:latest the docker image to start

nvidia-docker run --rm -d --name fastai -p 8888:8888 -v /home/nick:/home -v /home/nick/data:/data ireland/fastai.cuda9:latest
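If you launch containers from a script, it can help to assemble the command from its parts rather than editing one long string. The following is a small sketch under that assumption. The helper name and its parameters are hypothetical, and it only builds the argument list; pass the list to subprocess.run yourself if you want to execute it.

```python
import shlex


def docker_run_command(image, name="fastai", port=8888, volumes=(),
                       runtime="nvidia-docker"):
    """Build the argument list for a detached, auto-removed container."""
    args = [runtime, "run", "--rm", "-d", "--name", name,
            "-p", f"{port}:{port}"]
    for host_path, container_path in volumes:
        # One -v flag per (host path, container path) pair.
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    return args


cmd = docker_run_command(
    "ireland/fastai.cuda9:latest",
    volumes=[("/home/nick", "/home"), ("/home/nick/data", "/data")],
)
print(shlex.join(cmd))  # shlex.join needs Python 3.8 or newer
```

Keeping the arguments as a list avoids shell quoting problems when a host path contains spaces.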

Example: docker exec

# Open a bash shell inside a running docker container. 
# This allows us to control the running container from the command window
# Arguments
# -i interactive mode (keep STDIN open)
# -t allocate a pseudo-terminal for the session
# fastai (container name)
# bash (command to run)

docker exec -it fastai bash

Customize Docker Image

In the course of experimentation you will probably discover the need for additional python packages or software tools. Here are two simple examples of how you can modify the docker file to include new python packages.

Using Conda

# install packages: numpy-indexed
# add the following before the cleanup section of the dockerfile
RUN /bin/bash -c "source activate fastai && conda install -y numpy-indexed"

Install From Source

# ML-From-Scratch
RUN git clone https://github.com/eriklindernoren/ML-From-Scratch /usr/local/ML-From-Scratch
RUN cd /usr/local/ML-From-Scratch && /bin/bash -c "source activate fastai && python setup.py install"

Summary

To get started there are 5 steps you need to follow:

  1. Install Nvidia-Docker on your host machine.

  2. Download the docker file you need.

  3. Build the docker image from the docker file

    cd <path to dockerfile>; docker build -f <path to dockerfile> -t <img name> . 
  4. Run the docker image

    nvidia-docker run --rm -d --name fastai -p 8888:8888 -v <host workspace path>:/home -v <host data path>:/data <img name>
  5. In your web browser, navigate to localhost:8888 to access the Jupyter environment
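The Jupyter server can take a few seconds to come up after the container starts. As a convenience, the sketch below polls the mapped port until something accepts a connection. The function name, the default timeout and the polling interval are all hypothetical choices, and it checks only that the port is open, not that Jupyter itself is healthy.

```python
import socket
import time


def wait_for_port(host="localhost", port=8888, timeout_s=60.0, poll_s=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Nothing is listening yet (or the host is unreachable).
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

Call wait_for_port() after the nvidia-docker run step; once it returns True, open localhost:8888 in your browser.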

A second caveat: Docker doesn’t run natively on Windows or OSX. To use docker on these platforms you need to introduce an additional layer of virtualization, such as Docker For Windows or Docker for Mac. According to this article, performance is improving but still lags bare-metal Linux installations. More importantly, nvidia-docker is still not supported on Windows and Mac. Given the importance of a GPU for deep learning, Docker probably doesn’t make sense if you want to use Windows or Mac. The docker files have been developed for systems with a GPU supported by nvidia-docker.

Installation on host: The homepage for Nvidia-Docker provides a useful starting point for installing the software on your host machine. There are also many tutorials online providing guides for the various flavors of linux and other operating systems. Nvidia-Docker is an additional software package that supplements the core docker installation. Its job is to interface with the Nvidia drivers on the host which control the GPU hardware. The Nvidia driver version on the host machine determines the version of CUDA you can run in the container. Once you know the driver version you have on the host, you can check which CUDA version is compatible with it.

Docker files: The docker files in the repo support nvidia-docker for versions 8 and 9 of cuda respectively. Both images inherit from an ubuntu16.04 image with CUDA. They differ in the two ways listed in the Docker Files section.

Docker quickstart commands: You’ll find a full command line reference at docker.com. Don’t be overwhelmed by the number of commands. Chances are you’ll only need to remember a handful of them.

docker build: Builds an image from a docker file.

docker run: Runs a container (created from an image if it doesn’t yet exist).

docker exec: Runs a command in a running docker container.

docker images: Lists docker images.

docker ps: Lists running docker containers.

docker system df: Shows docker disk usage.

docker system prune: Purges any stopped containers, the build cache and dangling images. A dangling image occurs when you rebuild an image without assigning a new name. The old version is kept and continues to take up disk space.