Lesson 1 Notes


Introduction

Improvements in deep learning that we hope to make possible through this community

This is the first in a seven-part series of lessons on deep learning. The purpose of this course is to make deep learning accessible to individuals who do not necessarily have a strong background in machine learning or mathematics. We believe that deep learning can be transformative across a wide range of applications; as such, this course is designed for anyone with a background in computer programming who would like to apply these techniques to their own domain of expertise.

By the end of this lesson, we will understand how to apply a powerful deep learning technique in only seven lines of code.

All profits Jeremy makes from this course are donated to the Fred Hollows Foundation, an organisation dedicated to ending avoidable blindness, which it can cure for as little as $25 per eye. As we continue to teach computers how to see, we strongly encourage members of this community to help cure avoidable vision loss by donating to the Fred Hollows Foundation.

Why Now?

Deep learning is becoming increasingly relevant for three key reasons:

State of dl.png
  • An Infinitely Flexible Function: The function driving deep learning is the neural network, which is essentially a universal approximation machine. Specifically, the universal approximation theorem tells us that a neural network can approximate any continuous function to arbitrary accuracy, so in principle it can be applied to almost any problem.
  • Parameter Fitting: Through gradient descent and backpropagation, we can fit the network's parameters to whatever training data we provide. This lets us approximate the function we need for a given problem and put it to work (see the sketch after this list).
  • Speed and Scalability: Deep learning relies upon matrix operations to achieve its results, and these are computationally expensive. Graphics Processing Units (GPUs) are processors optimized to perform exactly these kinds of computations for computer graphics and image processing. Fortunately, the growth of the gaming industry has made powerful GPUs cheap.
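
To make the second point concrete, here is a minimal sketch of gradient descent and backpropagation in plain NumPy (illustrative only, not part of the course code): a tiny one-hidden-layer network learns to approximate noisy samples of sin(x).

import numpy as np

# Toy data: noisy samples of sin(x)
rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x) + 0.1 * rng.randn(200, 1)

# Parameters of a network with 20 hidden units
w1 = rng.randn(1, 20) * 0.5; b1 = np.zeros(20)
w2 = rng.randn(20, 1) * 0.5; b2 = np.zeros(1)

lr = 0.1
for step in range(5000):
    # Forward pass
    h = np.tanh(x.dot(w1) + b1)              # hidden activations
    pred = h.dot(w2) + b2                    # network output
    err = pred - y
    # Backward pass: the chain rule, by hand
    dw2 = h.T.dot(err) / len(x); db2 = err.mean(axis=0)
    dh = err.dot(w2.T) * (1 - h ** 2)        # tanh derivative
    dw1 = x.T.dot(dh) / len(x); db1 = dh.mean(axis=0)
    # Gradient descent: nudge every parameter downhill
    w1 -= lr * dw1; b1 -= lr * db1
    w2 -= lr * dw2; b2 -= lr * db2

print('final mse: %.4f' % (err ** 2).mean())

The same loop - forward pass, backpropagated gradients, small parameter updates - is what trains networks with millions of parameters; deep learning libraries simply automate the gradient computation and run it on the GPU.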

As a result of these three factors, it is now possible for anyone to apply deep learning techniques to real problems in a way that is both affordable and fast.

Growth of Deep Learning

Dl at google.png

A recent talk by Jeff Dean highlighted the massive increase in the use of deep learning at Google. As we can see, the number of deep learning projects at Google has exploded within the past few years, and the technique now appears in almost every product area. We believe that similar growth is possible at any organization, not just tech giants.

We can already see how amateurs are easily applying deep learning to a variety of impactful applications:

  • Classification of Skin Lesions: Using a pre-trained CNN very similar to the one we'll be using in this lesson, this group created a model that classified skin lesions with 60% accuracy, roughly a four-fold improvement over the previous benchmark of 15.6%. That's a massive improvement!
  • Classification of Plant Diseases: This group of industrial engineers used deep learning to successfully distinguish healthy leaves from leaves affected by 13 different diseases.
  • Radio Modulation Classification: A group of electrical and computer engineers used deep learning to recognize radio modulations. Their results indicated that it is possible to double the effective coverage area of a sensing system. Again, we see deep learning making a massive impact!
  • Heart Condition Diagnosis: Two hedge fund analysts were able to build a model that could diagnose heart conditions as accurately as doctors. This is astounding!
  • Clothes Classification: This group successfully built a model that could recognize articles of clothing, as well as their style.

This is just the tip of the iceberg. Applications like lesion classification and heart diagnosis have massive potential to radically transform healthcare throughout the developing world. Again, it can't be stressed enough that anyone with a computer programming background can learn how to accomplish similar things without being an expert in the field. Our goal is to teach you exactly how to do it, in the hope that we can extend the impact of deep learning to all areas.

Preparation

GPU Access

In order to follow along with this course, you will need access to a GPU that is suitable for deep learning. Fortunately, Amazon Web Services now offers instances with NVIDIA GPUs. See AWS install for instructions on how to get set up with the necessary instance (alternative GPU options can be found at Installation). In addition, we will be interacting with our operating systems and Amazon's servers using Bash, and you can find tutorials on that page. Be sure to note the IP address of your instance, as we will need it later.

Necessary Files

Read our GitHub wiki page to see how to obtain the necessary files from our GitHub repository.

Next, we need to download the dataset we will be using to train and test our model. In this lesson, we're going to be looking at the Dogs vs. Cats Kaggle competition, which asked users to classify images of cats and dogs. At that time, the state of the art scored around 80% accuracy; we will see shortly how to build a simple model that scores 97%.

Before downloading the data, create a subdirectory in nbs called data. Then download dogscats.zip into that directory and unzip it, as sketched below.
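
For example, from the nbs directory (a sketch only - obtain dogscats.zip from wherever the course materials point you):

import os, zipfile

if not os.path.exists('data'):
    os.mkdir('data')                       # create nbs/data
# ...download dogscats.zip into data/ first...
with zipfile.ZipFile('data/dogscats.zip') as zf:
    zf.extractall('data/')                 # produces data/dogscats/...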

Data Structure

It's important to note the structure of this data, because the implementation of the deep learning model we'll be using relies upon it.

The standard practice in building machine learning models is to split our data into the following subsets:

  • Training set: The data our algorithm uses to fit its parameters in order to make predictions
  • Validation set: The data we use when fine-tuning our parameters
  • Test set: The data against which we test our final model. Since the model has never seen this data before, it simulates how the model will perform on new data.

A quick exploration of the dogscats directory shows that the test, train, and validation sets are in separate subdirectories. Under the train and valid directories are further subdirectories, cats and dogs, containing the data for each category respectively. Further, note that each training image is labeled as dog or cat in its filename, while our test data is unlabeled, as we'd expect.

It's important to take note of this structure, because the deep learning implementation we're going to use relies upon it. Specifically, it infers how many categories to train for from the number of subdirectories under your training and validation directories. Be sure to follow these conventions when applying this model to a different dataset; the expected layout is sketched below.
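
Based on the description above, the layout looks like this (file names are illustrative):

data/dogscats/
    train/
        cats/      (cat.1.jpg, cat.2.jpg, ...)
        dogs/      (dog.1.jpg, dog.2.jpg, ...)
    valid/
        cats/
        dogs/
    test/          (unlabelled images)
    sample/        (the same train/valid structure, with far fewer images)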

You'll also notice another directory named sample. Training and validating on the entire dataset can take some time, so it's good practice when tweaking your implementation to run your algorithm on a small sample of your training and validation data. By doing so, you can get results back almost instantly; once you're satisfied, you can build your model on the entire dataset.
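
If you ever need to build such a sample yourself, something along these lines works (a hypothetical sketch - the counts and paths are just examples):

import os, shutil, random
from glob import glob

for category in ('cats', 'dogs'):
    src = glob('data/dogscats/train/%s/*.jpg' % category)
    dest = 'data/dogscats/sample/train/%s/' % category
    if not os.path.exists(dest):
        os.makedirs(dest)
    for fname in random.sample(src, 16):   # copy 16 random images per category
        shutil.copy(fname, dest)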

Using Convolutional Neural Networks

In this first lesson, we're going to be looking at Convolutional Neural Networks (CNNs), which allow computers to "see".

Introduction to this week's task: 'Dogs vs. Cats'

We're going to try to create a model to enter the Dogs vs. Cats competition at Kaggle. There are 25,000 labelled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle website, when this competition was launched (end of 2013): "State of the art: The current literature suggests machine classifiers can score above 80% accuracy on this task". So if we can beat 80%, then we will be at the cutting edge as of 2013!

Basic Setup

Before we begin, we need to get our notebook up and running.

Use $ jupyter notebook to open a connection to the notebook GUI on port 8888. Recall the IP address for your instance that we recorded earlier. In your browser, go to instanceIP:8888 to access the GUI, and log in with the password dl_course. Click on lesson1.ipynb.

We're now in our notebook (please see Jupyter notebook if you are unfamiliar with this platform). Before we get to the model, we need to configure and import the necessary packages. The first step is to run:

%matplotlib inline

All this does is make sure our plots are displayed inside the notebook.

Next, we're going to define our path, so we can change between our sample and entire dataset on the fly.

#path = "data/dogscats/"
path = "data/dogscats/sample/"

Make sure the path is set to our sample for now.

We'll need to import several libraries:

from __future__ import division,print_function
import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
import utils; reload(utils)
from utils import plots

Please see Python libraries if you are unfamiliar with some of these libraries, notably numpy or matplotlib.

Recall that utils.py is a module we downloaded earlier that holds some convenient functions we'll be using. reload(utils) allows us to pick up any changes we make to that module without restarting the notebook. (On Python 3, reload must first be imported from importlib.)

We're now ready to get started with our model!

Using a Pretrained VGG model with our Vgg16 class

Our first step is simply to use a model that has been fully created for us, which can recognise a wide variety (1,000 categories) of images. We will use 'VGG', which performed outstandingly in the 2014 Imagenet competition (winning the localisation task), and is a very simple model to create and understand. The VGG Imagenet team created both a larger, slower, slightly more accurate model (VGG 19) and a smaller, faster model (VGG 16). We will be using VGG 16, since the much slower performance of VGG 19 is generally not worth the very minor improvement in accuracy. We have created a python class, Vgg16, which makes using the VGG 16 model very straightforward.

Sample of the image set Vgg16 was trained on. Notice how each photo is typically of one object. When using pre-trained models, it's important to understand the dataset the model was trained on.

It's important to understand the data that Vgg16 was trained on. Take a look at the images here; you can see that most of them are of single objects. What this means is that Vgg16 is best applied in circumstances where the images are primarily of one object, such as our cat and dog images. It's essential when using a pre-trained model to explore the data it was trained on, so you can understand its limits and biases.

The punchline: state of the art custom model in 7 lines of code

Now that we understand what Vgg16 is and what it's built on, we can see how, in seven lines of code, to build a state of the art model that gets > 97% accuracy on the Dogs vs. Cats dataset:

# As large as you can, but no larger than 64 is recommended. 
# If you have an older or cheaper GPU, you'll run out of memory, so will have to decrease this.
batch_size=64
# Import our class, and instantiate
from vgg16 import Vgg16
vgg = Vgg16()
# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
Vgg 7 lines.png

And just like that we now have a model for classifying images of cats and dogs, which can easily be reproduced for any image classification problem provided the directory is structured correctly. Fantastic!

Let's take a look at how this works, step by step...

Use Vgg16 for basic image recognition

Let's start off by using the Vgg16 class to recognise the main imagenet category for each image. We won't be able to enter the Cats vs Dogs competition with an Imagenet model alone, since 'cat' and 'dog' are not categories in Imagenet - instead each individual breed is a separate category. However, we can use it to see how well it can recognise the images, which is a good first step. First, create a Vgg16 object:

vgg = Vgg16()

Vgg16 is built on top of Keras (which we will be learning much more about shortly!), a flexible, easy-to-use deep learning library that sits on top of Theano or TensorFlow. Keras reads groups of images and their labels in batches, using a fixed directory structure where the images for each category must be placed in a separate folder. Let's grab batches of data from our training folder:

batches = vgg.get_batches(path+'train', batch_size=4)
Train class.png

(BTW, when Keras refers to 'classes', it doesn't mean Python classes; rather, it refers to the categories of the labels, such as 'pug' or 'tabby'.) batches is just a regular Python iterator. Each iteration returns both the images themselves and their labels. A sketch of what get_batches does under the hood follows.
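
Roughly speaking (a sketch, not the exact Vgg16 source), get_batches wraps Keras's directory-based image loading:

from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
# One folder per category; yields batches of (images, one-hot labels)
batches = gen.flow_from_directory(path + 'train', target_size=(224, 224),
                                  class_mode='categorical', batch_size=4)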

To understand what Vgg16 does, let's look at what a batch looks like.

imgs,labels = next(batches)
plots(imgs, titles=labels)
Batch input.png

As you can see, the labels for each image are an array, containing a 1 in the first position if it's a cat, and a 1 in the second position if it's a dog. This approach to encoding categorical variables, where the array contains a single 1 in the position corresponding to the category and 0s everywhere else, is very common in deep learning. It is called one hot encoding. The arrays contain two elements because we have two categories (cat and dog). If we had three categories (e.g. cats, dogs, and kangaroos), then each array would contain two 0s and one 1.
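
For instance, in that hypothetical three-category case the encodings would be:

# One hot encoding: a single 1 in the slot for each category
cat      = [1, 0, 0]
dog      = [0, 1, 0]
kangaroo = [0, 0, 1]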

We can now pass the images to Vgg16's predict() function to get back probabilities, category indexes, and category names for each image's VGG prediction.

vgg.predict(imgs, True)
Vgg16 predict.png

The category indexes are based on the ordering of categories used in the VGG model - e.g. here are the first four:

vgg.classes[:4]
Vgg16 classes.png

(Note that, other than creating the Vgg16 object, none of these steps are necessary to build a model; they are just showing how to use the class to view imagenet predictions.)

Use our Vgg16 class to finetune a Dogs vs Cats model

To change our model so that it outputs "cat" vs "dog", instead of one of 1,000 very specific categories, we need to use a process called "finetuning". From the outside, finetuning looks identical to normal machine learning training - we provide a training set with data and labels to learn from, and a validation set to test against, and the model learns a set of parameters based on the data provided. The difference is that we start with a model that is already trained to solve a similar problem. The idea is that many of the parameters should be very similar, or the same, between the existing model and the model we wish to create. Therefore, we only select a subset of parameters to train, and leave the rest untouched. This happens automatically when we call fit() after calling finetune(). We create our batches just like before, and make the validation set available as well. A 'batch' (or mini-batch, as it is commonly known) is simply a subset of the training data - we use a subset at a time when training or predicting, in order to speed up training and to avoid running out of memory.
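
Conceptually, finetuning a Keras Sequential model looks something like this (a sketch under assumptions, not the actual Vgg16 source - the optimizer and learning rate here are illustrative):

from keras.layers import Dense
from keras.optimizers import Adam

# 'model' is the pretrained VGG network, a Keras Sequential model
model.pop()                                  # drop the old 1,000-way output layer
for layer in model.layers:
    layer.trainable = False                  # freeze the pretrained parameters
model.add(Dense(2, activation='softmax'))    # new 2-way cat/dog output layer
model.compile(optimizer=Adam(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])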

First, let's feed Vgg16 our training and validation batches.

batch_size=64
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)
Batch output train val.png

Next, we call finetune() on our training batches:

vgg.finetune(batches)

This modifies our model such that it will be trained based on the data in the batches provided - in our case, to categorize images as either cats or dogs.

vgg.fit(batches, val_batches, nb_epoch=1)
Fitted model output.png

Finally, we fit() the parameters of the model using the training data, reporting the accuracy on the validation set after every epoch. (An epoch is one full pass through the training data.)

We see that our accuracy is not very high, which is to be expected given that we're training on a small sample. But now that we understand what is going on, we can change our path back to the full training/validation set and run our model again from the start. Doing so should give you very high training and validation accuracy, around 97%.
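
In other words, change the path cell back to the full dataset and re-run the notebook from the top:

path = "data/dogscats/"   # the full dataset this time, instead of the sample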

That shows all of the steps involved in using the Vgg16 class to create an image recognition model using whatever labels you are interested in. For instance, this process could classify paintings by style, or leaves by type of disease, or satellite photos by type of crop, and so forth. Next up, we'll dig one level deeper to see what's going on in the Vgg16 class. Ready for Lesson 2?