Fine-tuning takes a network that has already been trained for one task and adapts it to perform a second, similar task.
When the original task is similar to the new one, starting from a network that has already been designed and trained lets us reuse the feature extraction performed by its early layers instead of building and training that feature extractor from scratch. Fine-tuning proceeds as follows:
- The output layer, originally trained to recognize (in the case of ImageNet models) 1,000 classes, is replaced with a layer that outputs the number of classes you require.
- The new output layer is then trained with SGD to map the features computed by the earlier layers to the desired output classes.
- Once this has been done, other late layers in the model can be set to 'trainable=True' so that further SGD epochs fine-tune their weights for the new task too.
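The three steps above can be sketched with a deliberately tiny toy model. This is not Keras code: the `Layer` class, the scalar weights, and the data are all made up for illustration, and each "layer" just multiplies its input by one number. It only aims to show which weights SGD is allowed to touch at each stage.

```python
class Layer:
    """Toy layer holding a single scalar weight (illustrative, not a Keras layer)."""

    def __init__(self, name, weight, trainable=True):
        self.name = name
        self.weight = weight
        self.trainable = trainable


def forward(layers, x):
    # The toy network computes y = w_n * ... * w_2 * w_1 * x.
    for layer in layers:
        x = layer.weight * x
    return x


def sgd_epoch(layers, data, lr=0.01):
    """One pass of plain SGD on squared error; only trainable layers are updated."""
    for x, target in data:
        error = forward(layers, x) - target
        grads = []
        for i in range(len(layers)):
            # d(0.5 * error**2)/dw_i = error * x * (product of the other weights)
            others = 1.0
            for j, other in enumerate(layers):
                if j != i:
                    others *= other.weight
            grads.append(error * x * others)
        for layer, grad in zip(layers, grads):
            if layer.trainable:
                layer.weight -= lr * grad


def mean_abs_error(layers, data):
    return sum(abs(forward(layers, x) - t) for x, t in data) / len(data)


# A "pretrained" network for the original task (weights are arbitrary).
pretrained = [Layer("early", 0.5), Layer("late", 2.0), Layer("old_output", 3.0)]

# Step 1: replace the output layer and freeze everything before it.
model = pretrained[:-1] + [Layer("new_output", 0.1)]
for layer in model[:-1]:
    layer.trainable = False

# Step 2: train only the new output layer on the new task (here y = 2x).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
error_before = mean_abs_error(model, data)
for _ in range(5):
    sgd_epoch(model, data)
stage1 = {layer.name: layer.weight for layer in model}
error_stage1 = mean_abs_error(model, data)

# Step 3: unfreeze a late layer and run further SGD epochs.
model[1].trainable = True
for _ in range(5):
    sgd_epoch(model, data)
stage2 = {layer.name: layer.weight for layer in model}
error_stage2 = mean_abs_error(model, data)
```

In this sketch the "early" weight never changes, the "late" weight stays fixed during the first stage and only moves once it is marked trainable, and the new output layer is updated throughout.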
With respect to the cats vs dogs example:
- The original task is classifying images into the 1,000 ImageNet categories.
- The new task is classifying images into just 2 categories, i.e., cats or dogs.
- Following the definition of finetune, the last layer is removed (popped) and a new output layer for the 2 classes is added in its place.
- Trainable is set to false for all the remaining layers, since they have already been trained as part of the original task.
- For the trainable=False setting to take effect, the model needs to be compiled again, which is what the last line of the vgg16.finetune() function does.
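To illustrate why the recompile step matters, here is a small stand-in written in plain Python. `ToyModel`, `ToyLayer`, and this `finetune` are hypothetical and are not the real Keras or vgg16 code. The sketch assumes, as the point above describes, that the set of weights to update is fixed at compile time, so a later change to a layer's trainable flag is ignored until compile is called again.

```python
class ToyLayer:
    """Hypothetical layer: `weight` simply counts the updates it has received."""

    def __init__(self, name, trainable=True):
        self.name = name
        self.weight = 0.0
        self.trainable = trainable


class ToyModel:
    """Hypothetical stand-in for a sequential model; not the Keras API."""

    def __init__(self, layers):
        self.layers = list(layers)
        self._layers_to_update = []

    def pop(self):
        return self.layers.pop()

    def add(self, layer):
        self.layers.append(layer)

    def compile(self):
        # Assumption: the list of trainable layers is captured here, once.
        self._layers_to_update = [l for l in self.layers if l.trainable]

    def train_step(self):
        # Stand-in for an SGD update: touches only layers captured by compile().
        for layer in self._layers_to_update:
            layer.weight += 1.0


def finetune(model, num_classes):
    """Mirrors the steps listed above: pop, freeze, add new output, recompile."""
    model.pop()
    for layer in model.layers:
        layer.trainable = False
    model.add(ToyLayer("dense_%d_classes" % num_classes))
    model.compile()


model = ToyModel([ToyLayer("conv_block"), ToyLayer("fc"), ToyLayer("imagenet_1000")])
model.compile()

# Changing the flag without recompiling: the layer is still updated.
model.layers[0].trainable = False
model.train_step()
stale_weight = model.layers[0].weight

# After finetune() recompiles, only the new output layer is updated.
finetune(model, num_classes=2)
model.train_step()
model.train_step()
layer_names = [layer.name for layer in model.layers]
update_counts = [layer.weight for layer in model.layers]
```

Here the first layer still receives an update after its flag is flipped, because compile had not been called again. Once finetune() recompiles, the frozen layers stop changing and only the new 2-class layer keeps being updated.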