Transfer Learning in ConvNets – Part 2

We discussed the possibility of transferring the knowledge learned by one ConvNet to another. If you are new to the idea of transfer learning, please check out the previous post here.

Alright… let's look at a practical scenario where we need transfer learning. We all know that deep neural networks are data hungry: we may need a huge amount of data to build unbiased predictive models. In most cases, though, there isn't that much data available to train a neural model from scratch. In such situations, transfer learning may be your lifesaver.

In this small demonstration, I've built a multi-class classifier with 8 classes and only 100-odd images in the training set for each class.

The dataset I'm using here is a derivation of the "Natural Images" dataset (https://www.kaggle.com/prasunroy/natural-images/version/1#_=_). I've randomly reduced the number of images in the original dataset to build the "Mini Natural Images" dataset, which consists of three splits: train, validation and test. (The dataset is available in the GitHub repository.) Go ahead and feel free to pull it or fork it!
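If you want to follow along, here's a minimal sketch of loading the splits with torchvision; the directory layout and batch size below are my assumptions, so adjust them to match the repository.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Standard ImageNet-style preprocessing, since we'll fine-tune ImageNet models
data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Assumed layout: mini_natural_images/{train,val,test}/<class_name>/*.jpg
train_set = datasets.ImageFolder('mini_natural_images/train', data_transform)
val_set = datasets.ImageFolder('mini_natural_images/val', data_transform)

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)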

Here’s an overview of the “Mini Natural Images” dataset.

[Figure: overview of the "Mini Natural Images" dataset]

So, this is going to be an image classification task. We are going to take advantage of ImageNet, and of the state-of-the-art architectures pre-trained on the ImageNet dataset. Instead of random initialization, we initialize the network with pre-trained weights, and the ConvNet is then fine-tuned on our training set.

I've used the PyTorch deep learning framework for this experiment, as it's super easy to adopt for deep learning. For this type of computer vision application you can use the models available in torchvision.models (https://pytorch.org/docs/stable/torchvision/models.html).

The models available in this model zoo are pre-trained on the ImageNet dataset to classify 1000 classes, so there are 1000 nodes in the final layer. To adapt a model to our needs, keep in mind to remove that final layer and replace it with one having the desired number of nodes for your task. (In this experiment, the final fc layer of ResNet18 has been replaced by an 8-node fc layer.)

Here's how to replace the final layer in the ResNet and VGG architectures.

# Using a model pre-trained on ImageNet and replacing its final linear layer
import torch.nn as nn
from torchvision import models

# For ResNet18: the classifier is the single fc layer
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 8)

# For VGG16_BN: the classifier is an nn.Sequential, so replace its last
# Linear module (just assigning out_features would not resize the weights)
model_ft = models.vgg16_bn(pretrained=True)
num_ftrs = model_ft.classifier[6].in_features
model_ft.classifier[6] = nn.Linear(num_ftrs, 8)

The rest of the training goes the same way as training and fine-tuning any CNN. Make sure to use a batch size suited to the GPU available in your rig. (You can use a DLVM for this task if you wish 😊)
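For completeness, a bare-bones fine-tuning loop might look like the sketch below; the train_loader comes from the loading snippet earlier, and the learning rate and epoch count are illustrative assumptions rather than the exact settings of the experiment.

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10):  # assumed epoch count
    model_ft.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model_ft(inputs), labels)
        loss.backward()
        optimizer.step()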

The training and validation accuracies are plotted, and the confusion matrix is generated using torchnet (https://github.com/pytorch/tnt), which is pretty good for visualization and logging in PyTorch.
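For instance, torchnet's ConfusionMeter can accumulate predictions over the validation or test set; here's a rough sketch reusing the model and loaders from above.

import torch
from torchnet import meter

confusion = meter.ConfusionMeter(8)  # one row/column per class

model_ft.eval()
with torch.no_grad():
    for inputs, labels in val_loader:
        outputs = model_ft(inputs.to(device))
        confusion.add(outputs.cpu(), labels)

print(confusion.value())  # an 8x8 matrix of prediction counts

The overall accuracy is simply the trace of this matrix divided by its sum.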

[Figure: training and validation accuracy curves, and the confusion matrix of the classification]

The classifier achieves 97% accuracy on the test image set, which is not bad.

Now it's your turn to go ahead and get your hands dirty with this experiment. Leave a comment if you come up with any issues. Happy coding!

Here’s the GitHub Repo for your reference!


Transfer Learning in ConvNets

The rise of deep learning methods in areas like computer vision and natural language processing has led to the need for massive datasets to train deep neural networks. In most cases, finding large enough datasets is not possible, and training a deep neural network from scratch for a particular task may be time-consuming. To address this problem, transfer learning can be used as a learning framework, where the knowledge acquired from a previously learned, related task is transferred to improve learning on a new task.

In its simplest form, transfer learning helps a machine learning model learn more easily by getting help from a pre-trained model whose domain is similar to some extent (not exactly the same).

[Figure: The ways in which transfer might improve learning]

There are also cases where the transferred knowledge actually decreases performance; this is called negative transfer. Normally, we humans decide which knowledge can be transferred (the mapping) for a particular task, but active research is going on into ways of doing this mapping automatically.

That's all for the theory! Let's discuss how we can apply transfer learning in a computer vision task. As you all know, Convolutional Neural Networks (CNNs) perform really well on tasks like image classification and image recognition. Training deep CNNs needs large amounts of image/video data, and the massive number of parameters to tune means models take a long time to train. In such cases, transfer learning is a great fit for training new models, and it is widely used in industry as well as in research.

There are three main approaches to using transfer learning in machine learning problems. To make them easier to understand, I'll take my examples from the context of training deep neural network models for computer vision tasks (image classification, labeling, etc.).

ConvNet as fixed feature extractor –

In this case, you take a ConvNet that has been pre-trained on a large image repository like ImageNet and remove its last fully connected layer. The rest is used as a fixed feature extractor for your dataset. Then a linear classifier (softmax or a linear SVM) is trained on the new dataset. A minimal PyTorch sketch of this idea follows the figure below.

[Figure: VGG16 as a feature extractor]
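Here's a minimal sketch of the fixed-extractor idea, using torchvision's ResNet18; the class count is a hypothetical placeholder.

import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical; set this to your task's class count

model = models.resnet18(pretrained=True)

# Freeze all pre-trained weights so the ConvNet acts as a fixed feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; a freshly created layer defaults to
# requires_grad=True, so only this linear classifier will be trained
model.fc = nn.Linear(model.fc.in_features, num_classes)

# The optimizer only receives the new layer's parameters
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)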

Fine-tuning the ConvNet –

Here we don't stop at using the ConvNet as a feature extractor; we also fine-tune the weights of the ConvNet with the data we have. Sometimes not the whole deep net is tuned but only the set of last layers, since the first layers represent the most generalized features. A sketch of this partial fine-tuning is shown below.
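As a rough sketch of that idea, here we unfreeze only the last residual block and the classifier of a torchvision ResNet18; the choice of layer4 and the learning rate are illustrative assumptions.

import torch
from torchvision import models

model = models.resnet18(pretrained=True)

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze the last residual block and the classifier; the early
# layers keep their generic ImageNet features untouched
for param in model.layer4.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = True

# Optimize only the parameters being fine-tuned, with a small learning rate
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.0001, momentum=0.9)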

Using pretrained models –

Here we take pre-trained models available in most deep learning frameworks and adjust them according to our needs. In the next post, we'll discuss how to perform this using PyTorch.

One of the most important decisions to make in transfer learning is whether to fine-tune the network or to leave it as it is. The size of your dataset and its similarity to the dataset the model was originally trained on are the deciding factors. Here's a summary that should help you make the decision.
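A commonly cited rule of thumb (for instance, from Stanford's CS231n course notes) goes roughly like this:

- Small dataset, similar to the original: train only a linear classifier on top of the ConvNet features.
- Large dataset, similar to the original: fine-tune some or all of the layers.
- Small dataset, different from the original: train a classifier on activations from an earlier layer, since the later layers are more dataset-specific.
- Large dataset, different from the original: fine-tune the whole network, using the pre-trained weights as the initialization.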

Let's discuss how to perform transfer learning with an example in the next post. 😊

Deep Learning Vs. Traditional Computer Vision

[Figure: Traditional way of feature extraction]

Computer vision can be succinctly described as finding and describing features in images to help discriminate objects and/or classes of objects.

Computer vision has become one of the vital research areas, and commercial applications built on computer vision methodologies are becoming a huge portion of industry. The accuracy and speed of processing and identifying images captured from cameras have developed over decades. As the best-known kid in town, deep learning is now playing a major role as a computer vision tool.

Is deep learning the only tool to perform computer vision?

A big no! Deep learning came to the computer vision scene only a couple of years back. Before that, computer vision was mainly based on image processing algorithms and methods. The main process of computer vision was extracting features from the image: detecting colors, edges, corners and objects was the first step in performing a computer vision task. These features are human-engineered, and the accuracy and reliability of the models directly depend on the extracted features and on the methods used for feature extraction. In the traditional vision scope, algorithms like SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and BRIEF (Binary Robust Independent Elementary Features) play the major role of extracting features from the raw image.
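To make that concrete, here's a quick sketch of traditional feature extraction with OpenCV; ORB is used as a freely available stand-in for the detectors named above, and the image path is a hypothetical placeholder.

import cv2

# Load a hypothetical input image in grayscale
image = cv2.imread('cat.jpg', cv2.IMREAD_GRAYSCALE)

# Detect up to 500 hand-engineered "interesting points" and describe them
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(image, None)

# descriptors holds one binary feature vector per detected keypoint
print(len(keypoints), descriptors.shape)

These keypoints and descriptors would then be fed into a classical classifier; no learning is involved in the feature extraction itself.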

The difficulty with this approach to feature extraction in image classification is that you have to choose which features to look for in each given image. When the number of classes goes up or the image clarity goes down, it's really hard to cope with traditional computer vision algorithms.

The Deep Learning approach –

Deep learning, a subset of machine learning, has shown significant performance and accuracy gains in the field of computer vision. In 2012, in arguably one of the most influential papers applying deep learning to computer vision, a neural network containing over 60 million parameters significantly beat previous state-of-the-art approaches to image recognition in the popular ImageNet computer vision competition, ILSVRC-2012.

The boom started with convolutional neural networks and the modified architectures of ConvNets. By now, some ConvNet architectures are said to be close to 100% accuracy on image classification challenges, sometimes beating the human eye!

The main difference in the deep learning approach to computer vision is the concept of end-to-end learning. There's no longer a need to define features and do feature engineering: the neural network does that for you. It can be put simply this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

Wired | 2016
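To contrast with the hand-engineered pipeline described earlier, here's a toy end-to-end classifier in PyTorch: raw pixels in, class scores out, with every feature learned from data. The layer sizes are arbitrary and assume 224x224 RGB inputs.

import torch.nn as nn

# Raw pixels in, class scores out; no hand-crafted features anywhere --
# the convolutional filters are learned from the training data
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 2),  # 224 halved twice -> 56; 2 classes (e.g. cat / not cat)
)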

Though deep neural networks have major drawbacks, like the need for huge amounts of training data and large computational power, the field of computer vision has already been conquered by this amazing tool!