10 Tips for Designing & Developing Computer Vision Projects

Computer vision based applications have become one of the most popular research areas and have gained a lot of interest across different industrial domains. The popularity and advancement of deep learning have given an extra boost to the hype around computer vision.

Having worked as a researcher focused on computer vision based applications for nearly 3 years, here are some tips I’d give to a developer who’s stepping into a computer vision experiment or deployment.

Before going further into the discussion, it’s worth getting an idea of the difference between traditional computer vision approaches and deep learning based approaches. Here’s a quick overview of that.

01. Do we really have to use deep learning based computer vision approaches to solve this?

This is the very first thing to consider! When you see a problem from scratch, you may think applying deep learning is the saviour. That’s not true in every case. You may be able to solve the problem easily with traditional techniques such as line detection filters, without wasting time and energy on training a deep learning model for the task. Observe the problem thoroughly and then decide whether to move forward with deep learning or not.

02. Analyze the input data and the desired output

Obviously, deep learning based computer vision models take images or videos as their input modalities. Before starting the project implementation, we should consider the following factors of the input data we have.

Size of the data –

Since DL models need a huge amount of data (in most cases) to train without overfitting, we need to make sure we have a good amount of data in hand. We can’t specify exact numbers here; I’d say the more, the better!

Quality of the data –

Some image inputs or video streams we get are blurred and don’t capture the most important features we need to build the models. Getting images/videos in higher resolution is always better. When considering data quality, it’s also worth looking at factors such as class imbalance if it’s a classification problem.

Similarity of the training data and the inputs at inference time –

I’ve seen cases where the data a model receives at inference time is very different from the data used in training (for example, a model trained on cat images from cartoons that receives real-life cat images at inference time). Unless the model is specifically designed for domain adaptation, you should NEVER make this mistake.

03. Building from scratch? Is it necessary?

As I said previously, computer vision is one of the most widely researched areas in deep learning. Because of that, you have the privilege of using pre-built models as well as online services to perform your computer vision workloads.

Services such as Azure Cognitive Services and Google Vision APIs provide pre-built web APIs which you can directly use for many vision related tasks. From an OCR task of reading text in a scanned document to identifying human faces and even their emotions, there are ready-made APIs. No need to build from scratch; you can just call the service as a web service from your application.

Going a step further than the pre-built services, Microsoft Azure Cognitive Services offers a Custom Vision service where you can train your own image classification models with your own data. This comes in handy in many practical applications where you don’t want to spend time building the model or configuring the training environment.

04. Building from scratch? Is it REALLY necessary?

Yep! Again, a decision to take. If your problem cannot be addressed by the pre-built computer vision services available online, the option you have is to build a deep learning model and train it using your own data. When it comes to model development, one of the biggest mistakes we make is neglecting the existing models built by researchers for various purposes.

I’m pretty sure most of the computer vision tasks you have fall under well-known areas such as image classification, action recognition in videos, human pose detection, and human/object tracking. There are many pre-built methods which have achieved state-of-the-art accuracy on these problems and have been benchmarked against most of the large publicly available datasets. For example, ResNet models are specifically designed for image classification and have shown excellent accuracy on the ImageNet dataset. You can easily use these models (most of them are available in the model zoos of popular deep learning frameworks), adapt their last layers for your needs, and get higher accuracy than you would by building your own model from scratch.

Papers with Code is a great place to search for existing models for various computer vision tasks.

I recently came across the OpenMMLab repositories, which come in pretty handy for such tasks (mostly for video analysis).

05. Use the correct method

When building the models, make sure you follow an approach that matches your data. For example, if you only have a few training images for your classification model, you may need to look into areas like few-shot learning. Tricks such as adding batch normalization, using the correct loss functions, adding more input modalities, using learning rate schedulers and transfer learning will surely increase your model accuracy.

06. Data augmentation is a saviour!

The more data, the better! Always look into sensible data augmentation methods to make sure your model doesn’t overfit the training data. Always visualize your data inputs before using them for model training to make sure your augmentations make sense.
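As a minimal sketch (assuming image data and torchvision; the specific transforms and parameter values here are only illustrative, not a recommendation for every dataset), a training-time augmentation pipeline could look like this:

#Example augmentation pipeline for image data (values are illustrative)
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

Visualize a few transformed samples before training to confirm the augmented images still look like valid examples of their class.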

07. Model training should not be a nightmare

This is the most time-consuming part of developing computer vision models. We all know training deep learning models needs a lot of computation power. Make sure you have enough to train your models; it’ll be a nightmare to train an image classifier on 100,000 images using just your CPU! Make sure you have good enough GPUs for the computations and that they are configured correctly for model training.

08. Model inference time should not be years!

Model inference is often the least considered part of model development, even though it is vital: this is where the outcome is shown to the outside world. Sometimes your trained model may take so long to run inference that it becomes useless in a real-world application. Think of a human detection system you implemented taking 1-2 minutes to identify a person accessing a secured location… such a system is of no use since it doesn’t meet the need for real-time surveillance. Always aim to develop the simplest model that gives the best accuracy. Sometimes you may have to give up a few digits of accuracy to increase model efficiency; that’s totally fine in a real-world application. Before pushing the model into production, look into converting the model to ONNX or applying model pruning. It’ll help you deploy efficient models.

09. Take a look at your deployment target

This connects directly with the points we discussed about model inference time. We don’t have the luxury of high-end machines powered with GPUs, or high-powered cloud services, in every deployment location. Sometimes our deployment target may be an IoT device. So make sure you design a lightweight model that still provides good performance while consuming fewer resources.

10. Privacy concerns

Last but not least, we have to look at privacy concerns. Since we are dealing with image and video data which may contain a lot of personal information about people, we need to make sure we are following privacy guidelines and that the data we use for model training has the necessary security clearance for such tasks.

A bit lengthy… but I hope you got some clues before getting into your next computer vision project. Happy coding 😊

Open Neural Network Exchange (ONNX)

In the current AI landscape, there are plenty of programming languages, frameworks, runtime environments and hardware devices used by practitioners for developing and deploying their machine learning and deep learning models. This technology stack gets even wider when it comes to integrating these machine learning models into software development processes.

From our experience with software development, we know that handling platform dependencies and getting all components to work smoothly is one of the biggest headaches developers face. It’s no different in the machine learning space.

To address the problem of interoperability between different machine learning development frameworks, the industry is now adopting the “Open Neural Network Exchange” (ONNX).

What is ONNX?

ONNX acts as the open standard for representing ML/DL models

ONNX is an open format to represent both deep learning and traditional machine learning models. It increases the interoperability of models without tying them to a particular runtime environment or development tool.

In simple words, you can build your neural network in a deep learning framework like PyTorch, convert it into an ONNX model, and then run inference on it in a TensorFlow environment!

ONNX is widely supported by most frameworks, tools and hardware (and since it’s evolving rapidly, I’m pretty sure many more frameworks will support ONNX in the near future).

Since ONNX is backed by big players in the AI space such as Facebook, Microsoft, AWS and Google, you can easily use your familiar frameworks with ONNX.

Why ONNX?

Let’s take a scenario where you have built a deep learning based classification model for classifying grocery items using PyTorch as your deep learning framework. At a later stage of development you need to use that model in an iOS mobile application, where machine learning operations are based on CoreML. You can export the PyTorch model as an ONNX model and then use it on the CoreML runtime for inference.
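As a rough sketch of the export step (the model, input size and file name below are placeholders, not the actual grocery classifier from the scenario), exporting a PyTorch model to ONNX can look like this:

#Sketch: exporting a PyTorch model to ONNX (model and shapes are placeholders)
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)  #stand-in for your trained classifier
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  #one example input with the expected shape
torch.onnx.export(model, dummy_input, "classifier.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)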

ONNX has proven its worth in scenarios where deep learning based models have to be deployed on IoT devices with less computation power, and it has shown a noticeable improvement in inference times.

With ONNX, you don’t need to package the various platform dependencies on the deployment target. You just need the ONNX runtime.
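For instance, running the exported model with the ONNX Runtime Python package could look like the following sketch (the file name, input name and input shape are assumptions carried over from the export example above):

#Sketch: inference with ONNX Runtime (file name and input shape are assumptions)
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("classifier.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)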

You can find the list of ONNX-supported tools and frameworks through this link.

In the coming posts, I’m going to discuss my experiences with setting up the ONNX runtime and using it with my favourite deep learning framework, PyTorch!

Happy coding 🙂

PyTorch Custom Dataset Tips and Tricks

Loading massive and complex datasets for training deep learning models has become normal practice in most deep learning experiments. Large datasets containing multimedia such as images, video frames and sound clips can’t be handled with simple file open commands, which would drastically reduce model training efficiency.

Featuring a more Pythonic API, the PyTorch deep learning framework offers a GPU-friendly, efficient data loading scheme that lets you feed any data type into model training in an optimal manner.

Based on the Dataset class (torch.utils.data.Dataset) in PyTorch, you can load pretty much every data format, in all shapes and sizes, by overriding two subclass functions.

 __len__  – returns the size of the dataset

__getitem__  – returns a sample from the dataset given an index.

Here’s a rough skeleton of the Dataset class which you can modify for your need.

import torch
from torch.utils.data.dataset import Dataset

#If available, use GPU memory to load data
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")


class MyCustomDataset(Dataset):
    def __init__(self, ...):
        # # All the data preparation tasks can be defined here
        # - Deciding the dataset split (train/test/validate)
        # - Data transformation methods
        # - Reading annotation files (CSV/XML etc.)
        # - Preparing the data so it can be read by an index

    def __getitem__(self, index):
        # # Returns one sample of data and its label
        # - Apply the initiated transformations to the data
        # - Push the data to GPU memory
        # - Better to return the data points as a dictionary/tensor
        return (img, label)

    def __len__(self):
        return count  # the number of examples (images?) you have

These are some tips and tricks I follow when writing custom data loaders for PyTorch (a full example putting them together appears after the list).

  • Datasets will expand with more and more samples, so we do not want to store too many tensors in memory at runtime in the Dataset object. Instead, we form the tensors as we iterate through the sample list. This approach may be a bit slower to process, but it saves us from running out of memory.
  • The __init__ function should be the place where all the initial data preparation and logic happens. Do operations such as reading data annotation files (CSV/XML etc.) here.
  • If you have separate portions of the dataset for train/test/validation, make sure you define that logic inside __init__. You can pass the desired data split as an argument to the function.
  • __init__ is also the place to define data transformations. For example, if you have image data to load and need to resize and normalize the images, you can use torchvision transforms here.
#Example transform for image data
self.transform = transforms.Compose([transforms.Resize((224,224)),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
  • Make sure you index your custom dataset in a structured way when initializing it; building an array or a list of the data points is a good way to do it.
  • The __len__ function comes in handy to see how many data points have been loaded in __init__. The data length is normally the number of records loaded into the final list or array you created inside __init__.
  • The __getitem__ function should be lightweight. Avoid overly complex computations inside __getitem__.
  • PyTorch DataLoaders just call __getitem__() and wrap the results up into a batch when performing training or inference. So this function is called iteratively; make sure you return one data point at a time.
  • Always try to return the values from __getitem__ as tensors.
  • If you have multiple components to return from the data loader, a Python dictionary is a handy option: structure them as key-value pairs. Here’s an example dictionary item which contains four values:
item = {
    'video_id': video_id,
    'activity_id': activity_id,
    'activity_frame': activity_frame_as_tensor,
    'activity_annotation': activity_annotation
}
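Putting these tips together, here’s a minimal sketch of an image-classification dataset driven by a hypothetical CSV annotation file (the file layout, column names "file_name"/"label"/"split" and the transform values are assumptions for illustration, not from the original post):

#Sketch: a custom image dataset driven by a hypothetical CSV annotation file
import csv
import os

from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms


class ImageCSVDataset(Dataset):
    def __init__(self, root_dir, annotation_csv, split="train"):
        #Read the annotation file once and keep a lightweight index in memory
        self.root_dir = root_dir
        self.samples = []  #list of (relative_path, integer_label)
        with open(annotation_csv) as f:
            for row in csv.DictReader(f):
                if row["split"] == split:
                    self.samples.append((row["file_name"], int(row["label"])))

        #Transformations are defined once here and applied per item
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        #Keep this light: open one image, transform it, return tensors
        file_name, label = self.samples[index]
        img = Image.open(os.path.join(self.root_dir, file_name)).convert("RGB")
        return self.transform(img), torch.tensor(label)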

Consuming the dataset –

You create a MyCustomDataset object when you need to consume the data. Here’s a sample code snippet that demonstrates how to access the data points through the custom data loader you created.

#Consuming the dataset
from torch.utils.data import DataLoader, random_split

#creating the dataset object
dataset = MyCustomDataset(...)

#Randomly split dataset into trainset and the validation set 
train_data, val_data = random_split(dataset, [50000, 10000])

#Create DataLoader iterators
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_data, batch_size=64, shuffle=True, num_workers=2)

#Iterating through the data loader object
for i, batch in enumerate(train_loader):
    print(i, batch)

You may notice that the DataLoader iterator can batch, shuffle and load the data using multiprocessing just by changing the parameters of the function. Make sure you choose a batch size that fits your memory capacity. If you’re loading the data onto the GPU, it’s the GPU memory you should consider.

If you’re using a multi-GPU setup with PyTorch data loaders (for example through DataParallel), each data batch is divided evenly among the GPUs. If the batch size is less than the number of GPUs you have, it won’t utilize all of them.
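As a minimal sketch of that multi-GPU setup (the model here is a placeholder; this is the standard nn.DataParallel wrapping, not code from this post):

#Sketch: wrapping a model for multi-GPU training with DataParallel
import torch
import torch.nn as nn

model = nn.Linear(128, 10)  #placeholder model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  #splits each input batch across the available GPUs
model = model.to("cuda" if torch.cuda.is_available() else "cpu")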

I would say the custom Dataset and DataLoader combo in PyTorch has become a lifesaver for me in most complex data loading scenarios. I’d love to hear about your experiences writing custom data loaders in PyTorch.

Happy Coding!  

Tensorboard with PyTorch


Tensorboard Interface

Training and evaluating deep learning models may take a lot of time, so sometimes it’s worth monitoring how well (or badly) the model is training in real time. It helps you understand, debug and optimize your models without waiting until training finishes to check the performance. The good old method of printing out training loss/accuracy for each epoch works, but it’s a bit hard to compare metrics that way.

A real-time graphical interface that can plot/visualize metrics while a model is training through epochs or iterations would be the best option. Tensorboard is the visualization tool that came out with TensorFlow, and I’m pretty sure almost all TF folks are using it and benefiting from that cool tool.

So what about PyTorchians?? Don’t panic. The official PyTorch repository recently added a Tensorboard utility in PyTorch 1.1.0. The code is still experimental, though, and it didn’t work well for me.

Then I found this awesome open-source project, tensorboardX. It’s pretty similar to what the official PyTorch repo offers and easy to work with. TensorboardX supports scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve and video summaries.

5 simple steps…

  1. Install tensorboardX
  2. Import tensorboardX in your PyTorch code
  3. Create a SummaryWriter object
  4. Log your metrics with the SummaryWriter
  5. Use it!
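Here’s a minimal sketch of those steps (the log directories, tags and dummy loss values are my own illustrative choices, not the demo’s exact code):

#Sketch: logging scalars with tensorboardX (values are illustrative)
from tensorboardX import SummaryWriter

train_writer = SummaryWriter("./logs/train")   #one writer per curve/phase
val_writer = SummaryWriter("./logs/val")

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)             #stand-in for your real training loss
    val_loss = 1.2 / (epoch + 1)               #stand-in for your real validation loss
    train_writer.add_scalar("loss", train_loss, epoch)
    val_writer.add_scalar("loss", val_loss, epoch)

train_writer.close()
val_writer.close()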

I just did a simple demo of this by adding Tensorboard logs to the famous PyTorch transfer learning tutorial. Here’s the GitHub repo. Just clone it and play around.

Note that in the experiment I’ve used two SummaryWriter objects to create two scalar graphs: one for the training phase and the other for the validation phase.

The log files will be created in the directory you specify when creating the SummaryWriter object. (You can change this directory to wherever you want.)

To view the Tensorboard, open a terminal inside the experiment folder. Assuming your log files are inside ‘./logs/’, use the following command to spin up the Tensorboard server on your local machine.

$ tensorboard --logdir ./logs/

Sometimes you may use a remote server or a VM (perhaps an Azure DLVM) for training your deep learning models. Then how do you get this Tensorboard out of there?

SSH tunneling with port forwarding is a good option for this. You just have to spin up the Tensorboard service on your remote machine, then tunnel the server back to your workstation with the SSH command below.

$ ssh -N -L 6007:127.0.0.1:6006 <username>@<remote_ip>

127.0.0.1:6006 : Tensorboard server running on the remote server / VM

6007 : local workstation port

You can then view the Tensorboard running on the remote machine through your local machine’s browser, using the local port you forwarded:

http://localhost:6007

That’s it! Simple and neat. No need to wait a couple of days until the model finishes training; just monitor it and stop early if it’s not learning well.

Enjoy Deep Learning!

Achieving Super Convergence of DNNs with 1cycle Policy

I would say training a deep neural network model to achieve good accuracy is an art. The training process enables the model to learn its parameters, such as the weights and biases, from the training data, while the hyper-parameters govern that process. They control the behavior of model training and have a significant impact on model accuracy and convergence.

Learning rate, number of epochs, number of hidden layers and hidden units, activation functions and momentum are some of the hyper-parameters we can adjust to make neural network models perform well.

Adjusting the learning rate is a vital factor for convergence: a learning rate that is too small makes training very slow and can lead to overfitting, while a learning rate that is too large makes training diverge. The typical way of finding the optimum learning rate is a grid search or a random search, which can be computationally expensive and take a lot of time. Isn’t there a smarter way to find the optimal learning rate?

Here I’m going to connect some dots on the process I followed to choose a good learning rate for my model and on training a DNN with a different learning rate policy.

Many researchers are actively working on this area, and in his paper “Cyclical Learning Rates for Training Neural Networks”, Leslie N. Smith proposed the learning rate range test (LR range test) and Cyclical Learning Rates (CLR).

I’m not going to discuss the interesting theory behind the LR range test and CLR here, as fast.ai has a pretty good introduction to the method and they even have an off-the-shelf implementation of the LR range test. I strongly recommend reading their post. I also found a nice PyTorch implementation of the LR range test by David Silva; feel free to pull it from here: https://github.com/davidtvs/pytorch-lr-finder
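Here’s a rough sketch of how that package can be used (pip install torch-lr-finder); the toy model and random data are placeholders, not the experiment’s actual setup:

#Sketch: LR range test with the pytorch-lr-finder package (toy model/data)
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   #stand-in model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)           #start from a tiny LR

data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
train_loader = DataLoader(data, batch_size=32, shuffle=True)

lr_finder = LRFinder(model, optimizer, criterion, device="cpu")
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)        #sweep LRs and record the loss
lr_finder.plot()                                                   #pick an LR just before the loss minimum
lr_finder.reset()                                                  #restore the original model/optimizer state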

In 2018, in the paper “A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay”, Smith introduced the 1cycle policy, which runs only a single cycle of training compared to the several cycles in CLR. I strongly suggest taking a look at this blog post to get an idea of the 1cycle policy.
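For reference, PyTorch now ships a built-in OneCycleLR scheduler. The sketch below shows the general pattern (it is not the exact code used in this experiment; the stand-in model, data and hyper-parameter values are illustrative):

#Sketch: 1cycle training with PyTorch's built-in OneCycleLR (illustrative values)
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))    #stand-in model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=7e-3, momentum=0.9)

data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
train_loader = DataLoader(data, batch_size=32, shuffle=True)

epochs = 5
scheduler = OneCycleLR(optimizer, max_lr=7e-3, epochs=epochs,
                       steps_per_epoch=len(train_loader))           #LR (and momentum) cycle per mini-batch

for _ in range(epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()          #step the scheduler every mini-batch, not every epoch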

Ok… Now you read it! Is this working???

I gave it a try using a simple transfer learning experiment. The dataset and the experiment I used here are from the PyTorch documentation, which you can find here. These are the steps I followed during the experiment.

Yeah! I’ve pushed the experiment to GitHub and feel free to use it. 😊

  1. Run the LR range finder to find the maximum learning rate value to use on 1cycle learning.


Output from the LR finder

According to the graph, it is clear that a value around 5*1e-3 can be used as the maximum learning rate for training. So I chose 7*1e-3, which is a bit before the loss minimum, as my maximum learning rate for training.

  2. Run the training using a fixed learning rate (note that a learning rate decay was used during training).
  3. Run the training according to the 1cycle policy. (A cyclical momentum and a cyclical learning rate were used. Note that the learning rate and the momentum change on each mini-batch, not epoch-wise.)


  4. Compare the validation accuracy and validation loss of each method.

Notice that the green line, which represents the experiment trained using the 1cycle policy, gives a better validation accuracy and a better validation loss at convergence.

These are the best validation accuracies of the two experiments.

  • Fixed LR : 0.9411
  • 1-cycle : 0.9607

Tip: choose the batch size according to the computational capacity you have. The number of iterations in the 1cycle policy depends on the batch size, the number of epochs and the size of the dataset you are using for training.

Though this is a simple experiment, it shows that the 1cycle policy does its job in increasing the accuracy of neural network models and helps with super convergence. Give it a try and don’t forget to share your experiences here. 😊

References – 

[1] Cyclical Learning Rates for Training Neural Networks
https://arxiv.org/abs/1506.01186

[2] A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay
https://arxiv.org/abs/1803.09820

[3] The 1cycle policy
https://sgugger.github.io/the-1cycle-policy.html

[4] PyTorch Learning Rate Finder
https://github.com/davidtvs/pytorch-lr-finder

[5] Transfer Learning Tutorial
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

C3D with Batch Normalization for Video Classification

Convolutional Neural Networks (CNNs) are well known for their ability to capture spatial and positional features. 2D convolutional networks are widely used in computer vision related tasks. There has been plenty of research on 2D CNNs, and on the famous ImageNet challenge they have reached an accuracy even better than humans!

Research teams have introduced several network architectures for image classification and related computer vision tasks. LeNet (1998), AlexNet (2012), VGGNet (2014), GoogLeNet (2014) and ResNet (2015) are some of the famous CNN architectures in use now. (I’ve discussed using pre-trained versions of these architectures for transfer learning here. Take a look. 🙂)


That was all about 2D images. So what about videos? 3D convolutions, which apply a 3D kernel to the data and move it in three directions (x, y and z) to compute the feature representations, are helpful in video event detection related tasks.

Just as in the 2D case, researchers have introduced CNN architectures with 3D convolutional layers, and they perform well in video classification and event detection tasks. Some of these architectures have been adapted from existing 2D CNN models by introducing 3D layers into them.


A 3D convolution operation

Tran et al. from Facebook AI Research introduced the C3D model to learn spatiotemporal features in videos using 3D convolutional networks, in the paper “Learning Spatiotemporal Features with 3D Convolutional Networks”. In the original paper they used Dropout to regularize the network.

Instead of using dropout, I tried using Batch Normalization to regularize the network: each convolutional layer is followed by a 3D batch normalization layer. With batch normalization you can use slightly larger learning rates to train the network, and it allows each layer to learn a bit more independently from the other layers.

This is just the PyTorch port of the network. I use it for video classification tasks where each video clip has 16 RGB frames of 112×112 pixels, so the input tensor is (batch_size, 3, 16, 112, 112). You can select the batch size according to the computational capacity you have.

import torch.nn as nn

class C3D_BN(nn.Module):
    """
    The C3D network as described in [1]
    Batch Normalization as described in [2]
    """

    def __init__(self):
        super(C3D_BN, self).__init__()

        self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv1_bn = nn.BatchNorm3d(64)
        self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))

        self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv2_bn = nn.BatchNorm3d(128)
        self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv3a_bn = nn.BatchNorm3d(256)
        self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv3b_bn = nn.BatchNorm3d(256)
        self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv4a_bn = nn.BatchNorm3d(512)
        self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv4b_bn = nn.BatchNorm3d(512)
        self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv5a_bn = nn.BatchNorm3d(512)
        self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv5b_bn = nn.BatchNorm3d(512)
        self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))

        self.fc6 = nn.Linear(8192, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.fc8 = nn.Linear(4096, 8)
        self.relu = nn.ReLU()

    def forward(self, x):

        h = self.relu(self.conv1_bn(self.conv1(x)))
        h = self.pool1(h)

        h = self.relu(self.conv2_bn(self.conv2(h)))
        h = self.pool2(h)

        h = self.relu(self.conv3a_bn(self.conv3a(h)))
        h = self.relu(self.conv3b_bn(self.conv3b(h)))
        h = self.pool3(h)

        h = self.relu(self.conv4a_bn(self.conv4a(h)))
        h = self.relu(self.conv4b_bn(self.conv4b(h)))
        h = self.pool4(h)

        h = self.relu(self.conv5a_bn(self.conv5a(h)))
        h = self.relu(self.conv5b_bn(self.conv5b(h)))
        h = self.pool5(h)

        h = h.view(-1, 8192)
        h = self.relu(self.fc6(h))
        h = self.relu(self.fc7(h))
        h = self.fc8(h)
        return h

"""
References
----------
[1] Tran, Du, et al. "Learning spatiotemporal features with 3d convolutional networks."
    Proceedings of the IEEE international conference on computer vision. 2015.
[2] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training
    by reducing internal covariate shift."
    arXiv:1502.03167v2 [cs.LG] 13 Feb 2015
"""

Let the 3D Convo power be with you! Happy coding! 🙂

Transfer Learning in ConvNets – Part 2

We discussed the possibility of transferring the knowledge learned by one ConvNet to another. If you’re new to the idea of transfer learning, please check out the previous post here.

Alright… let’s see a practical scenario where we need to use transfer learning. We all know that deep neural networks are data hungry and may need a huge amount of data to build unbiased predictive models. Although that is the ideal scenario, in most cases there isn’t that much data available to train neural models. So your go-to saviour may be transfer learning.

In this small demonstration, I’ve built a multi-class classifier that has 8 classes and only 100-odd training images per class.

The dataset I’m using here is derived from the “Natural Images” dataset (https://www.kaggle.com/prasunroy/natural-images/version/1#_=_). I’ve randomly reduced the number of images in the original dataset to build the “Mini Natural Images” dataset, which has separate splits for training, testing and validation. (The dataset is available in the GitHub repository.) Go ahead and feel free to pull it or fork it!

Here’s an overview of the “Mini Natural Images” dataset.

So, this is going to be an image classification task. We’re going to take advantage of ImageNet and the state-of-the-art architectures pre-trained on the ImageNet dataset. Instead of random initialization, we initialize the network with a pre-trained network, and the ConvNet is fine-tuned on the training set.

I’ve used the PyTorch deep learning framework for the experiment, as it’s super easy to adopt for deep learning. For this type of computer vision application you can use the models available in torchvision.models (https://pytorch.org/docs/stable/torchvision/models.html).

The models in the model zoo are pre-trained on the ImageNet dataset to classify 1000 classes, so there are 1000 nodes in the final layer. To adapt a model to our needs, keep in mind to remove the final layer and replace it with a layer having the desired number of output nodes for your task. (In this experiment, the final fc layer of ResNet18 has been replaced by an 8-node fc layer.)

Here’s how to replace the final layer in the ResNet and VGG architectures.

#Using a model pre-trained on ImageNet and replacing its final linear layer
import torch.nn as nn
from torchvision import models

#For resnet18
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 8)

#for VGG16_BN: replace the last classifier layer with a fresh 8-class layer
model_ft = models.vgg16_bn(pretrained=True)
num_ftrs = model_ft.classifier[6].in_features
model_ft.classifier[6] = nn.Linear(num_ftrs, 8)

The rest of the training follows the usual process of training and fine-tuning a CNN. Make sure to use a batch size suited to the GPU available in your rig. (You can use a DLVM for this task if you wish 😊)
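For completeness, here’s a minimal sketch of that fine-tuning setup (loss, optimizer and learning rate scheduler), along the lines of the standard PyTorch transfer learning recipe; it continues from the snippet above, and the hyper-parameter values are only illustrative:

#Sketch: typical fine-tuning setup (continues from the snippet above; values are illustrative)
import torch.optim as optim
from torch.optim import lr_scheduler

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)  #decay the LR every 7 epochs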

The training and validation accuracies are plotted and the confusion matrix is generated using torchnet (https://github.com/pytorch/tnt), which is pretty good for visualization and logging in PyTorch.


Confusion matrix of the classification

The classifier achieves 97% accuracy on the test image set, which is not bad.

Now it’s your turn to go ahead and get your hands dirty with this experiment. Leave a comment if you come across any issues. Happy coding!

Here’s the GitHub Repo for your reference!