ml.azure.com – New Face of Azure Machine Learning

Azure Machine Learning Studio (preview) interface

Among the public cloud platforms, Microsoft Azure has brought almost all the steps of the machine learning life cycle into the cloud. Though the resources and capabilities are there, finding the right cloud-based product or service for your solution can sometimes be a problem.

As an answer to that issue, Azure has come up with a whole new Azure Machine Learning Studio, which is in preview at the time I’m blogging this! (Don’t confuse it with AzureML Studio, the drag-and-drop interface we had before. This is a new thing: ml.azure.com.) There are no framework dependencies or restrictions for using these services. You can easily adapt your open source machine learning code base (maybe written with Python, scikit-learn, TensorFlow, PyTorch, Keras… anything).

The most awesome feature of this new service is the AzureML Python SDK (https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro) and the AzureML R SDK (https://github.com/Azure/azureml-sdk-for-r). These SDKs allow you to create and manage ML experiments in your familiar coding style.

To use this one-stop solution in Azure, you have to create an Azure Machine Learning service workspace from the Azure portal. It will then direct you to the new interface. You can go for either the Enterprise pricing tier or the Basic tier. In the Basic tier you won’t get the visual designer and Automated ML features.

Launching the Studio through Azure portal

Let’s go through each tab in the side pane of the latest release and see what we can do with them.

Notebooks –

These are fully managed Jupyter notebook instances in the cloud. The notebook servers run on top of a new VM instance type called “notebookVM”. These notebookVMs are fully configured work environments for your machine learning and data science tasks. No need to worry about installing all the Python packages and their dependencies. They are already there! You can change the VM size (yes, GPU-enabled VMs are also there) or install new packages through a Python package manager too.

Automated ML –

Not available in the Basic tier though. This is a process of selecting the best-suited algorithm for the dataset you have. Right now it supports classification, regression and time series forecasting on tabular data. It does not support deep learning based computer vision applications, and deep learning based text analysis is still in preview. The Automated ML process runs a set of machine learning algorithms on the data you provide and sees which one gives the best score on the chosen metric. It is good for building prototypes and in some cases may even be usable in production.

Designer –

This is the evolution of Azure ML Studio (the old drag-and-drop thing). It seems Microsoft is going to end its life cycle and make the new Designer its replacement. Here you can build the complete ML workflow by dragging and dropping modules. If you want, you can integrate R or Python scripts into the experiment. The machine learning service endpoint can be exposed through an Azure Kubernetes Service (AKS) deployment.

Datasets –

The place to manage and version your datasets. Datasets can be either tabular or file based. Here you can profile your dataset by performing a basic statistical analysis on your data. If your dataset is sitting on a datastore (which we are going to discuss later), the dataset acts as a high-level encapsulation of that data.

Experiments –

You may execute several runs of the same experiment with different configurations. This is the place where you can see all their log files and compare the runs with each other.

Pipelines –

Don’t confuse Azure Machine Learning pipelines with Azure Pipelines. Azure ML pipelines are specifically designed for MLOps tasks. You can manage the whole experiment process through to production using ML pipelines. These pipelines are reusable and help with collaborative development of the solution.

Models –

You can register trained ML models here. Versioning models and managing which model goes to production are some use cases of this model registry. You can also register models that have been trained outside the particular Azure ML workspace.

Endpoints –

The endpoint of an Azure Machine Learning experiment can be a web service or an IoT module endpoint. Managing endpoint keys and the like is done in this section.

Compute –

In most cases, you will use Azure for computation. In the Compute section you can create and manage the following compute resource types.

  • Notebook VMs – as discussed previously in the Notebooks section, this is a fully managed ML development environment suited for development and prototyping.
  • Training clusters – You can create either a CPU-based or a GPU-based cluster for running your experiments. Note that this is charged according to the computation hours as well as the number of nodes you use. The good thing is that there’s no charge when you are not using the cluster for computation.
  • Inference clusters – These are AKS clusters where you can deploy the endpoints. You can even register an existing AKS cluster as an inference cluster.
  • Attached compute – If you are working with Azure Databricks, Data Lake Analytics or HDInsight, you can configure the compute here. In an interesting use case, if you want to attach your physical computer (which should be a workstation running Ubuntu) as a compute target, that is also possible through the AML service.

Datastores –

When it comes to machine learning experiments, it’s normal to have large amounts of data. This data may sit in your Azure storage. A datastore is the storage abstraction over an Azure storage account, which you can then use inside your machine learning experiments.

Data labeling –

A cool new feature for data annotators. Right now, it supports image classification (multi-label / multi-class) and object identification (bounding box) annotations. The annotator does not need to have an Azure subscription. You can easily outsource your tedious annotation workload through this feature.

This is just an overview of the options we have with the new Azure Machine Learning Studio. It’s pretty clear that the Azure team is going to bring all the ML-related services under one umbrella. Let’s discuss some cool use cases and tips on using these services in upcoming blog posts.

Happy coding! 😊        

PyTorch Custom Dataset Tips and Tricks

Loading massive and complex datasets for training has become normal practice in most deep learning experiments. Handling large datasets which contain multimedia such as images, video frames and sound clips can’t be done with simple file open commands alone, as that drastically reduces model training efficiency.

Featuring a more Pythonic API, the PyTorch deep learning framework offers a GPU-friendly, efficient data generation scheme to load any data type for training deep learning models in a more optimal manner.

Based on the Dataset class (torch.utils.data.Dataset) in PyTorch, you can load pretty much every data format in all shapes and sizes by overriding two methods in your subclass.

 __len__  – returns the size of the dataset

__getitem__  – returns a sample from the dataset given an index.

Here’s a rough skeleton of the Dataset class which you can modify for your needs.

import torch
from torch.utils.data.dataset import Dataset

#If available use GPU memory to load data 
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")


class MyCustomDataset(Dataset):
    def __init__(self, ...):
        # All the data preparation tasks can be defined here
        # - Deciding the dataset split (train / test / validate)
        # - Data transformation methods
        # - Reading annotation files (CSV/XML etc.)
        # - Preparing the data to be read by an index

    def __getitem__(self, index):
        # Returns data and labels
        # - Apply the initialized transformations to the data
        # - Push data to GPU memory
        # - Better to return the data points as a dictionary / tensor
        return (img, label)

    def __len__(self):
        return count  # how many examples (images?) you have

These are some tips and tricks I follow when writing custom dataloaders for PyTorch.

  • Datasets will expand with more and more samples, so we do not want to store too many tensors in memory at runtime in the Dataset object. Instead, we form the tensors as we iterate through the samples list. This approach may be a bit slower in processing but saves us from running out of memory.
  • The __init__ function should be the place where all the initial data preparation and logic happens. Do the operations that need to read data annotation files (CSV/XML etc.) here.
  • If you have separate portions of the dataset for train, test and validation, make sure you define that logic inside the __init__ function. You can pass the desired data split as an argument to the function.
  • The __init__ function is also the place to define the data transformations. For example, if you have image data to load and need to resize and normalize the images, you can use torchvision transforms here.
#Example transform for image data
self.transform = transforms.Compose([transforms.Resize((224, 224)),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
  • Make sure you index your custom dataset in a structured way when initializing it. Generating an array or a list of the data points is a good way to do it.
  • The __len__ function comes in handy to see how many data points have been loaded through __init__. The data length is normally the number of records loaded into the final list or array you created inside __init__.
  • The __getitem__ function should be lightweight. Avoid overly complex computations inside it.
  • PyTorch DataLoaders just call __getitem__() and wrap the results up into a batch when performing training or inference. So this function is called repeatedly. Make sure you return one data point at a time.
  • Always try to return the values from __getitem__ as tensors.
  • If you have multiple components to return from the DataLoader, a Python dictionary is a handy option. You can structure them as key-value pairs in the dictionary. Here’s an example dictionary item which contains four values.
item = {
    'video_id': video_id,
    'activity_id': activity_id,
    'activity_frame': activity_frame_as_tensor,
    'activity_annotation': activity_annotation
}
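
To make the tips above concrete, here is a minimal, framework-free sketch of the same idea: index lightweight records in __init__, report their count in __len__, and build one sample at a time in __getitem__. It deliberately does not import PyTorch so that it runs anywhere; in real code you would subclass torch.utils.data.Dataset and return tensors. The class name LazyCsvDataset, the column names and the inlined annotation rows are all made up for illustration.

```python
import csv
import io


class LazyCsvDataset:
    """Map-style dataset sketch: index in __init__, load lazily in __getitem__."""

    def __init__(self, annotation_file, split="train"):
        # Read the (small) annotation file once and keep only lightweight records.
        reader = csv.DictReader(annotation_file)
        self.records = [row for row in reader if row["split"] == split]

    def __len__(self):
        # Number of records indexed in __init__.
        return len(self.records)

    def __getitem__(self, index):
        # Build exactly one sample on demand; heavy loading would happen here.
        row = self.records[index]
        return {
            "video_id": row["video_id"],
            "activity_id": int(row["activity_id"]),
        }


# Hypothetical annotation data, inlined so the sketch runs without any files.
annotations = io.StringIO(
    "video_id,activity_id,split\n"
    "v001,3,train\n"
    "v002,1,train\n"
    "v003,3,validate\n"
)

train_set = LazyCsvDataset(annotations, split="train")
print(len(train_set))  # number of training records
print(train_set[1])    # one sample, returned as a dictionary
```

Because only the small record list lives in memory, the dataset can grow without the object itself growing much; the cost is paid per sample inside __getitem__.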

Consuming the dataset –

You should create a MyCustomDataset object when you need to consume the data. This is a sample code snippet that demonstrates how to access the data points through the custom dataset you created.

#Consuming the dataset
from torch.utils.data import DataLoader, random_split

#creating the dataset object
dataset = MyCustomDataset(...)

#Randomly split dataset into trainset and the validation set
#(the two lengths must add up to len(dataset))
train_data, val_data = random_split(dataset, [50000, 10000])

#Create DataLoader iterators
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_data, batch_size=64, shuffle=True, num_workers=2)

#Iterating through the data loader object
for i, batch in enumerate(train_loader):
    print(i, batch)

You may notice that the DataLoader iterator can batch, shuffle and load the data using multiprocessing just by changing the parameters in the function call. Make sure you choose a batch size that fits your memory capacity. If you are loading the data to the GPU, it’s the GPU memory you should consider.
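
As a rough back-of-the-envelope aid for picking a batch size, the sketch below computes how many batches one pass over the data produces and how much memory a single float32 input batch takes. The helper names are my own, and the memory figure covers only the input tensor; activations and gradients usually need far more, so treat it as a lower bound.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def input_batch_mib(batch_size, channels, height, width, bytes_per_value=4):
    """Memory of one float32 input batch in MiB (inputs only, no activations)."""
    return batch_size * channels * height * width * bytes_per_value / 2**20


# Numbers taken from the snippet above: a 50000/10000 split with batch size 64.
train_batches = batches_per_epoch(50000, 64)
val_batches = batches_per_epoch(10000, 64)
# Assumed input size: 3-channel 224x224 images.
batch_mib = input_batch_mib(64, 3, 224, 224)
print(train_batches, val_batches, batch_mib)
```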

If you are using a multi-GPU setup (for example with nn.DataParallel), PyTorch tries to divide each batch evenly among the GPUs. If the batch size is less than the number of GPUs you have, it won’t utilize all of them.

I would say the CustomDataset and DataLoader combo in PyTorch has become a life saver in most complex data loading scenarios for me. I would love to hear about your experiences with writing custom DataLoaders in PyTorch.

Happy Coding!  

Tensorboard with PyTorch

Tensorboard Interface

Training and evaluating deep learning models may take a lot of time. Sometimes it’s worth monitoring how well or badly the model is training in real time. It helps you understand, debug and optimize your models without waiting until training finishes to check the performance. The good old method of printing out training losses / accuracy for each epoch is a good idea, but it’s a bit hard to compare the metrics that way.

A real-time graphical interface that can be used to plot / visualize metrics while a model trains through epochs or iterations would be the best option. Tensorboard is a visualization tool that came out with TensorFlow, and I’m pretty sure almost all TF guys are using and benefiting from that cool tool.

So what about PyTorchians?? Don’t panic. The official PyTorch repository recently came up with a Tensorboard utility in PyTorch 1.1.0. The code is still experimental, and it was not working well for me.

Then I found this awesome open source project, tensorboardX. It is pretty similar to what the official PyTorch repo has and is easy to work with. TensorboardX supports scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve and video summaries.

5 simple steps…

  1. Install tensorboardX
  2. Import tensorboardX for your PyTorch code
  3. Create a SummaryWriter object
  4. Define SummaryWriter
  5. Use it!

I just did a simple demo on this by adding Tensorboard logs to the famous PyTorch transfer learning tutorial. Here’s the GitHub repo. Just clone it and play around with it.

Note that in the experiment I’ve used two SummaryWriter objects to create two scalar graphs, one for the training phase and the other for the validation phase.

The log files will be created in the directory you specified when creating SummaryWriter object. (You can change this directory to wherever you want)

To view the tensorboard, open a terminal inside the experiment folder. Assume that your log files are inside ‘./logs/’ . Use the following command to spin up the tensorboard server on your local machine.

$ tensorboard --logdir ./logs/

Sometimes you may use a remote server or a VM (might be an Azure DLVM) for training your deep learning models. Then how do you get this tensorboard out from there??

SSH tunneling with port forwarding is a good option for this. You just have to spin up the tensorboard service on your remote machine. Then tunnel the server back to your workstation with the ssh command stated below.

$ ssh -N -L 6007:127.0.0.1:6006 <username>@<remote_ip>

127.0.0.1:6006 : Tensorboard server running on the remote server / VM

6007 : local workstation port

You can then view the tensorboard running on the remote machine through your local machine’s browser, using the local end of the tunnel.

http://127.0.0.1:6007
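
If the port mapping feels easy to mix up, the small helper below spells it out: the first number in the -L option is the port on your workstation, and the address after it is where Tensorboard listens on the remote side. This is only a sketch that builds the command string and the URL to open; the function name and the user/host values are placeholders of my own.

```python
import shlex


def tensorboard_tunnel(user, host, local_port=6007, remote_port=6006):
    """Return the ssh port-forwarding command and the local URL it exposes."""
    command = [
        "ssh", "-N",
        "-L", f"{local_port}:127.0.0.1:{remote_port}",
        f"{user}@{host}",
    ]
    # The browser talks to the local end of the tunnel, not to the remote IP.
    url = f"http://127.0.0.1:{local_port}"
    return shlex.join(command), url


# "alice" and "gpu-box" are placeholder values for illustration.
command, url = tensorboard_tunnel("alice", "gpu-box")
print(command)
print(url)
```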

That’s it! Simple and neat. No need to wait a couple of days until the model gets trained. Just monitor and stop early if it’s not learning well.

Enjoy Deep Learning!

GPU Accelerated Application Deployment with NVIDIA-Docker

When it comes to deep learning model development and training, personally, the majority of my time is spent on data pre-processing and then on setting up the development environment. Cloud-based development environments such as Azure DLVM, Google Colab etc. are very good options when you don’t have much time to spend on installing all the required packages on your workstation. But there are times when we want to do the development on our own machines and train/deploy somewhere else (maybe in the client’s environment, on a machine with a better GPU for faster training, or on a Kubernetes cluster). Docker comes in handy in these scenarios.

Docker provides both hardware and software encapsulation by allowing portable deployment. If you are a data scientist / machine learning guy or a deep learning developer, I strongly recommend you give Docker a try, and I’m pretty sure it will make your life much easier.

Alright! That’s about Docker! Let’s assume you are now using Docker for deploying your deep learning applications and you want to ship your deep learning model to a remote computer that has a powerful GPU, which allows you to use large mini-batch sizes and speed up your training process. Docker containers solve the problem of framework and platform dependencies, but they are also hardware-agnostic. This creates a problem!

Have you ever tried to access the GPU resource on the host computer from a program running inside a docker container? Sadly, Docker does not natively support NVIDIA GPUs within containers.

The early workaround was installing the Nvidia drivers inside the Docker container. It’s a bit of a hassle, as the driver version installed in the container has to match the driver on the host.

To make Docker images that use GPU resources more portable, Nvidia has introduced nvidia-docker!

nvidia-docker

NVIDIA-Docker plugin enables GPU accelerated application deployment

nvidia-docker is a wrapper around the docker command that makes the GPU on the host machine available inside the Docker container. The only thing you should pay attention to is the CUDA version you want to use.

So, in which scenarios can you use this? In my case, nvidia-docker comes in handy when I’m running my experiment on a cluster with more GPU power. What I do is just containerize all my code into a Docker image and run it on the remote machine with nvidia-docker. (Windows guys… nvidia-docker is still not available for Windows hosts. Not sure if that is in the development timeline or not 😀 )

Here’s the official GitHub on nvidia-docker. Just install it, restart your Docker engine, and make sure nvidia-docker is the default Docker runtime. The rest is the same as building and running a typical Docker image.

Here’s a simple Dockerfile I wrote for containerizing my PyTorch code. I’ve used CUDA 9.1. You can modify this for your needs.

FROM nvidia/cuda:9.1-base-ubuntu16.04

# Install some basic utilities
RUN apt-get update && apt-get install -y \
curl \
ca-certificates \
sudo \
git \
bzip2 \
libx11-6 \
&& rm -rf /var/lib/apt/lists/*

# Create a working directory
RUN mkdir /app
WORKDIR /app

# Create a non-root user and switch to it
RUN adduser --disabled-password --gecos '' --shell /bin/bash user \
&& chown -R user:user /app
RUN echo "user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-user
USER user

# All users can use /home/user as their home directory
ENV HOME=/home/user
RUN chmod 777 /home/user

# Install Miniconda
RUN curl -so ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.1-Linux-x86_64.sh \
&& chmod +x ~/miniconda.sh \
&& ~/miniconda.sh -b -p ~/miniconda \
&& rm ~/miniconda.sh
ENV PATH=/home/user/miniconda/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false

# Create a Python 3.6 environment
RUN /home/user/miniconda/bin/conda install conda-build \
&& /home/user/miniconda/bin/conda create -y --name py36 python=3.6.5 \
&& /home/user/miniconda/bin/conda clean -ya
ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/home/user/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH

# Install PyTorch with Cuda 9.1 support
RUN conda install -y -c pytorch \
cuda91=1.0 \
magma-cuda91=2.3.0 \
pytorch=0.4.0 \
torchvision=0.2.1 \
&& conda clean -ya
RUN conda install -y opencv

# Install other dependencies from pip
# My requirements.txt file just contains the following packages I used for the code. Change this for your needs.
#numpy==1.14.3
#torch==0.4.0
#torchvision==0.2.1
#matplotlib==2.2.2
#tqdm==4.28.1
COPY requirements.txt .
RUN pip install -r requirements.txt

# Create /data directory so that a container can be run without volumes mounted
RUN sudo mkdir /data && sudo chown user:user /data

# Copy source code into the image
COPY --chown=user:user . /app

# Set the default command to python3
CMD ["python3"]

Here’s the bash command used for running the docker using the Nvidia run-time.

# 1. Build image
docker build .

# 2. Run the docker image
docker run \
--runtime=nvidia -it -d \
--rm <dockerImage> python3 <yourCode.py>
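
Before launching a long training job, it can be worth checking from inside the container that the GPU is actually visible. The script below is a small sketch of such a check using only the standard library. It assumes the runtime exposes the nvidia-smi utility inside the container, which depends on how the image and runtime are configured; the function name gpu_visible is my own.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi exists in this environment and runs cleanly."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the detected GPUs
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints False inside a container started with the Nvidia runtime, the driver or runtime setup on the host is the first thing to look at.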

 

Just try it and see how your deep learning life becomes easy! Happy coding! 🙂

Achieving Super Convergence of DNNs with 1cycle Policy

I would say training a deep neural network model to achieve good accuracy is an art. The training process enables the model to learn the model parameters, such as the weights and biases, from the training data. In the process of training, model hyper-parameters govern the process. They control the behavior of model training and have a significant impact on model accuracy and convergence.

Learning rate, number of epochs, hidden layers, hidden units, activation functions, momentum are the hyperparameters that we can adjust to make the neural network models perform well.

Adjusting the learning rate is vital for convergence, because a small learning rate makes the training very slow and can cause overfitting, while if the learning rate is too large, the training will diverge. The typical way of finding the optimum learning rate is performing a grid search or a random search, which can be computationally expensive and take a lot of time. Isn’t there a smarter way to find the optimal learning rate?

Here I’m going to connect some dots on a process I followed to choose a good learning rate for my model, and a way of training a DNN with a different learning rate policy.

Many researchers actively work in this area, and in his paper “Cyclical Learning Rates for Training Neural Networks”, Leslie N. Smith proposed the learning rate range test (LR range test) and Cyclical Learning Rates (CLR).

I’m not going to discuss the interesting theory behind the LR range test and CLR, as fast.ai has a pretty good introduction to the method, and they even have an implementation of the LR range test that can be used off the shelf. I strongly recommend reading this post. I found a nice implementation of the LR range test in PyTorch by David Silva; feel free to pull it from here: https://github.com/davidtvs/pytorch-lr-finder

In 2018, in the paper “A disciplined Approach to Neural Network Hyper-Parameters : Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay”, Smith introduced the 1cycle policy, which runs only a single cycle of training compared to the several cycles in CLR. I strongly suggest taking a look at this blog post to get an idea of the 1cycle policy.
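
To give a feel for what an LR range test does mechanically, here is a minimal sketch in plain Python. It assumes the learning rate is increased exponentially from a small to a large value over a short run, and it uses one common heuristic (the point where the loss falls most steeply) to suggest a value. Both function names and the loss numbers are made up for illustration; this is not the code from the linked repository.

```python
def lr_range_schedule(start_lr, end_lr, num_iter):
    """Exponentially spaced learning rates from start_lr to end_lr."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]


def suggest_max_lr(lrs, losses):
    """Pick the learning rate reached at the end of the steepest loss drop."""
    drops = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(drops)), key=drops.__getitem__)
    return lrs[steepest + 1]


lrs = lr_range_schedule(1e-5, 1.0, num_iter=6)
# Made-up losses for illustration only: flat, falling, then diverging.
losses = [2.30, 2.28, 2.10, 1.40, 1.55, 4.00]
print(lrs)
print(suggest_max_lr(lrs, losses))
```

In a real run each learning rate would be used for one mini-batch and the recorded loss would usually be smoothed before reading anything off the curve.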

Ok… Now you read it! Is this working???

I gave it a try using a simple transfer learning experiment. The dataset and the experiment I used here are from the PyTorch documentation, which you can find here. These are the steps I followed during the experiment.

Yeah! I’ve pushed the experiment to GitHub and feel free to use it. 😊

  1. Run the LR range finder to find the maximum learning rate value to use on 1cycle learning.

Output from the LR finder

According to the graph, 5*1e-3 can be the maximum learning rate value that can be used for training. So I chose 7*1e-3, which is a bit before the minimum, as my maximum learning rate for training.

  2. Run the training using a defined learning rate. (Note that a learning rate decay was used during training.)
  3. Run the training according to the 1cycle policy. (A cyclical momentum and a cyclical learning rate have been used. Note that the learning rate and the momentum change in each mini-batch, not epoch-wise.)

1cy

  4. Compare the validation accuracy and validation loss of each method.

You can see that the green line, which represents the experiment trained using the 1cycle policy, gives a better validation accuracy and a better validation loss when converging.

These are the best validation accuracies of the two experiments.

  • Fixed LR : 0.9411
  • 1-cycle : 0.9607

Tip : Choose the batch size according to the computational capacity you have. The number of iterations in the 1cycle policy depends on the batch size, the number of epochs and the dataset size you are using for training.

Though this experiment is a simple one, it suggests that the 1cycle policy does its job in increasing the accuracy of neural network models and helps with super convergence. Give it a try and don’t forget to share your experiences here. 😊
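
The per-mini-batch behaviour of the policy can be sketched in a few lines of plain Python. This is a simplified, triangular version under my own assumptions: the learning rate climbs linearly from max_lr / div_factor to max_lr and back, momentum moves in the opposite direction between 0.95 and 0.85, and the final annihilation phase described in the paper is left out. The dataset size, batch size and epoch count are hypothetical; only the 7e-3 maximum comes from the experiment above.

```python
import math


def one_cycle(step, total_steps, max_lr, div_factor=10.0,
              max_momentum=0.95, min_momentum=0.85):
    """Return (lr, momentum) for one mini-batch step of a triangular 1cycle run."""
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    half = (total_steps - 1) / 2
    # Position in the cycle: 0 at both ends, 1 in the middle.
    position = 1.0 - abs(step - half) / half
    base_lr = max_lr / div_factor
    lr = base_lr + position * (max_lr - base_lr)
    # Momentum moves in the opposite direction to the learning rate.
    momentum = max_momentum - position * (max_momentum - min_momentum)
    return lr, momentum


# Hypothetical run: 1000 training samples, batch size 32, 10 epochs.
steps_per_epoch = math.ceil(1000 / 32)
total_steps = steps_per_epoch * 10
for step in (0, total_steps // 2, total_steps - 1):
    print(step, one_cycle(step, total_steps, max_lr=7e-3))
```

The total number of steps is what ties the schedule to the batch size, the number of epochs and the dataset size: change any of them and the whole cycle stretches or shrinks.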

References – 

[1] Cyclical Learning Rates for Training Neural Networks
https://arxiv.org/abs/1506.01186

[2] A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay
https://arxiv.org/abs/1803.09820

[3] The 1cycle policy
https://sgugger.github.io/the-1cycle-policy.html

[4] PyTorch Learning Rate Finder
https://github.com/davidtvs/pytorch-lr-finder

[5] Transfer Learning Tutorial
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

C3D with Batch Normalization for Video Classification

Convolutional Neural Networks (CNNs) are well known for their ability to understand spatial and positional features. 2D convolutional networks are widely used in computer vision tasks. Plenty of research has happened and is ongoing with 2D CNNs, and models in the famous ImageNet challenge have reached accuracy even better than humans!

Research teams have introduced several network architectures for solving image classification and related computer vision tasks. LeNet (1998), AlexNet (2012), VGGNet (2014), GoogLeNet (2014) and ResNet (2015) are some of the famous CNN architectures in use now. (I’ve discussed using pre-trained models to perform transfer learning with these architectures here. Take a look. 🙂 )

1_ZqkLRkMU2ObOQWIHLBg8sw

That was all about 2D images. Then what about videos? A 3D convolution, which applies a 3D kernel to the data and moves the kernel in three directions (x, y and z) to calculate the feature representations, is helpful in video event detection tasks.

As with 2D CNN architectures, researchers have introduced CNN architectures with 3D convolutional layers. They perform well in video classification and event detection tasks. Some of these architectures have been adapted from existing 2D CNN models by introducing 3D layers into them.

A 3D Convo operation

Tran et al. from Facebook AI Research introduced the C3D model to learn spatiotemporal features in videos using 3D convolutional networks. This is the paper: “Learning Spatiotemporal Features with 3D Convolutional Networks”. In the original paper they used dropout to regularize the network.

Instead of using dropout, I tried using batch normalization to regularize the network. Each convolutional layer is followed by a 3D batch normalization layer. With batch normalization, you can use somewhat bigger learning rates to train the network, and it allows each layer of the network to learn a little more independently of the other layers.

This is just the PyTorch port of the network. I use this network for video classification tasks in which each video has 16 RGB frames of 112×112 pixels. So the input tensor has the shape (batch_size, 3, 16, 112, 112). You can select the batch size according to the computation capacity you have.
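
As a quick sanity check on that input shape, the sketch below traces a 16×112×112 clip through the five max-pooling layers of the network in this post and computes the flattened feature size that the first fully connected layer has to accept. The kernel, stride and padding values are copied from the model definition in this post; the 3×3×3 convolutions with padding 1 leave the size unchanged, so only the pooling layers matter here. The helper names are my own.

```python
def pool_out(size, kernel, stride, padding=0):
    """Output length of a max-pooling layer along one dimension (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per pooling layer, each as (depth, height, width).
POOLS = [
    ((1, 2, 2), (1, 2, 2), (0, 0, 0)),  # pool1
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pool2
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pool3
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pool4
    ((2, 2, 2), (2, 2, 2), (0, 1, 1)),  # pool5
]


def flattened_features(frames=16, height=112, width=112, channels=512):
    """Trace a clip through the five pooling layers and return the flat size."""
    shape = (frames, height, width)
    for kernel, stride, padding in POOLS:
        shape = tuple(
            pool_out(s, k, st, p)
            for s, k, st, p in zip(shape, kernel, stride, padding)
        )
    depth, h, w = shape
    return shape, channels * depth * h * w


print(flattened_features())
```

The result, 512 × 1 × 4 × 4 = 8192, is the input size of the first linear layer. If you change the number of frames or the frame size, rerun this calculation and adjust that layer accordingly.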

import torch.nn as nn

class C3D_BN(nn.Module):
    """
    The C3D network as described in [1]
    Batch Normalization as described in [2]
    """

    def __init__(self):
        super(C3D_BN, self).__init__()

        self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv1_bn = nn.BatchNorm3d(64)
        self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))

        self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv2_bn = nn.BatchNorm3d(128)
        self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv3a_bn = nn.BatchNorm3d(256)
        self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv3b_bn = nn.BatchNorm3d(256)
        self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv4a_bn = nn.BatchNorm3d(512)
        self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv4b_bn = nn.BatchNorm3d(512)
        self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv5a_bn = nn.BatchNorm3d(512)
        self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv5b_bn = nn.BatchNorm3d(512)
        self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))

        self.fc6 = nn.Linear(8192, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.fc8 = nn.Linear(4096, 8)
        self.relu = nn.ReLU()

    def forward(self, x):

        h = self.relu(self.conv1_bn(self.conv1(x)))
        h = self.pool1(h)

        h = self.relu(self.conv2_bn(self.conv2(h)))
        h = self.pool2(h)

        h = self.relu(self.conv3a_bn(self.conv3a(h)))
        h = self.relu(self.conv3b_bn(self.conv3b(h)))
        h = self.pool3(h)

        h = self.relu(self.conv4a_bn(self.conv4a(h)))
        h = self.relu(self.conv4b_bn(self.conv4b(h)))
        h = self.pool4(h)

        h = self.relu(self.conv5a_bn(self.conv5a(h)))
        h = self.relu(self.conv5b_bn(self.conv5b(h)))
        h = self.pool5(h)

        h = h.view(-1, 8192)
        h = self.relu(self.fc6(h))
        h = self.relu(self.fc7(h))
        h = self.fc8(h)
        return h

"""
References
----------
[1] Tran, Du, et al. "Learning spatiotemporal features with 3d convolutional networks." 
Proceedings of the IEEE international conference on computer vision. 2015.
[2] Ioffe, Sergey, et al. "Batch Normalization: Accelerating deep network training 
by reducing internal covariate shift."
arXiv:1502.03167v2 [cs.LG] 13 Feb 2015
"""

Let the 3D Convo power be with you! Happy coding! 🙂

Transfer Learning in ConvNets – Part 2

We discussed the possibility of transferring the knowledge learned by one ConvNet to another. If you are new to the idea of transfer learning, please check out the previous post here.

Alright… Let’s see a practical scenario where we need to use transfer learning. We all know that deep neural networks are data hungry. We may need a huge amount of data to build unbiased predictive models. Though that is the ideal scenario, in most cases there isn’t that much data to train neural models. So the go-to lifesaver for you may be transfer learning.

In this small demonstration I’ve built a multi-class classifier that has 8 classes and only 100-odd images per class in the training set.

The dataset I’m using here is a derivation of the “Natural Images” dataset (https://www.kaggle.com/prasunroy/natural-images/version/1#_=_ )  . I’ve randomly reduced the number of images in the original dataset for building the “Mini Natural Images”. This dataset consists of three phases for train, test and validation.  (The dataset is available in the GitHub repository) Go ahead and feel free to pull it or fork it!

Here’s an overview of the “Mini Natural Images” dataset.

So, this is going to be an image classification task. We are going to take advantage of ImageNet and the state-of-the-art architectures pre-trained on the ImageNet dataset. Instead of random initialization, we initialize the network with pretrained weights, and the ConvNet is then fine-tuned on the training set.

I’ve used the PyTorch deep learning framework for the experiment as it’s super easy to adopt for deep learning. For this type of computer vision application you can use the models available in torchvision.models (https://pytorch.org/docs/stable/torchvision/models.html )

The models available in the model zoo are pre-trained on the ImageNet dataset to classify 1000 classes, so there are 1000 nodes in the final layer. To adapt the model to our needs, remember to remove the final layer and replace it with one that has the desired number of nodes for your task. (In this experiment, the final fc layer of resnet18 has been replaced by an 8-node fc layer.)

Here’s the way to replace the final layer of resNet architecture and in VGG architecture.

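# Using a model pre-trained on ImageNet and replacing its final linear layer
import torch.nn as nn
from torchvision import models

# For ResNet18: the classification head is the `fc` layer
# (newer torchvision versions prefer the `weights=` argument over `pretrained=True`)
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 8)

# For VGG16_BN: the last layer of `classifier` is the classification head.
# Replace the layer itself; changing `out_features` alone would not resize its weights.
model_ft = models.vgg16_bn(pretrained=True)
num_ftrs = model_ft.classifier[6].in_features
model_ft.classifier[6] = nn.Linear(num_ftrs, 8)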

The rest of the training proceeds just like training and fine-tuning any other CNN. Make sure to use a batch size that suits the GPU available in your rig. (You can use a DLVM for this task if you wish 😊)
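If you want a feel for what that training step roughly looks like, here is a minimal sketch. It is not the code from the repository: the tiny stand-in model (used instead of model_ft so nothing has to be downloaded), the random tensors acting as a fake batch, and the hyperparameters are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

NUM_CLASSES = 8  # Mini Natural Images has 8 classes

# Hypothetical stand-in for model_ft so the sketch runs without pre-trained weights.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, NUM_CLASSES),
)

criterion = nn.CrossEntropyLoss()
# Assumed hyperparameters; tune them for your own data and GPU.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Fake batch: 4 RGB images of 32x32 pixels with random labels (illustration only).
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (4,))


def train_step(model, images, labels):
    """Run one optimisation step on a batch and return the loss as a float."""
    model.train()
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(images)            # forward pass
    loss = criterion(outputs, labels)  # compare predictions with labels
    loss.backward()                    # back-propagate
    optimizer.step()                   # update the weights
    return loss.item()


losses = [train_step(model, images, labels) for _ in range(5)]
```

In a real run you would loop over a DataLoader for several epochs and evaluate on the validation set after each one.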

The training and validation accuracies are plotted and the confusion matrix is generated using torchnet (https://github.com/pytorch/tnt ) which is pretty good for visualization and logging in PyTorch.


Confusion matrix of the classification

The classifier achieves 97% accuracy on the test image set, which is not bad.

Now it’s your turn to go ahead and get your hands dirty with this experiment. Leave a comment if you come across any issues. Happy coding!
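In case it helps, this is how an overall accuracy figure relates to a confusion matrix. The sketch below is plain Python and does not use torchnet; the function names and the 3-class matrix are made up for illustration and are not the results of this experiment.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy = samples on the diagonal / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each true class (row), the fraction of its samples predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up matrix: rows are true classes, columns are predicted classes.
toy_matrix = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]

overall = accuracy_from_confusion(toy_matrix)
recalls = per_class_recall(toy_matrix)
```

Looking at the per-class values next to the overall number is a quick way to spot a class that the model keeps confusing with another one.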

Here’s the GitHub Repo for your reference!

Transfer Learning in ConvNets

The rise of deep learning methods in areas like computer vision and natural language processing has led to the need for massive datasets to train deep neural networks. In most cases, finding a large enough dataset is not possible, and training a deep neural network from scratch for a particular task may be time consuming. To address this problem, transfer learning can be used as a learning framework, where the knowledge acquired from a related, already-learned task is transferred to improve the learning of a new task.

Put simply, transfer learning helps a machine learning model learn more easily by getting help from a pre-trained model whose domain is similar to some extent (not necessarily identical).


The ways in which transfer might improve learning

There might be cases where transfer actually decreases performance; this is called negative transfer. Normally, we humans decide which knowledge can be transferred (the mapping) for a particular task, but there is active research on finding ways to do this mapping automatically.

That’s all about the theory! Let’s discuss how we can apply transfer learning in a computer vision task. As you all know, Convolutional Neural Networks (CNNs) perform really well in image classification, image recognition and similar tasks. Training deep CNNs needs large amounts of image/video data, and with the massive number of parameters to tune, training takes a long time. In such cases, transfer learning is a great fit for training new models, and it is widely used in industry as well as in research.

There are three main approaches to using transfer learning in machine learning problems. To make it easier to understand, I’ll take my examples from the context of training deep neural network models for computer vision tasks (image classification, labeling etc.).

ConvNet as fixed feature extractor –

In this case, you take a ConvNet that has been pre-trained on a large image repository like ImageNet and remove its last fully connected layer. The rest is used as a fixed feature extractor for the dataset you have. Then a linear classifier (softmax or a linear SVM) is trained on those features for the new dataset.


VGG16 as a feature extractor
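A minimal PyTorch sketch of the fixed-feature-extractor idea is shown below. To keep it runnable without downloading any weights, a tiny randomly initialized network stands in for the pre-trained ConvNet body; the layer sizes, the fake batch and the variable names are assumptions for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pre-trained ConvNet with its last fc layer removed.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the backbone: its weights will never be updated.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# Only this linear classifier is trained on the new dataset (8 classes assumed).
classifier = nn.Linear(16, 8)

# Fake batch of 4 RGB images, 32x32 pixels.
images = torch.randn(4, 3, 32, 32)
with torch.no_grad():           # no gradients are needed for the frozen part
    features = backbone(images)
logits = classifier(features)

# Collect the parameters an optimizer would actually update.
trainable = [
    p
    for p in list(backbone.parameters()) + list(classifier.parameters())
    if p.requires_grad
]
```

Since the features never change, they can also be computed once for the whole dataset and cached, which makes training the classifier very cheap.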

Fine-tuning the ConvNet –

Here we don’t stop at using the ConvNet as a feature extractor. We fine-tune the weights of the ConvNet with the data that we have. Sometimes only the last few layers are tuned rather than the whole network, as the first layers represent the most general features.
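Tuning only the last layers can be sketched like this in PyTorch. The TinyConvNet class, its block names (early, late, fc) and the learning rates are hypothetical; with a real pre-trained model you would freeze its first blocks in the same way.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class TinyConvNet(nn.Module):
    """Hypothetical stand-in for a pre-trained ConvNet, split into named blocks."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.early = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.late = nn.Sequential(
            nn.Conv2d(8, 16, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.fc(self.late(self.early(x)))


model = TinyConvNet()

# Freeze the early layers: they hold the most general features.
for param in model.early.parameters():
    param.requires_grad = False

# Fine-tune the late block gently and the new head a bit faster (assumed rates).
optimizer = optim.SGD(
    [
        {"params": model.late.parameters(), "lr": 1e-4},
        {"params": model.fc.parameters(), "lr": 1e-3},
    ],
    momentum=0.9,
)

# One step on a fake batch to show which weights move.
early_before = [p.clone() for p in model.early.parameters()]
fc_bias_before = model.fc.bias.clone()

images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 8, (4,))
loss = nn.CrossEntropyLoss()(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the early block is untouched while the late block and the head have moved.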

Using pretrained models –

Here we use the pre-trained models available in most deep learning frameworks and adjust them according to our needs. In the next post, I’ll discuss how to perform this using PyTorch.

One of the most important decisions to make in transfer learning is whether to fine-tune the network or to leave it as it is. The size of your dataset and its similarity to the dataset the model was originally trained on are the deciding factors. Here’s a summary that would help you make the decision.
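The same decision can be written down as a tiny helper. This is only a sketch of the widely quoted rule of thumb based on the two factors above, so treat it as a starting point rather than a law; the function name and the returned strings are made up for illustration.

```python
def transfer_strategy(dataset_is_large, similar_to_pretraining_data):
    """Suggest a transfer learning strategy from the two deciding factors."""
    if similar_to_pretraining_data:
        if dataset_is_large:
            # Plenty of similar data: fine-tuning is unlikely to overfit.
            return "fine-tune the whole network"
        # Little data: fine-tuning may overfit, so keep the ConvNet fixed.
        return "freeze the ConvNet and train a linear classifier on top"
    if dataset_is_large:
        # Different but plentiful data: pre-trained weights are still a good start.
        return "fine-tune the whole network from the pre-trained weights"
    # Little and different data: the top layers are too specific to the old task.
    return "train a linear classifier on features from an earlier layer"


suggestion = transfer_strategy(dataset_is_large=False, similar_to_pretraining_data=True)
```

For a small dataset of natural images like the 8-class one mentioned earlier, this points towards keeping most of the ConvNet fixed.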

Let’s discuss how to perform transfer learning with an example in the next post. 😊

The Story of Deep Pan Pizza: AI Explained for Dummies

Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning….

Most probably, the words above are the most widely used and discussed buzzwords today. Even the big companies use them to make their products appear more futuristic and like “market candy” (a ‘tech giant’ recently introduced something called a ‘neural engine’)!

Though AI and the related buzzwords are so popular, people still have some misconceptions about their definitions. One thing you should clearly know is that AI, machine learning and deep learning are quite distinct from the field called “Big Data”. It’s true that some ML and DL experiments use big data for training… but keep in mind that handling big data and doing operations with big data is a separate discipline.

So, what is Artificial Intelligence?

“Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.” – Wikipedia

Simple as that. If a system has been developed to perform tasks that need human intelligence, such as visual perception, speech recognition or decision making, it can be defined as an intelligent system, or a so-called AI!

The famous “Turing Test” proposed by Alan Turing (Yes. The Enigma guy in the Imitation Game movie!) is a way to evaluate the intelligent behavior of an AI system.


Turing Test

There are two closed rooms… let’s say A and B. In room A we have a human, while in room B we have a system. The interrogator, person C, is given the task of identifying which room the human is in. C is limited to using written questions to make the determination. If C fails to do it, the computer in room B can be defined as an AI! Though this test is not so valid for the intelligent systems we have today, it gives a basic idea of what AI is.

Then Machine Learning?

Machine learning is a subfield of AI that consists of methods and algorithms that allow computer systems to statistically learn patterns in data. Isn’t that statistics? No. Machine learning doesn’t rely on rule-based programming (which means an if-else ladder is not ML 😀), whereas statistical modeling is mostly about formulating relationships between data in the form of mathematical equations.

There are many machine learning algorithms out there: SVMs, decision trees, unsupervised methods like K-means clustering, and the so-called neural networks.

That’s ma boy! Artificial Neural Networks?

Inspired by the neural networks we all have inside our bodies, artificial neural network systems “learn” to perform tasks by considering many examples. Simply put, we show a thousand images of cute cats to an ANN, and the next time the ANN sees a cat it’s gonna yell… “Hey, it seems like a cat!”.

If you wanna know all the math and magic behind that… just Google! Tons of resources there.

Alright… then Deep Learning?

Yes! That’s deep! Imagine the typical vanilla neural network as a thin crust pizza… It has the input layer (the crust), one or two hidden layers (the thin, soft part in the middle) and the output layer (the topping). When it comes to Deep Learning or deep neural networks, that’s DEEP PAN PIZZA!


DNNs are just like Deep Pan Pizzas

Deep Neural Networks consist of many hidden layers between the input layer and the output layer. Not only typical propagation operations, but also some add-ins (like pineapple) in the middle. Pooling layers, activation functions…. MANY!

So, the CNNs… RNNs…

You can have many flavors of Deep Pan Pizza! Some are good for spice lovers… some are good for meat lovers. It’s the same with Deep Neural Networks. Many good researchers have found interesting ways of connecting the hidden layers (or baking the yummy middle) of DNNs. Some of them are very good at image interpretation, while others are good at predicting values that involve time or state. Convolutional Neural Networks and Recurrent Neural Networks are the most famous flavors of these deep pan pizzas!

These deep pan pizzas have proven that they are able to perform some tasks with close-to-human accuracy, and sometimes even with higher accuracy than humans!

Don’t panic! Robots would not invade the world soon…

 

Image Courtesy : DataScienceCentral | Wikipedia

Deep Learning Vs. Traditional Computer Vision


Traditional way of feature extraction

Computer vision can be succinctly described as finding and telling features from images to help discriminate objects and/or classes of objects.

Computer vision has become one of the vital research areas, and commercial applications built on computer vision methodologies are becoming a huge portion of the industry. The accuracy and speed of processing and identifying images captured from cameras have improved over the decades. Being the well-known boy in town, deep learning is playing a major role as a computer vision tool.

Is deep learning the only tool to perform computer vision?

A big no! Deep learning came onto the computer vision scene only a couple of years back. Before that, computer vision was mainly based on image processing algorithms and methods. The main process of computer vision was extracting the features of the image. Detecting colors, edges, corners and objects was the first step in a computer vision task. These features are human-engineered, and the accuracy and reliability of the models depend directly on the extracted features and on the methods used for feature extraction. In traditional vision, algorithms like SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and BRIEF (Binary Robust Independent Elementary Features) play the major role of extracting features from the raw image.

The difficulty with this approach to feature extraction in image classification is that you have to choose which features to look for in each given image. When the number of classes goes up or the image clarity goes down, it’s really hard to cope with traditional computer vision algorithms.
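To give a feel for what a human-engineered feature looks like, here is a small sketch of edge detection with the classic Sobel filter in plain Python. Note that this is a much simpler technique than SIFT, SURF or BRIEF and is used here only as an illustration; the function names and the 5x5 toy image are made up.

```python
# Hand-designed 3x3 Sobel kernels: a human decided these numbers, not a training run.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def apply_kernel(image, kernel):
    """Slide a 3x3 kernel over the image (no padding) and return the responses."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += kernel[dr][dc] * image[r + dr - 1][c + dc - 1]
            row.append(acc)
        result.append(row)
    return result


def edge_strength(image):
    """Combine both directions into one edge map using |gx| + |gy|."""
    gx = apply_kernel(image, SOBEL_X)
    gy = apply_kernel(image, SOBEL_Y)
    return [[abs(x) + abs(y) for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Toy grayscale image: dark on the left, bright on the right (a vertical edge).
toy_image = [[0, 0, 0, 9, 9] for _ in range(5)]
edges = edge_strength(toy_image)
```

The response is zero in the flat dark area and strong around the boundary, which is exactly the kind of feature someone had to design by hand and then feed into a classifier.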

The Deep Learning approach –

Deep learning, which is a subset of machine learning, has shown significant performance and accuracy gains in the field of computer vision. In 2012, in arguably one of the most influential papers applying deep learning to computer vision, a neural network containing over 60 million parameters significantly beat the previous state-of-the-art approaches to image recognition in the popular ImageNet computer vision competition, ILSVRC-2012.

The boom started with convolutional neural networks and modified ConvNet architectures. By now, it is said that some ConvNet architectures come very close to 100% accuracy in image classification challenges, sometimes even beating the human eye!

The main difference in the deep learning approach to computer vision is the concept of end-to-end learning. There’s no longer a need to define the features and do feature engineering. The neural network does that for you. It can be put simply this way.

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

Wired | 2016

Though deep neural networks have major drawbacks, such as the need for huge amounts of training data and large computation power, the field of computer vision has already been conquered by this amazing tool!