Achieving Super Convergence of DNNs with 1cycle Policy

I would say training a deep neural network to a good accuracy is an art. Training lets the model learn its parameters, such as the weights and biases, from the training data. Hyper-parameters govern this process: they control the behavior of training and have a significant impact on model accuracy and convergence.

The learning rate, the number of epochs, the number of hidden layers and hidden units, the activation functions, and momentum are some of the hyper-parameters we can adjust to make a neural network perform well.

Adjusting the learning rate is vital for convergence: a small learning rate makes training very slow and can lead to overfitting, while a learning rate that is too large makes training diverge. The typical way of finding the optimal learning rate is a grid search or a random search, which can be computationally expensive and take a lot of time. Isn’t there a smarter way to find the optimal learning rate?

Here I’m going to connect some dots on the process I followed to choose a good learning rate for my model and on training a DNN with a different learning rate policy.

Many researchers are actively working on this problem. In his paper “Cyclical Learning Rates for Training Neural Networks”, Leslie N. Smith proposed the learning rate range test (LR range test) and Cyclical Learning Rates (CLR).

I’m not going to discuss the interesting theory behind the LR range test and CLR here, as fast.ai has a pretty good introduction to the method and even ships an implementation of the LR range test you can use off the shelf. I strongly recommend reading that post. I also found a nice PyTorch implementation of the LR range test by David Silva; feel free to pull it from https://github.com/davidtvs/pytorch-lr-finder
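Just to show the shape of that workflow, here is a minimal sketch of running the LR range test with the torch-lr-finder package mentioned above; the ResNet18 model and the random-tensor loader are placeholders for your own model and data loader.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models
from torch_lr_finder import LRFinder   # pip install torch-lr-finder

# placeholder model and data; replace with your own experiment
model = models.resnet18(pretrained=True)
dataset = TensorDataset(torch.randn(64, 3, 64, 64), torch.randint(0, 1000, (64,)))
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-7, weight_decay=1e-4)

device = "cuda" if torch.cuda.is_available() else "cpu"
lr_finder = LRFinder(model, optimizer, criterion, device=device)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)  # exponentially increase the LR
lr_finder.plot()    # loss vs. learning rate; pick a value a bit before the loss minimum
lr_finder.reset()   # restore the model and optimizer to their initial state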

In 2018, in the paper “A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay”, Smith introduced the 1cycle policy, which runs only a single cycle of the learning rate compared to several cycles in CLR. I strongly suggest taking a look at this blog post to get an idea of the 1cycle policy.

Ok… now you’ve read about it! Does it actually work?

I gave it a try with a simple transfer learning experiment. The dataset and the base experiment are from the PyTorch documentation, which you can find here. These are the steps I followed during the experiment.

I’ve pushed the experiment to GitHub, so feel free to use it. 😊

  1. Run the LR range test to find the maximum learning rate value to use for 1cycle training.


Output from the LR finder

According to the graph, around 5*1e-3 looks like the largest learning rate that can safely be used for training. In the end I chose 7*1e-3, which sits just before the loss minimum, as my maximum learning rate for training.

  2. Run the training with a fixed learning rate (note that learning rate decay was used during training).
  3. Run the training according to the 1cycle policy (a cyclical momentum and a cyclical learning rate were used; note that the learning rate and the momentum change with every mini-batch, not epoch-wise). A minimal PyTorch sketch of this schedule follows below.

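My experiment used the training script in the repo, but recent PyTorch versions ship the same schedule as torch.optim.lr_scheduler.OneCycleLR; here is a rough, self-contained sketch (the ResNet18 model, the random-tensor loader and the epoch count are placeholders, not the exact settings of my run).

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# placeholder model and data; replace with your own
model = models.resnet18(pretrained=True)
dataset = TensorDataset(torch.randn(64, 3, 64, 64), torch.randint(0, 1000, (64,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=7e-3 / 25, momentum=0.95)

epochs = 10
scheduler = OneCycleLR(
    optimizer,
    max_lr=7e-3,                  # the value picked from the LR range test
    steps_per_epoch=len(loader),
    epochs=epochs,
    cycle_momentum=True,          # momentum cycles inversely to the learning rate
    base_momentum=0.85,
    max_momentum=0.95)

for epoch in range(epochs):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()          # LR and momentum updated every mini-batch, not per epoch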

  4. Compare the validation accuracy and validation loss of each method.

Notice that the green line, which represents the run trained with the 1cycle policy, converges to a better validation accuracy and a better validation loss.

These are the best validation accuracies of the two experiments:

  • Fixed LR : 0.9411
  • 1-cycle : 0.9607

Tip: Choose the batch size according to the computational capacity you have. The number of iterations in the 1cycle policy depends on the batch size, the number of epochs and the size of the training dataset.

Though this experiment is a simple one, it shows that the 1cycle policy does help to increase the accuracy of neural network models and supports super convergence. Give it a try and don’t forget to share your experiences here. 😊

References – 

[1] Cyclical Learning Rates for Training Neural Networks
https://arxiv.org/abs/1506.01186

[2] A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay
https://arxiv.org/abs/1803.09820

[3] The 1cycle policy
https://sgugger.github.io/the-1cycle-policy.html

[4] PyTorch Learning Rate Finder
https://github.com/davidtvs/pytorch-lr-finder

[5] Transfer Learning Tutorial
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

C3D with Batch Normalization for Video Classification

Convolutional Neural Networks (CNNs) are well known for their ability to capture spatial and positional features, and 2D convolutional networks are widely used in computer vision tasks. Plenty of research has been done, and is ongoing, with 2D CNNs, and on the famous ImageNet challenge they have reached an accuracy even better than humans!

Research teams have introduced several network architectures for solving image classification and related computer vision tasks. LeNet (1998), AlexNet (2012), VGGNet (2014), GoogLeNet (2014) and ResNet (2015) are some of the famous CNN architectures in use now. (I’ve discussed using pre-trained models to perform transfer learning with these architectures here. Take a look. 🙂)

That was all about 2D images. What about videos? 3D convolutions, which apply a 3D kernel to the data and move it in three directions (x, y and z) to compute feature representations, are helpful in video event detection and related tasks.

Just as with 2D CNN architectures, researchers have introduced CNN architectures built from 3D convolutional layers. They perform well in video classification and event detection tasks. Some of these architectures were adapted from existing 2D CNN models by introducing 3D layers into them.

A 3D convolution operation

Tran et al. from Facebook AI Research introduced the C3D model to learn spatiotemporal features in videos using 3D convolutional networks, in the paper “Learning Spatiotemporal Features with 3D Convolutional Networks”. In the original paper they used Dropout to regularize the network.

Instead of dropout, I tried using Batch Normalization to regularize the network: each convolutional layer is followed by a 3D batch normalization layer. With batch normalization you can use slightly larger learning rates to train the network, and it allows each layer to learn a bit more independently of the other layers.

This is just the PyTorch port of the network. I use it for video classification tasks in which each video clip has 16 RGB frames of 112×112 pixels, so the input tensor is (batch_size, 3, 16, 112, 112). You can choose the batch size according to the computational capacity you have.

import torch.nn as nn


class C3D_BN(nn.Module):
    """
    The C3D network as described in [1],
    with Batch Normalization as described in [2].
    """

    def __init__(self):
        super(C3D_BN, self).__init__()

        # block 1: 3 -> 64 channels; pool only spatially so all 16 frames are kept
        self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv1_bn = nn.BatchNorm3d(64)
        self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))

        # block 2: 64 -> 128 channels
        self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv2_bn = nn.BatchNorm3d(128)
        self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        # block 3: two convolutions, 128 -> 256 channels
        self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv3a_bn = nn.BatchNorm3d(256)
        self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv3b_bn = nn.BatchNorm3d(256)
        self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        # block 4: 256 -> 512 channels
        self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv4a_bn = nn.BatchNorm3d(512)
        self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv4b_bn = nn.BatchNorm3d(512)
        self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

        # block 5: 512 -> 512 channels
        self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv5a_bn = nn.BatchNorm3d(512)
        self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.conv5b_bn = nn.BatchNorm3d(512)
        self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))

        # fully connected layers; 8192 = 512 channels * 1 * 4 * 4 after pool5
        self.fc6 = nn.Linear(8192, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.fc8 = nn.Linear(4096, 8)  # 8 output classes
        self.relu = nn.ReLU()

    def forward(self, x):

        h = self.relu(self.conv1_bn(self.conv1(x)))
        h = self.pool1(h)

        h = self.relu(self.conv2_bn(self.conv2(h)))
        h = self.pool2(h)

        h = self.relu(self.conv3a_bn(self.conv3a(h)))
        h = self.relu(self.conv3b_bn(self.conv3b(h)))
        h = self.pool3(h)

        h = self.relu(self.conv4a_bn(self.conv4a(h)))
        h = self.relu(self.conv4b_bn(self.conv4b(h)))
        h = self.pool4(h)

        h = self.relu(self.conv5a_bn(self.conv5a(h)))
        h = self.relu(self.conv5b_bn(self.conv5b(h)))
        h = self.pool5(h)

        h = h.view(-1, 8192)           # flatten to (batch_size, 8192)
        h = self.relu(self.fc6(h))
        h = self.relu(self.fc7(h))
        h = self.fc8(h)                # raw class scores (logits)
        return h


"""
References
----------
[1] Tran, Du, et al. "Learning spatiotemporal features with 3D convolutional networks."
    Proceedings of the IEEE International Conference on Computer Vision. 2015.
[2] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network
    training by reducing internal covariate shift." arXiv:1502.03167 [cs.LG], 2015.
"""

Let the 3D Convo power be with you! Happy coding! 🙂

Transfer Learning in ConvNets – Part 2

We discussed the possibility of transferring the knowledge learned by one ConvNet to another. If you are new to the idea of transfer learning, please check out the previous post here.

Alright… let’s see a practical scenario where we need transfer learning. We all know that deep neural networks are data hungry, and we may need a huge amount of data to build unbiased predictive models. Although that’s the ideal scenario, in most cases there isn’t that much data available to train neural models. So your go-to survivor may be transfer learning.

In this small demonstration, I’ve built a multi-class classifier with 8 classes and only 100-odd training images per class.

The dataset I’m using here is derived from the “Natural Images” dataset (https://www.kaggle.com/prasunroy/natural-images/version/1#_=_). I’ve randomly reduced the number of images in the original dataset to build “Mini Natural Images”, which is split into train, validation and test sets. (The dataset is available in the GitHub repository.) Go ahead and feel free to pull it or fork it!

Here’s an overview of the “Mini Natural Images” dataset.

So, this is going to be an image classification task. We are going to take advantage of ImageNet and the state-of-the-art architectures pre-trained on it. Instead of random initialization, we initialize the network with pretrained weights, and the ConvNet is then fine-tuned on our training set.

I’ve used the PyTorch deep learning framework for the experiment, as it’s super easy to adopt for deep learning. For this type of computer vision application you can use the models available in torchvision.models (https://pytorch.org/docs/stable/torchvision/models.html).

The models in the model zoo are pre-trained on the ImageNet dataset to classify 1000 classes, so there are 1000 nodes in the final layer. To adapt a model for our needs, remember to remove the final layer and replace it with one that has the desired number of nodes for your task. (In this experiment, the final fc layer of ResNet18 has been replaced by an 8-node fc layer.)

Here’s how to replace the final layer in the ResNet and VGG architectures.

# Using a model pre-trained on ImageNet and replacing its final linear layer
import torch.nn as nn
from torchvision import models

# For ResNet18
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 8)

# For VGG16_BN: replace the last classifier layer with a new 8-way linear layer
# (only setting .out_features would not resize the existing weight matrix)
model_ft = models.vgg16_bn(pretrained=True)
model_ft.classifier[6] = nn.Linear(model_ft.classifier[6].in_features, 8)

The rest of the training goes the same way as training and fine-tuning any CNN. Make sure to use a batch size that fits the GPU available in your rig. (You can use a DLVM for this task if you wish 😊)
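For reference, here is a minimal sketch of the rest of the setup along the lines of the PyTorch transfer learning tutorial this experiment is based on; the optimizer and schedule values below are the tutorial's defaults, not anything tuned for this dataset.

import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

criterion = nn.CrossEntropyLoss()
# fine-tune all layers of model_ft with a small learning rate
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# decay the learning rate by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)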

The training and validation accuracies are plotted and the confusion matrix is generated using torchnet (https://github.com/pytorch/tnt), which is pretty good for visualization and logging in PyTorch.
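If you want to log a confusion matrix the same way, torchnet's ConfusionMeter is the relevant piece. A hedged sketch follows; the commented lines stand in for whatever your validation loop actually produces.

import torchnet as tnt

confusion = tnt.meter.ConfusionMeter(8)      # 8 classes in Mini Natural Images
# inside the validation loop:
#   outputs = model_ft(inputs)               # shape (batch, 8) class scores
#   confusion.add(outputs.detach(), labels)  # labels: shape (batch,) class indices
print(confusion.value())                     # 8x8 matrix of prediction counts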


Confusion matrix of the classification

The classifier reaches 97% accuracy on the test image set, which is not bad.

Now it’s your turn to go ahead and get your hands dirty with this experiment. Leave a comment if you come up with any issue. Happy coding!

Here’s the GitHub Repo for your reference!

Transfer Learning in ConvNets

The rise of deep learning methods in areas like computer vision and natural language processing has led to the need for massive datasets to train deep neural networks. In most cases, finding large enough datasets is not possible, and training a deep neural network from scratch for a particular task can be time consuming. To address this problem, transfer learning can be used as a learning framework, where the knowledge acquired from a related, already-learned task is transferred to improve learning on a new task.

Put simply, transfer learning helps a machine learning model learn more easily by getting help from a pre-trained model whose domain is similar to some extent (not exactly the same).

The ways in which transfer might improve learning

There may be cases where the transfer actually decreases performance; these are called negative transfer. Normally, a human decides which knowledge can be transferred (the mapping) between particular tasks, but there is active research into finding ways to do this mapping automatically.

That’s enough theory! Let’s discuss how we can apply transfer learning to a computer vision task. As you all know, Convolutional Neural Networks (CNNs) perform really well on image classification, image recognition and similar tasks. Training deep CNNs needs large amounts of image/video data, and the massive number of parameters takes a long time to tune. In such cases, transfer learning is a great fit for training new models, and it is widely used in industry as well as in research.

There are three main approaches to using transfer learning in machine learning problems. To make it easier to understand, I’ll draw my examples from training deep neural network models for computer vision tasks (image classification, labeling, etc.).

ConvNet as fixed feature extractor –

In this case, you take a ConvNet that has been pre-trained on a large image repository like ImageNet and remove its last fully connected layer. The rest is used as a fixed feature extractor for your dataset, and then a linear classifier (softmax or a linear SVM) is trained on the new dataset. A minimal PyTorch sketch follows the figure below.


VGG16 as a feature extractor
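In PyTorch terms, the fixed feature extractor setup boils down to freezing the pre-trained weights and training only a new classifier head. Here's a minimal sketch; ResNet18 and the 8-class head are just for illustration.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                    # freeze the pre-trained backbone

model.fc = nn.Linear(model.fc.in_features, 8)      # new, trainable classifier head

# only the new layer's parameters are given to the optimizer
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)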

Fine-tuning the ConvNet –

Here we don’t stop at using the ConvNet as a feature extractor: we fine-tune the weights of the ConvNet with the data we have. Sometimes not the whole network is tuned but only the last set of layers, since the first layers capture the most general features.
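One way to express "tune mostly the last layers" in PyTorch is with parameter groups, giving the pre-trained backbone a much smaller learning rate than the new head. A rough sketch follows; the layer split and the learning rates are illustrative, not recommendations.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 8)      # new classifier head

backbone_params = [p for name, p in model.named_parameters() if not name.startswith("fc")]
optimizer = optim.SGD([
    {"params": backbone_params, "lr": 1e-4},       # gentle updates for general features
    {"params": model.fc.parameters(), "lr": 1e-2}, # larger updates for the new head
], momentum=0.9)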

Using pretrained models –

Here we use pre-trained models available in most deep learning frameworks and adjust them according to our needs. In the next post, I’ll discuss how to do this using PyTorch.

One of the most important decisions in transfer learning is whether to fine-tune the network or to leave it as it is. The size of your dataset and its similarity to the dataset the model was originally trained on are the deciding factors. Here’s a summary that should help you make the decision.

Let’s discuss how to perform transfer learning with an example in the next post. 😊

The Story of Deep Pan Pizza: AI Explained for Dummies

Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning….

The words above are probably the most widely used and widely discussed buzzwords today. Even the big companies use them to make their products appear more futuristic and “market candy” (like a ‘tech giant’ that recently introduced something called a ‘neural engine’)!

Though AI and the related buzzwords are so popular, people still have some misconceptions about their definitions. One thing you should clearly know is that AI, machine learning & deep learning deviate considerably from the field called “Big Data”. It’s true that some ML & DL experiments use big data for training… but keep in mind that handling big data and doing operations with big data is a separate discipline.

So, what is Artificial Intelligence?

“Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.” – Wikipedia

Simple as that. If a system has been developed to perform tasks that need human intelligence, such as visual perception, speech recognition or decision making… it can be defined as an intelligent system, or a so-called AI!

The famous “Turing Test”, developed by Alan Turing (yes, the Enigma guy from the Imitation Game movie!), proposed a way to evaluate the intelligent behavior of an AI system.


Turing Test

There are two closed rooms… let’s say A & B. In room A we have a human, while in room B we have a machine. The interrogator, person C, is given the task of identifying which room the human is in, and C is limited to written questions to make the determination. If C fails to do it, the machine in room B can be said to show intelligent behavior, an AI! Though this test is not really adequate for the intelligent systems we have today, it gives a basic idea of what AI is.

Then Machine Learning?

Machine learning is a subcomponent of AI that consists of methods and algorithms allowing computer systems to statistically learn patterns from data. Isn’t that just statistics? No. Machine learning doesn’t rely on rule-based programming (meaning an if-else ladder is not ML 😀), whereas statistical modeling is mostly about formulating relationships between data in the form of mathematical equations.

There are many machine learning algorithms out there: SVMs, decision trees, unsupervised methods like K-means clustering, and of course the so-called neural networks.

That’s ma boy! Artificial Neural Networks?

Inspired by the neural networks we all have inside our bodies, artificial neural network systems “learn” to perform tasks by considering many examples. Simply put, we show a thousand images of cute cats to an ANN, and the next time the ANN sees a cat it’s gonna yell, “Hey, that looks like a cat!”.

If you wanna know all the math and magic behind that… just Google! Tons of resources there.

Alright… then Deep Learning?

Yes! That’s deep! Imagine a typical vanilla neural network as a thin crust pizza… It has the input layer (the crust), one or two hidden layers (the thin soft part in the middle) and the output layer (the topping). When it comes to deep learning, or deep neural networks, that’s DEEP PAN PIZZA!


DNNs are just like Deep Pan Pizzas

Deep neural networks consist of many hidden layers between the input layer and the output layer. Not only the typical propagation operations, but also some add-ins (like pineapple) in the middle: pooling layers, activation functions… MANY!

So, the CNNs… RNNs…

You can have many flavors of deep pan pizza! Some are good for spicy lovers… some are good for meat lovers. Same with deep neural networks. Many good researchers have found interesting ways of connecting the hidden layers (or baking the yummy middle) of DNNs. Some of them are very good at image interpretation, while others are good at predicting values that involve time or state. Convolutional Neural Networks and Recurrent Neural Networks are the most famous flavors of these deep pan pizzas!

These deep pan pizzas have proven that they can perform some tasks with close-to-human accuracy, and sometimes even with higher accuracy than humans!

Don’t panic! Robots won’t be invading the world anytime soon…

 

Image Courtesy : DataScienceCentral | Wikipedia

Deep Learning Vs. Traditional Computer Vision


Traditional way of feature extraction

Computer vision can be succinctly described as finding and describing features in images to help discriminate objects and/or classes of objects.

Computer vision has become one of the vital research areas, and commercial applications built on computer vision methodologies are becoming a huge portion of industry. The accuracy and speed of processing and identifying images captured from cameras have improved over the decades. Being the well-known new kid in town, deep learning is playing a major role as a computer vision tool.

Is deep learning the only tool to perform computer vision?

A big no! Deep learning came onto the computer vision scene only a couple of years back. Before that, computer vision was mainly based on image processing algorithms and methods, and the main task was extracting features from the image: detecting colors, edges, corners and objects was the first step when performing a computer vision task. These features are human engineered, and the accuracy and reliability of the models depend directly on the extracted features and on the methods used for feature extraction. In the traditional vision scope, algorithms like SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and BRIEF (Binary Robust Independent Elementary Features) play the major role of extracting features from the raw image.
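To make the idea of hand-engineered features concrete, here is a tiny OpenCV sketch using ORB, a free detector in the same family (SIFT and SURF may require the opencv-contrib build); the image path is a placeholder.

import cv2

img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder image path
orb = cv2.ORB_create(nfeatures=500)                      # hand-engineered keypoint detector
keypoints, descriptors = orb.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)                 # e.g. 500 keypoints, 32 bytes each
annotated = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("sample_keypoints.jpg", annotated)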

The difficulty with this approach to feature extraction in image classification is that you have to choose which features to look for in each given image. When the number of classes grows or the image clarity goes down, it’s really hard to cope with traditional computer vision algorithms.

The Deep Learning approach –

Deep learning, a subset of machine learning, has shown significant performance and accuracy gains in the field of computer vision. In arguably one of the most influential papers applying deep learning to computer vision, in 2012 a neural network containing over 60 million parameters significantly beat previous state-of-the-art approaches to image recognition in the popular ImageNet computer vision competition, ILSVRC-2012.

The boom started with convolutional neural networks and modified ConvNet architectures. By now, some ConvNet architectures are said to come very close to 100% accuracy on image classification challenges, sometimes beating the human eye!

The main difference in the deep learning approach to computer vision is the concept of end-to-end learning. There is no longer a need to define features and do feature engineering; the neural network does that for you. It can be put simply this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

Wired | 2016

Though deep neural networks have their major drawbacks, such as needing huge amounts of training data and large computational power, the field of computer vision has already been conquered by this amazing tool!

Keras: The API for Human Beings

Obviously deep learning is a hit! Building deep neural networks for various predictive and learning tasks is one of the major practices of AI enthusiasts today. There are several deep learning frameworks out there that help with building deep neural networks; TensorFlow, Theano and CNTK are some of the major frameworks used in industry and in research. Each framework has its own way of defining tensor units and configuring the connections between nodes, which involves a bit of a learning curve.

As shown in the graph, TensorFlow is the most popular and widely used deep learning framework right now. Keras, on the other hand, doesn’t work independently: it works as an upper layer on top of existing deep learning frameworks, namely TensorFlow, Theano & CNTK (an MXNet backend for Keras is on the way). To be more precise, Keras acts as a wrapper for these frameworks. Working with Keras is as easy as working with Lego blocks; what you have to know is where to fit the right component. So it is the ultimate deep learning tool for human beings!


Architecture of Keras API

Why Keras?

  • Fast prototyping – In most cases you’ll have to test different neural architectures to find the best fit, and building the models from scratch can be time consuming. Keras helps here by modularizing your task and giving you the ability to reuse code.
  • Supports CNNs, RNNs & combinations of both
  • Modularity
  • Easy extensibility
  • Simple to get started, simple to keep going
  • Deep enough to build serious models.
  • Well-written documentation – Yes! Refer to http://keras.io
  • Runs seamlessly on CPU and GPU – Keras supports GPU parallelization, which will boost your execution.

Keras follows a very simple design idea. Here I’ve summed up the four main steps of designing a Keras deep learning model.

  1. Prepare your inputs and output tensors
  2. Create first layer to handle input tensor
  3. Create output layer to handle targets
  4. Build virtually any model you like in between

Basically, Keras models go through the following pipeline. You may have to revisit the steps again and again to come up with the best model.

Let’s start with a simple experiment that involves classifying dog & cat images from Kaggle. First make sure to download the training & testing image files from Kaggle (https://www.kaggle.com/c/dogs-vs-cats/data).

Before playing with Keras, you may need to set up your rig. Please refer to this post to get your beast ready for deep learning.

Then try this code! The code sections are commented for your reference. Here I’m using the TensorFlow backend; you can change the configuration a bit and use Theano or CNTK if you wish.

# Convolutional Neural Network with Keras

# Installing Tensorflow
# pip install tensorflow-gpu

# Installing Keras
# pip install --upgrade keras

# Part 1 - Building the CNN

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Initialising the CNN
classifier = Sequential()

# Step 1 - Convolution
#input_shape ordering is reversed (channels first) when using the Theano backend
#Images are 2D
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

# Step 2 - Pooling
#Most of the time it's (2, 2); we're not losing much information.
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolutional layer
#Inputs are the pooled feature maps of the previous layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

 

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
#relu - rectifier activation function
#128 nodes in the hidden layer
classifier.add(Dense(units = 128, activation = 'relu'))
#Sigmoid is used because this is a binary classification. For multiclass softmax
classifier.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the CNN
#adam is an adaptive variant of stochastic gradient descent
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Part 2 - Fitting the CNN to the images
#Preprocess the images to reduce overfitting
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255, #All the pixel values would be 0-1
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // 32, #batches per epoch = training images / batch size
                         epochs = 5,
                         validation_data = test_set,
                         validation_steps = 2000 // 32) #batches = test images / batch size

#Prediction
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_2.jpg', target_size=(64,64))
test_image = image.img_to_array(test_image)
test_image = test_image / 255.0 #rescale to match the training preprocessing
test_image = np.expand_dims(test_image, axis=0)
result = classifier.predict(test_image)
print(training_set.class_indices) #check which index maps to which class
if result[0][0] >= 0.5: #the sigmoid output is a probability, so threshold at 0.5
    prediction = 'dog'
else:
    prediction = 'cat'

print(prediction)

FAQ Bot in Minutes!

Almost all the tech giants are investing massively in chatbots and making them available to the general public in an efficient and easy way. Many development tools, SDKs and services are now available in the market for building your own chatbots. Microsoft QnA Maker is one of the handiest tools to get started with for building a basic question & answer bot.

Microsoft QnA Maker was in public preview for quite a while, and it reached general availability with the Build 2018 announcements. If you have bots already built using the QnA Maker preview portal, just migrate the knowledge bases you’ve created to the new QnA Maker management portal. Here’s the guide for doing that.

Building a bot with it is pretty straightforward. What you need is a set of question and answer pairs to add as the knowledge base of your chatbot. The knowledge base can be created manually using the online editor, or you can just upload question & answer pairs in CSV/TSV format, a Word document or even a product manual. If you want to add a set of FAQs from a website, all you have to do is provide its URL and the information will be extracted.

Testing the knowledge base in real time

The created knowledge base can be tested in real time using the portal, and corrections to the matching can also be made through it. One of the major advantages of the QnA Maker service is that the knowledge base can be deployed directly on the client’s Azure tenant without raising any privacy or compliance issues.

Publishing the knowledge base creates a REST endpoint that you can access through the Microsoft Bot Framework and then publish directly to a desired channel. The sample code for building a simple QnA Maker powered bot is available here on GitHub.
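To give an idea of what that endpoint looks like, here is a hedged sketch of querying a published knowledge base from Python; the host, knowledge base ID and endpoint key are placeholders for the values shown on your own Publish page.

import requests

host = "https://your-qna-service.azurewebsites.net/qnamaker"   # placeholder
kb_id = "your-knowledge-base-id"                               # placeholder
endpoint_key = "your-endpoint-key"                             # placeholder

response = requests.post(
    f"{host}/knowledgebases/{kb_id}/generateAnswer",
    headers={"Authorization": f"EndpointKey {endpoint_key}",
             "Content-Type": "application/json"},
    json={"question": "How do I reset my password?"})

print(response.json()["answers"][0]["answer"])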

One of the promising features coming with the latest updates is the “Small Talk” request/response dataset from Microsoft. This can make your bot seem more intelligent and human-like (even the “Mmmm…”s 😀). You can select your desired personality from Professional, Friendly or Humorous, download the dataset as a TSV and add it to your existing knowledge base. This will give your bot a more human touch. (Make sure to select the dataset that is specifically built for QnA Maker.)

The QnA Maker service is priced for the hosting service only, not for the number of transactions. (Note that you’d be charged for the bot service separately 😉.) You can read more about pricing here: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/qna-maker/

QnA Maker is not a fully intelligent knowledge-base-building platform, but it can help you come up with a fully functioning bot in minutes.

One-Hot Encoding in Practice

Data is king in machine learning. In the process of building machine learning models, data is used as the input features.

Input features come in all shapes and sizes. To build a predictive model with better accuracy, we should understand the data as well as the logic behind the algorithm we are going to use to fit the model.

Data Understanding, the second step of CRISP-DM, guides us in understanding the types of data we get and the way they are represented. We can distinguish three main kinds of data features.

  1. Quantitative features – data with a numerical scale (age of a person in years, price of a house in dollars, etc.)
  2. Ordinal features – data without a scale but with an ordering (ordered sets; first, second, third, etc.)
  3. Categorical features – data without a numerical scale or an ordering. These features don’t allow any statistical summary. (car manufacturer categories, civil status, n-grams in NLP, etc.)

Most machine learning algorithms, such as linear regression, logistic regression, neural networks and support vector machines, work better with numerical features.

Quantitative features come with a numerical value and can be used directly as input features of ML algorithms (sometimes preprocessing such as normalization has to be applied first).

Ordinal features can easily be represented as numbers (e.g. first = 1, second = 2, third = 3, …). This is called integer encoding. Representing ordinal features with numbers makes sense because the ordering between the values can be expressed numerically.
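For example, with pandas this is a one-line mapping (the column and the ordering below are made up for illustration):

import pandas as pd

df = pd.DataFrame({"finish": ["first", "third", "second", "first"]})
order = {"first": 1, "second": 2, "third": 3}
df["finish_encoded"] = df["finish"].map(order)   # integer encoding keeps the ordering
print(df)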

There are some algorithms that can deal directly with joint discrete distributions, such as Markov chains, Naive Bayes, Bayesian networks, tree-based models, etc. These algorithms can work with categorical data without any encoding, while for other ML algorithms we should encode the categorical features numerically to use them as input features. That means it’s better to convert categorical features to numerical ones most of the time 😊

There are some special cases too. For example, while naïve Bayes classification only really handles categorical features, many geometric models go in the other direction and only handle quantitative features.

How to convert categorical data to numerical data?

There are a few ways to convert categorical data to numerical data:

  • Dummy encoding
  • One-hot encoding / one-of-K scheme

are the most prominent among them.

One-hot encoding is the process of converting categorical features into numerical ones by “binarizing” the categories and including them as features to train the model.

In mathematics, we can define one-hot encoding as…

One hot encoding transforms:

a single variable with n observations and d distinct values,

to

d binary variables with n observations each, each observation indicating the presence (1) or absence (0) of the corresponding value.

Let’s make this clear with an example. Suppose you have a ‘flower’ feature which can take the values ‘daffodil’, ‘lily’ and ‘rose’. One-hot encoding converts the ‘flower’ feature into three binary features: ‘is_daffodil’, ‘is_lily’ and ‘is_rose’.
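Here's a quick pandas sketch of that exact flower example; pd.get_dummies is one convenient way to do it (scikit-learn's OneHotEncoder is another):

import pandas as pd

df = pd.DataFrame({"flower": ["daffodil", "lily", "rose", "lily"]})
encoded = pd.get_dummies(df["flower"], prefix="is").astype(int)
print(encoded)
# each row has exactly one 1 across is_daffodil, is_lily, is_rose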

A common application of OHE is in Natural Language Processing (NLP), where it can easily be used to turn words into vectors. Here comes a con of OHE: the vector size can get very large with respect to the number of distinct values in the feature column. If there are only two distinct categories in the feature, there’s no need to construct additional columns; you can just replace the feature column with a single Boolean column.


OHE in word vector representation

You can easily perform one-hot encoding in Azure ML Studio by using the ‘Convert to Indicator Values’ module. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model, which is exactly what happens in OHE. Let’s look at performing one-hot encoding with Python in more detail in the next article.

Mission Plan for building a Predictive model

When it comes to a machine learning or data science related problem, the most difficult part is often finding the best approach to tackle the task, or simply getting an idea of where to start!

The Cross-Industry Standard Process for Data Mining, commonly known by its acronym CRISP-DM, is a data mining process model that describes the approaches data mining experts commonly use to tackle problems. This process can easily be adopted for developing machine-learning-based predictive models as well.

CRISP-DM

It doesn’t matter which tools/IDEs/languages you use for the process; you can adapt your tools to the requirements you have.

Let’s walk through each step of the CRISP-DM model to see how it can be adapted for building machine learning models.

Business Understanding –

This is the step where you need technical know-how as well as a little knowledge of the problem domain. You should have a clear idea of what you are going to build and of the functional value of the prediction the model is supposed to make. You can use Decision Model & Notation (https://en.wikipedia.org/wiki/Decision_Model_and_Notation) to describe the business need for the predictive model. Sometimes the business need you have can be solved with simple statistics rather than a machine learning model.

Identifying the data sources is another task for this step. Check whether the data sources are reliable, legal and ethical to use in your application.

Data Understanding –

I would suggest the following steps to get to know your data better.

  1. Data definition – a detailed description of each data field in the data source. The notation of the data points and the units in which they have been measured are the things to consider here.
  2. Data visualization – hundreds or thousands of numerical data points may not give you a clear idea of what the data is about or of its shape. You may find interesting subsets of your data after visualizing it, and it’s really easy to see clustering patterns or trends in a visualized plot.
  3. Statistical analysis – starting from simple statistics such as the mean and median, you can calculate the correlation between data fields, which gives a good idea of the data distribution. Feature engineering helps increase the accuracy of the machine learning model, and a descriptive statistical analysis is a great asset for that (a short pandas sketch follows this list).
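As promised, a minimal pandas sketch of this first pass, assuming a CSV file as the data source (the path is a placeholder):

import pandas as pd

df = pd.read_csv("data.csv")                 # placeholder path to your dataset
print(df.dtypes)                             # data definition: field names and types
print(df.describe())                         # mean, std, quartiles for numeric fields
print(df.select_dtypes("number").corr())     # pairwise correlations between numeric fields
df.hist(figsize=(10, 8))                     # quick look at each field's distribution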

For data understanding, the Interactive Data Exploration, Analysis and Reporting tool (IDEAR) can be used without the hassle of writing all the code from scratch. (I’ll discuss IDEAR at length soon.)

Data Preparation –

Data preparation takes roughly 80% of the time in the process, which shows that it’s the most vital part of building predictive models.

This is the phase where you convert the raw data from the data sources into the final datasets you use for building the ML models. Most of the data from raw sources such as IoT sensors is filled with outliers, missing values and disruptions. In the data preparation phase, you should carry out preprocessing tasks to make those data fields usable in modeling.

Modeling –

Modeling is the part where algorithms come onto the scene. You train and fit your data to a particular predictive model to perform the desired prediction. Sometimes you may need to check the math behind the algorithms to select the one that won’t overfit or underfit the model.

Different modeling methods may need data in different forms, so you may need to go back to the data preparation phase.

Evaluation –

Evaluation is a must before deploying a model. The objective of evaluating the model is to see whether it meets the business objectives we figured out at the beginning. Evaluation can be done with many measures, such as accuracy, AUC, etc.

Evaluation may lead you to adjust the parameters of the model, and you might have to choose another algorithm that performs better. Don’t expect the machine learning model to be 100% accurate; if it is, it is most probably overfitted.

Deployment –

Deployment is the phase where the client or the end user consumes the machine learning model. In most cases, the predictive model is part of an intelligent application that acts as a service, taking a set of information as input and giving a prediction as output.

I would suggest deploying the model as a single component, so that it’s easy to scale as well as to maintain. APIs and Docker environments are some cool technologies you can adopt for deploying machine learning models.

CRISP-DM won’t do all the magic of producing a perfect model, but it will definitely help you avoid ending up at a dead end.