Keras: The API for Human Beings

Deep learning is obviously a hit! As a subfield of machine learning, building deep neural networks for various predictive and learning tasks is one of the major practices of AI enthusiasts today. There are several deep learning frameworks out there that help with building deep neural networks; TensorFlow, Theano and CNTK are some of the major ones used in industry and research. Each framework has its own way of defining tensor units and configuring the connections between nodes, which involves a bit of a learning curve.

As shown in the graph, TensorFlow is the most popular and widely used deep learning framework right now. Keras, however, does not work independently: it works as a layer on top of the prevailing deep learning frameworks, namely TensorFlow, Theano and CNTK (an MXNet backend for Keras is on the way). To be more precise, Keras acts as a wrapper for these frameworks. Working with Keras is as easy as playing with Lego blocks: all you have to know is where to fit the right component. So it is the ultimate deep learning tool for human beings!


Architecture of the Keras API

Why Keras?

  • Fast prototyping – In most cases, you have to test different neural architectures to find the best fit, and building models from scratch is time consuming. Keras helps here by modularizing your task and letting you reuse code.
  • Supports CNNs, RNNs and combinations of both
  • Modularity
  • Easy extensibility
  • Simple to get started, simple to keep going
  • Deep enough to build serious models.
  • Well-written documentation – Yes! Refer to http://keras.io
  • Runs seamlessly on CPU and GPU – Keras supports GPU parallelization, which will boost your execution.

Keras follows a very simple design idea. Here I've summed up the four main steps of designing a Keras deep learning model (a minimal sketch follows the list).

  1. Prepare your inputs and output tensors
  2. Create first layer to handle input tensor
  3. Create output layer to handle targets
  4. Build virtually any model you like in between
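
As a minimal sketch of those four steps, here's a plain feed-forward binary classifier (the layer sizes and the 100-feature input are arbitrary placeholders, not tied to a particular dataset):

from keras.models import Sequential
from keras.layers import Dense

# 1. Prepare inputs and outputs: assume each sample has 100 features and a binary target
model = Sequential()
# 2. First layer handles the input tensor
model.add(Dense(units = 64, activation = 'relu', input_dim = 100))
# 4. Build virtually any model you like in between
model.add(Dense(units = 32, activation = 'relu'))
# 3. Output layer handles the targets
model.add(Dense(units = 1, activation = 'sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])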

Basically, Keras models go through the following pipeline. You may have to revisit the steps again and again to come up with the best model.

Let's start with a simple experiment that involves classifying dog and cat images from Kaggle. First make sure to download the training and testing image files from Kaggle (https://www.kaggle.com/c/dogs-vs-cats/data).

Before playing with Keras, you may need to set up your rig. Please refer to this post and make your beast ready for deep learning.

Then try this code! The code sections are commented for your reference. Here I'm using the TensorFlow backend; you can change the configuration a bit and use Theano or CNTK as you wish.

# Convolutional Neural Network with Keras

# Installing Tensorflow
# pip install tensorflow-gpu

# Installing Keras
# pip install --upgrade keras

# Part 1 - Building the CNN

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential # Sequential was missing from the original imports
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Initialising the CNN
classifier = Sequential()

# Step 1 - Convolution
# input_shape is reversed (channels first) with the Theano backend
# Images are 2D, 64x64 pixels with 3 colour channels
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

# Step 2 - Pooling
# (2, 2) is the usual pool size; it halves each dimension without losing much information
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolutional layer
#Inputs are the pooled feature maps of the previous layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
#relu - rectifier activation function
#128 nodes in the hidden layer
classifier.add(Dense(units = 128, activation = 'relu'))
# Sigmoid is used because this is binary classification; use softmax for multiclass
classifier.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the CNN
# adam is a variant of stochastic gradient descent
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Part 2 - Fitting the CNN to the images
#Preprocess the images to reduce overfitting
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255, # scale all pixel values to the 0-1 range
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit_generator(training_set,
                         steps_per_epoch = 250, # steps are batches, not images: 8000 training images / batch size of 32
                         epochs = 5,
                         validation_data = test_set,
                         validation_steps = 63) # 2000 test images / batch size of 32

# Part 3 - Prediction
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_2.jpg', target_size=(64,64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)
result = classifier.predict(test_image)
print(training_set.class_indices) # shows which label maps to 0 and which to 1
if result[0][0] == 1:
    prediction = 'dog'
else:
    prediction = 'cat'

print(prediction)

FAQ Bot in Minutes!

Almost all the tech giants are investing massively in chatbots and making them available to the general public in an efficient and easy way. Many development tools, SDKs and services are now available in the market to build your own chatbots. Microsoft QnA Maker is one of the handiest tools for getting started with building a basic question and answer bot.

Microsoft QnA Maker was in public preview for quite a while, and it reached general availability with the Build 2018 announcements. If you have bots already built using the QnA Maker preview portal, just go and migrate the knowledge bases you've created to the new QnA Maker management portal. Here's the guide to do that.

Building a bot using QnA Maker is pretty straightforward. What you need is a set of question and answer pairs to add as the knowledge base of your chatbot. The knowledge base can be created manually using the online editor, or you can just upload question and answer pairs in CSV/TSV format, a Word document or even a product manual. If you want to add a set of FAQs from a website, all you have to do is provide its URL and the information will be extracted.


Testing the knowledge base in real time

The created knowledge base can be tested in real time through the portal, and corrections to the classifications can be made there as well. One of the major advantages of the QnA Maker service is that the bot's knowledge base can be deployed directly on the client's Azure tenant without raising any privacy or compliance issues.

Publishing the knowledge base creates a REST endpoint that you can access through the Microsoft Bot Framework and then publish directly to a desired channel. The sample code for building a simple QnA Maker powered bot is available here on GitHub.
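
If you want to call the published endpoint directly, the request looks roughly like the sketch below. Note that the host, knowledge base ID, endpoint key and question here are placeholders, and the exact path should be taken from your own publish page:

import requests

host = 'https://your-resource.azurewebsites.net' # placeholder: host shown on the publish page
kb_id = '<knowledge-base-id>' # placeholder
endpoint_key = '<endpoint-key>' # placeholder

response = requests.post(
    host + '/qnamaker/knowledgebases/' + kb_id + '/generateAnswer',
    headers = {'Authorization': 'EndpointKey ' + endpoint_key},
    json = {'question': 'What are your opening hours?'})

print(response.json()['answers'][0]['answer'])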

One of the promising features that comes with the latest updates is the "Small Talk" request-response dataset from Microsoft. This can make your bot seem more intelligent and human-like (it even handles the "Mmmm…"s 😀). You can select your desired personality from Professional, Friendly or Humorous, download the dataset as a TSV and add it to your existing knowledge base. This will give your bot a more human-like touch. (Make sure to select the datasets specifically built for QnA Maker.)

The QnA Maker service charges only for the hosting service, not for the number of transactions. (Note that you'd be charged for the bot service separately 😉) You can read more about pricing here: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/qna-maker/

QnA Maker is not a fully intelligent knowledge base building platform, but it can help you come out with a fully functioning bot in minutes.

One-Hot Encoding in Practice

Data is king in machine learning. In the process of building machine learning models, data is used as the input features.

Input features come in all shapes and sizes. To build a predictive model with a better accuracy rate, we should understand the data as well as the logic behind the algorithm we are going to use to fit the model.

Data understanding, the second step of CRISP-DM, guides us in understanding the types of data we have and the way they are represented. We can distinguish three main kinds of data feature:

  1. Quantitative features – Data with a numerical scale (age of a person in years, price of a house in dollars, etc.)
  2. Ordinal features – Data without a scale but with an ordering (ordered sets: first, second, third, etc.)
  3. Categorical features – Data with neither a numerical scale nor an ordering. These features don't allow any statistical summary. (Car manufacturer categories, civil status, n-grams in NLP, etc.)

Most machine learning algorithms, such as linear regression, logistic regression, neural networks and support vector machines, work better with numerical features.

Quantitative features come with a numerical value and can be used directly as input features for ML algorithms (though sometimes preprocessing such as normalization is needed).

Ordinal features can easily be represented as numbers (e.g. first = 1, second = 2, third = 3 …). This is called integer encoding. Representing ordinal features with numbers makes sense because the ordering between values can be expressed numerically.
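
As a tiny sketch of integer encoding with pandas (the column name and values here are made up for the example):

import pandas as pd

df = pd.DataFrame({'finish_position': ['first', 'third', 'second', 'first']})
order = {'first': 1, 'second': 2, 'third': 3}
df['finish_position_encoded'] = df['finish_position'].map(order)
print(df)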

There are some algorithms that can deal directly with joint discrete distributions, such as Markov chains, Naive Bayes, Bayesian networks and tree-based methods. These algorithms can work with categorical data without any encoding, while for other ML algorithms we should encode the categorical features numerically before using them as input features. That means it's better to convert categorical features to numerical ones most of the time 😊

There are some special cases too. For example, while naïve Bayes classification only really handles categorical features, many geometric models go in the other direction by only handling quantitative features.

How to convert categorical data to numerical data?

There are a few ways to convert categorical data to numerical data:

  • Dummy encoding
  • One-hot encoding / one-of-K scheme

are the most prominent among them.

One-hot encoding is the process of converting categorical features into numerical ones by "binarizing" each category and including it as a feature to train the model.

In mathematics, we can define one-hot encoding as…

One-hot encoding transforms a single variable with n observations and d distinct values into d binary variables, each with n observations, where each observation indicates the presence (1) or absence (0) of the corresponding distinct value.

Let’s get this clear with an example. Suppose you have ‘flower’ feature which can take values ‘daffodil’, ‘lily’, and ‘rose’. One hot encoding converts ‘flower’ feature to three features, ‘is_daffodil’, ‘is_lily’, and ‘is_rose’ which all are binary.
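
In Python this is a one-liner with pandas. A minimal sketch of the flower example above (the data frame is made up to match it):

import pandas as pd

df = pd.DataFrame({'flower': ['daffodil', 'lily', 'rose']})
encoded = pd.get_dummies(df['flower'], prefix = 'is')
print(encoded) # columns: is_daffodil, is_lily, is_rose, each holding 1 or 0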

A common application of OHE is in Natural Language Processing (NLP), where it can be used to turn words into vectors very easily. Here comes a con of OHE: the vector size can get very large with respect to the number of distinct values in the feature column. If there are only two distinct categories in the feature, there's no need to construct additional columns; you can just replace the feature column with one Boolean column.


OHE in word vector representation

You can easily perform one-hot encoding in Azure ML Studio by using the 'Convert to Indicator Values' module. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model, which is exactly what happens in OHE. We'll look at performing one-hot encoding using Python in the next article.

Mission Plan for building a Predictive model

When it comes to a machine learning or data science problem, the most difficult part is often finding the best approach to tackle the task, or simply getting an idea of where to start!

Cross-Industry Standard Process for Data Mining, commonly known by its acronym CRISP-DM, is a data mining process model that describes the approaches data mining experts commonly use to tackle problems. This process can easily be adopted for developing machine learning based predictive models as well.


CRISP-DM

It doesn't matter which tools/IDEs/languages you use for the process; you can adapt your tools according to the requirements you have.

Let’s walk through each step of the CRISP-DM model to see how it can be adopted for building machine learning models.

Business Understanding –

This is the step where you need technical know-how as well as a little knowledge about the problem domain. You should have a clear idea of what you are going to build and what the functional value of the prediction made by the model would be. You can use Decision Model & Notation (https://en.wikipedia.org/wiki/Decision_Model_and_Notation) to describe the business need for the predictive model. Sometimes the business need you have might be solvable using simple statistics rather than a machine learning model.

Identifying the data sources is a task you should do in this step. You should check whether the data sources are reliable, legal and ethical to use in your application.

Data Understanding –

I would suggest the following steps to get to know your data better (a short pandas sketch follows the list).

  1. Data definition – A detailed description of each data field in the data source. The notation of the data points and the units in which they have been measured are things you should consider.
  2. Data visualization – Hundreds or thousands of numerical data points may not give you a clear idea of what the data is about or the shape of your data. You may find interesting subsets of your data after visualizing it, and it's really easy to see clustering patterns or trends in a visualized plot.
  3. Statistical analysis – Starting from simple statistical calculations such as the mean and median, you can calculate the correlation between each pair of data fields, which helps you get a good idea of the data distribution. Feature engineering increases the accuracy of the machine learning model, and a descriptive statistical analysis is a great asset for performing it.
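
Here's a minimal pandas sketch of those quick checks (the file name is a placeholder):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv') # placeholder file name
print(df.describe()) # mean, median (50%) and spread of each numeric field
print(df.corr()) # pairwise correlation between numeric fields
df.hist(figsize = (10, 8)) # a quick look at each field's distribution
plt.show()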

For data understanding, the Interactive Data Exploration, Analysis and Reporting tool (IDEAR) can be used without the hassle of doing all the coding from scratch. (I'll discuss IDEAR in an upcoming post.)

Data Preparation –

Data preparation takes roughly 80% of the time in the process, which implies it is the most vital part of building predictive models.

This is the phase where you convert the raw data you got from the data sources into the final datasets you use for building the ML models. Most raw data, from sources like IoT sensors or web collections, is filled with outliers, missing values and disruptions. In the data preparation phase, you should perform the data preprocessing needed to make those data fields usable in modeling.
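
As an illustrative sketch of common clean-up steps (the file and column names are made up for the example):

import pandas as pd

df = pd.read_csv('sensor_readings.csv') # placeholder file name
df = df.drop_duplicates()
# Fill missing values with the median of the column
df['temperature'] = df['temperature'].fillna(df['temperature'].median())
# Clip extreme outliers to the 1st/99th percentiles
low, high = df['temperature'].quantile([0.01, 0.99])
df['temperature'] = df['temperature'].clip(low, high)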

Modeling –

Modeling is the part where the algorithms come into play. You train and fit your data to a particular predictive model to perform the desired prediction. You may sometimes need to check the math behind the algorithms to select the best one, so that the model neither overfits nor underfits.

Different modeling methods may need data in different forms, so you may need to go back to the data preparation phase.

Evaluation –

Evaluation is a must before deploying a model. The objective of evaluating the model is to see whether the predictive model meets the business objectives we figured out at the beginning. The evaluation can be done with many measures, such as accuracy, AUC, etc.

Evaluation may lead you to adjust the parameters of the model, or you might have to choose another algorithm that performs better. Don't expect the machine learning model to be 100% accurate; if it is, it is most probably an overfitted case.
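
As a hedged sketch of computing those two measures with scikit-learn (model, X_test and y_test are assumed to come from your own training step):

from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = model.predict(X_test) # predicted class labels
y_score = model.predict_proba(X_test)[:, 1] # probability of the positive class

print('Accuracy:', accuracy_score(y_test, y_pred))
print('AUC:', roc_auc_score(y_test, y_score))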

Deployment –

Deployment of the machine learning model is the phase where the client or the end user consumes it. In most cases, the predictive model is part of an intelligent application, acting as a service that takes in a set of information and gives a prediction as output.

I would suggest deploying the model as a single component, so that it's easy to scale as well as to maintain. APIs and Docker environments are some cool technologies you can adopt for deploying machine learning models.

CRISP-DM won't do all the magic of producing a perfect model as the output, but it will definitely help you avoid ending up at a dead end.

Tips & Tricks for building a better LUIS model – 2

2017 ended up making chatbots not just a trend but an essential in the tech world. With the rise of chatbots, building effective natural language understanding (NLU) models is a must. LUIS is an admirable service from Microsoft Cognitive Services that can be used for building NLU models.

In most cases LUIS models perform well, but sometimes they pop out unexpected results. This can happen for various reasons, including a poor understanding of the business domain of the chatbot.

Here I'm adding a few more points to those we discussed in the previous article that you need to consider when building accurate LUIS models.

Note that LUIS has had a major update, as well as a UI change, compared to the version we used last year.

Using Phrase List

You can create a phrase list to teach the model synonyms. As an example, here I've added a phrase list for the word "good". The wizard recommends a set of words that can be added to the phrase list, and you can select the appropriate ones from those suggestions. No need to type all the synonyms; let LUIS handle it. 😊

Another use of the phrase list is to teach the model domain-specific words. As an example, you can add a list of fruit names (apple, orange, grapes) to a phrase list and give it a name. This helps the LUIS model align itself to the given domain.

Use the same number of utterances to train each intent

When training the intents with possible utterances, make sure to use roughly the same number of utterances for each intent. Otherwise the intent prediction may be biased toward intents with a higher number of utterances.

Comparing the accuracy of the model with published versions

With the new LUIS portal, you can test the accuracy of the built model without connecting it to a bot service. In addition, you can compare two versions of the same LUIS model side by side. After observing the differences for various utterances, you can decide which model version should go to production.


LUIS Programmatic API

This is not directly related to optimizing accuracy, but the LUIS Programmatic API allows you to perform almost all the tasks involved in building and training a LUIS model through API calls. This comes in handy when you are building a bot that can learn by itself.

Add Bing Spell Check to the chatbot

You may notice that spelling mistakes made by the user cause wrong intent identifications. To overcome this issue, you can enable the Bing Spell Check API in your LUIS model. This may cost you a bit, but the accuracy of intent identification will go up.


If you have more tips and tricks for optimizing NLU models, do share them here as comments. 😊

Deploy Machine Learning Models in a Production environment as APIs (Python Flask + Visual Studio)

Building intelligent applications basically consists of integrating machine learning based predictive components into apps and systems. Mostly, data scientists or AI engineers are accountable for building these machine learning models.

When it comes to integration and deployment in a production environment, a problem arises with platform dependency. Most data scientists and AI engineers are comfortable with Python or R and develop their models with them, while the rest of the system may be a .NET or Java based application.

One of the best approaches to connecting these components is deploying the ML predictive module as a web API and calling that API from the application. Any programmer can work with an API once they have its definition.

Flask is a small and powerful web framework for Python. It's easy to learn and simple to use, enabling you to build your web app in a short amount of time. Visual Studio provides an easy way to create Python Flask web applications through its templates. Here are the steps I've gone through for deploying the ML experiment as a REST API.

01. Create the machine learning model, train, tune and evaluate it.

Here what I've done is a simple linear regression for predicting monthly salary according to years of experience. The scikit-learn Python library has been used for performing the regression. The dataset used for the experiment is from SuperDataScience.

The code is available in the GitHub repository.
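
As a rough sketch of that training step (the CSV file and column names are assumptions based on the SuperDataScience salary dataset; adjust them to match your copy):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('Salary_Data.csv') # assumed file name
X = dataset[['YearsExperience']] # assumed column name
y = dataset['Salary'] # assumed column name

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print('R^2 on the test set:', regressor.score(X_test, y_test))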

02. Creating the pickle

When you deploy the predictive model in a production environment, there's no need to train the model with code again and again. Python has a built-in method of persisting objects called pickle. The pickle module can serialize objects or data into a file that we can save and load from, so you can just use the pickle as a binary reference when generating the output. scikit-learn has its own model persistence method that we will use: joblib. It is more efficient with scikit-learn models because it is better at handling the large NumPy arrays that may be stored in them.
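
Persisting and reloading the trained regressor then takes a couple of lines (older scikit-learn versions expose joblib as sklearn.externals.joblib; here I assume the standalone joblib package):

import joblib

joblib.dump(regressor, 'salary_model.pkl') # serialize the trained model to disk
regressor = joblib.load('salary_model.pkl') # later: load it back without retraining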

03. Create a Python Flask web application.

Simply go to Visual Studio (I'm using VS2017, which comes with Python by default) and select a web project. A step-by-step guide is here. I would recommend going with option 2 mentioned in the blog post, because it reduces a lot of unnecessary overhead.

To be on the safe side, use Python virtual environments; they avoid many hassles caused by library dependencies. I've used an Anaconda environment as the base of my virtual environment.


04. Create the API.

Create a new Python file in your project and set it as the startup file. (In my case MLService.py is the startup file, which contains the API code.) The pickle file containing the model binaries is the only dependency the API has when it is deployed.

Here the API operates through POST methods that accept input in JSON.
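
Here's a minimal sketch of what such a startup file could look like (the route name, JSON field name and pickle file name are illustrative assumptions, not the exact code from the repo):

import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load('salary_model.pkl') # the model binaries persisted earlier

@app.route('/predict', methods = ['POST'])
def predict():
    data = request.get_json() # e.g. {"years_experience": 5}
    years = float(data['years_experience'])
    prediction = model.predict(np.array([[years]]))
    return jsonify({'predicted_salary': float(prediction[0])})

if __name__ == '__main__':
    app.run(port = 5000)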

05. Run & Test

You can run the API and test it by sending POST requests to the URL with a JSON body. Here I've used Postman to send a POST request, and it gives me the predicted salary for the entered years of experience.
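
If you prefer testing from Python instead of Postman, the same request looks like this (the URL and field name match the illustrative sketch above):

import requests

resp = requests.post('http://localhost:5000/predict',
                     json = {'years_experience': 5})
print(resp.json()) # e.g. {'predicted_salary': ...}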


You can access the whole code of the project through my GitHub repo here.


    Do comment if you have any suggestions for changing the API structure.

Handling Big snakes on Visual Studio

In the last post we discussed setting up a Windows rig for deep learning. If you still haven't set up your machine, go do it first :D

After getting the so-called big snakes, Python and Anaconda, onto the machine, we need a proper IDE for coding.

There are many good IDEs you can use on Windows to code in Python; PyCharm and Spyder are some popular tools.

If you are familiar with Visual Studio, the so-called father of all IDEs, Python works smoothly with VS. There are just a few configurations that need to be done.

No need to purchase Visual Studio Enterprise or Ultimate; the freely available Visual Studio Community edition works fine. In the 2017 version Python comes alongside the default installation options; for earlier versions you need to install Python Tools for Visual Studio (PTVS) separately.

https://docs.microsoft.com/en-us/visualstudio/python/python-in-visual-studio

Refer to this guide for more details.

The Python environments configured on your machine can be seen in the 'Python Environments' pane of Visual Studio. (If it is not there, go to Tools -> Python -> Python Environments.)


By default, your Anaconda environment and the default Python environment should be there. First refresh those environments so Visual Studio can build the IntelliSense database and pick up the installed libraries.

For our deep learning experiments, we configured a separate Python environment earlier. To add that environment to Visual Studio, follow these steps.

01. Click 'Custom' in the 'Python Environments' pane

02. Go to your Anaconda environments and activate the environment you pre-configured for deep learning (mine is tensorflow-gpu)


03. Copy the interpreter path of the environment

04. Paste it into the interpreter path field and click 'Auto Detect'; Visual Studio will detect the rest


05. Click Apply

It may take a few minutes to refresh the packages as well as IntelliSense. Make the configured environment your default and open the Interactive window. You are good to go 😊
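
As a quick sanity check in the Interactive window (assuming the tensorflow-gpu environment from the previous post is active; this uses the TensorFlow 1.x API):

import tensorflow as tf
print(tf.__version__)

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices()) # your GPU should appear in this list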