Image Classification with CustomVision.ai

Extracting tiny features from images, feeding them into deep neural networks with many hidden layers, and granting silicon chips “eyes” to see has become a hot topic today. Computer vision has come a long way from the era of pattern recognition and hand-crafted feature engineering. With the advancement of machine learning algorithms combined with deep learning, understanding the content of images and using it in real-world applications has become a MUST rather than a trend.

Recently, at the Microsoft Build 2017 conference, Microsoft announced a handy tool for training a machine learning image classification model to tag or label your own images. The most interesting part of this tool is that it provides an easy-to-use user interface for uploading your own images to train the model.

After training and tuning the model you can use it as a web service. Using the REST API you just have to push the request to the web service and it’ll do the magic for you.

I did a tiny experiment with this tool by building an image classifier that classifies a few famous landmarks.

I used the following image set:

  • Eiffel tower – 6 images
  • Great wall – 11 images
  • KL tower – 7 images
  • Stonehenge – 7 images
  • Space Needle – 7 images
  • Taj Mahal – 7 images
  • Sigiriya – 8 images

Let’s get started!

Go to customvision.ai and sign in with your email ID. You’ll land on the “My Projects” page.

Fill in the name and description, and select the domain for the model you’re going to build. Here I’ve selected Landmarks because the images I’m going to use contain landmarks and structural buildings.

I had the images of each landmark in separate folders on my local machine, and I uploaded the images category by category. The system will detect it if you upload duplicate images.

Altogether, 53 images with different tags were uploaded for training.

Training will take a few minutes. Optimize the probability threshold to get the best balance of precision and recall. Then get the prediction URL. All you have to do is send a JSON input to the Prediction API.
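Here’s a minimal sketch of what that JSON request could look like from Python, using only the standard library. The endpoint and key below are hypothetical placeholders, and the `Prediction-Key` header and `Url` body field are written as I recall them from the service’s prediction URL dialog, so treat them as assumptions and copy the exact values shown for your own project. The sketch only builds the request; the network call is left commented out.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values shown for your own project.
PREDICTION_URL = "https://YOUR-ENDPOINT.example/prediction/url"
PREDICTION_KEY = "YOUR-PREDICTION-KEY"


def build_prediction_request(image_url,
                             prediction_url=PREDICTION_URL,
                             prediction_key=PREDICTION_KEY):
    """Build (but do not send) a POST request asking the service to tag an image."""
    # Assumed body shape: a JSON object pointing at a publicly reachable image.
    body = json.dumps({"Url": image_url}).encode("utf-8")
    return urllib.request.Request(
        prediction_url,
        data=body,
        headers={
            "Prediction-Key": prediction_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_prediction_request("https://example.com/eiffel.jpg")
print(request.get_method(), json.loads(request.data))

# To actually call the service (needs a real URL and key):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The response should list each tag with a probability, which is what the probability threshold is applied to.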

You can retrain the model by tagging the images used for testing. In a production environment, you can use the user inputs to make the prediction model more accurate. The retrained model will appear as a different iteration. You have the freedom to choose the best iteration that should go live with the API.

You can quickly test how well the model you built is performing. Note that no ML model will give you 100% accuracy.
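To see why the probability threshold trades precision against recall, here’s a small sketch in plain Python. The scores and labels are made up for illustration and are not results from my landmark model.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for one tag at a given probability threshold.

    scores: predicted probabilities for the tag, one per image
    labels: True where the image really carries the tag
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))          # true positives
    fp = sum(p and not l for p, l in zip(predicted, labels))      # false positives
    fn = sum((not p) and l for p, l in zip(predicted, labels))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example data
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

for threshold in (0.3, 0.5, 0.9):
    print(threshold, precision_recall(scores, labels, threshold))
```

A high threshold makes the model speak only when it’s sure (high precision, low recall), while a low threshold catches more true tags at the cost of more false alarms.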


A prediction from the API

If you prefer to do this programmatically, or your application needs to do all the training and calling in the backend, just use the Custom Vision SDK.

https://github.com/Microsoft/azure-docs/blob/master/articles/cognitive-services/Custom-Vision-Service/csharp-tutorial.md

The SDK comes in pretty handy for creating new models, adding labels to the images and training the model before publishing the prediction API.

Grab a set of images. Build a classifier or a tagger. Make your clients WOW! 😃


Democratizing Machine Learning with Cloud

We have already passed the era of gigabytes when it comes to data. The world is talking about terabytes of unstructured data and massive numbers of data points generated by IoT devices and sensors, in the millions per second. To analyze these heaps of data, we obviously need large computation power and massive storage. Building workhorse machines to handle those tremendous workloads would definitely cost a lot. The cloud computing paradigm comes in handy here. The resourcefulness and scalability of the public cloud can be used to perform the large calculations in machine learning algorithms.

Almost all the major public cloud providers in the market offer machine learning services. Google Cloud Platform provides modern machine learning services, with pre-trained models and a service to generate your own tailored models. Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. IBM Analytics offers a machine learning platform with its cloud data services. Azure Machine Learning Studio is a GUI-based integrated development environment for constructing and operationalizing machine learning workflows on Azure. We discussed Azure Machine Learning and its applications in practical scenarios a lot in the previous posts.

All the mentioned platforms provide machine learning as a service. Most of them offer pre-built ML algorithms in packages. Simple drag-and-drop user interactions and easy deployment have attracted many developers to these tools.

But what if you want to build from scratch? Or what if you want to use the power of Graphics Processing Units (GPUs) to run ML algorithms in parallel? Cloud-based virtual machines specifically optimized for computation are one of the best solutions you can use.

Azure Data Science Virtual Machine (DSVM) –


DSVM in Azure Portal

If you have already used Azure virtual machines for your computation, hosting or storage tasks, this will not be a new concept for you. Azure DSVM is specifically optimized for large computations. It comes in two flavors: one with Windows and the other with Linux. You can choose the hardware configuration as you wish. Many development environments, IDEs and programming languages are pre-installed in the VM instances.

My personal favorite here is the Linux DSVM instance. Here I’ve created a Linux DSVM with the basic configuration. To access the VM you can use any tool that can make an SSH connection. What I normally do is access the VM using Ubuntu Bash on Windows 10.

GPUs for machine learning –


Configurations of the Linux VM with Nvidia GPU

Many machine learning algorithms currently available can be executed in parallel. Parts of those algorithms are embarrassingly parallel, meaning the work splits into pieces that don’t depend on each other. With parallel programming, you can reduce the execution time of these algorithms drastically. Data scientists in both industry and academia have been using GPUs for machine learning to make groundbreaking improvements across a variety of applications including image classification, video analytics, speech recognition and natural language processing.
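As a rough illustration of what “embarrassingly parallel” means, here’s a sketch that splits a computation into independent chunks and combines the partial results. It uses a thread pool from the Python standard library purely to show the structure. In CPython, threads won’t actually speed up this kind of arithmetic, and a GPU would run the same pattern across thousands of cores instead of four workers.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work on one chunk; it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


data = list(range(1000))
chunks = split(data, 4)

# Each chunk is processed independently, then the partial sums are combined.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum_of_squares, chunks))

total = sum(partials)
print(total)
```

Because no chunk waits on another, adding more workers (or cores) keeps helping until you run out of chunks.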


GPUs Vs. CPU computing

Especially in deep learning, parallel processing using GPUs can drastically decrease computation time. Purchasing a deep learning dream machine powered by a CUDA-enabled high-end GPU such as the Nvidia Tesla K80 would cost nearly 6000 dollars! Rather than spending a lot on a machine like that, the most feasible plan is to provision a virtual machine with the specifications we need and pay as we consume.


VM instance price plans

The N-series is a family of Azure Virtual Machines with GPU capabilities that you can use for these kinds of tasks. The N-series will feature the NVIDIA Tesla accelerated platform as well as NVIDIA GRID 2.0 technology, providing the highest-end graphics support available in the cloud today. Through your Azure portal, you can choose a desired price plan with the desired configurations for your tasks when provisioning the VM.

Here’s my Azure VM specifically configured for deep learning exercises. The machine is powered by a Tesla K80 GPU, which has 4992 cores!! I installed Anaconda on it and do my computations using Jupyter notebooks.

Just a hint: stop your VM instance when you are not using it for computation to avoid getting huge unnecessary bills. 😉

No need for huge wallets! The wise decision is to apply cloud technologies to machine learning.

Simple Linear Regression with Azure ML + Python

Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous (quantitative) variables. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
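To make the idea concrete, here’s a minimal sketch that fits a straight line y = slope * x + intercept with ordinary least squares in plain Python. Note that this is the textbook method, not the Bayesian Linear Regression module used in the experiment later in this post, and the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Predict the response for a new value of the predictor
prediction = slope * 6.0 + intercept
print(prediction)
```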

Typically, when doing regression analysis, we consider the correlation coefficient of the input variables. Correlation analysis measures the extent to which two variables vary together, including the strength and direction of their relationship.

The linear correlation coefficient (also called the Pearson product-moment correlation coefficient) is a measure of the strength and direction of a linear association between two random variables.
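Here’s a small sketch of how the Pearson coefficient can be computed by hand: the co-variation of the two variables divided by the product of their spreads. The sample values are made up. In practice you’d more likely call scipy.stats.pearsonr, which returns the coefficient along with a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: one perfectly increasing pair, one perfectly decreasing pair
up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(up, down)
```

A value near +1 means the variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there’s no linear relationship.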

I used the Istanbul Stock Exchange dataset to demonstrate the steps in doing a simple linear regression prediction. I built an Azure Machine Learning experiment (get the experiment from here) for the regression model, using the built-in Bayesian Linear Regression algorithm.

The most interesting part comes with Python! 🙂

I used a Jupyter Notebook and fetched the data into that workspace to visualize the dataset and to calculate the correlation coefficient between each pair of variables. The pearsonr method in the scipy library was used for that.

Refer to the IPython notebook on Azure Notebooks for the complete Python script and the visualizations.

https://notebooks.azure.com/library/Python%20Visualizations/html/Istanbul%20Stock%20Python%203%20notebook.ipynb

Do run the code on your own. You’ll get it for sure!

 

Natural Language Processing with Python + Visual Studio

Human language is one of the most complicated phenomena for machines to interpret. Compared to artificial languages like programming languages and mathematical notations, natural languages are hard to describe with explicit rules. Natural Language Processing, AKA computational linguistics, enables computers to derive meaning from human or natural language input.

When it comes to natural language processing, text analysis plays a major role. One of the major problems we face when processing natural language is computation power. Working with a big corpus and chunking the textual data into n-grams needs a lot of processing power. The almighty cloud, the ultimate savior, comes in handy in this application too.
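In case n-grams are new to you, here’s a minimal sketch in plain Python: an n-gram is just every run of n consecutive tokens. The sentence is made up, and the tokenization is a naive whitespace split. NLTK has its own helper for this (nltk.util.ngrams), so this is only to show the idea.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # Zip the token list against itself shifted by 1, 2, ..., n-1 positions.
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = "the quick brown fox jumps".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

A text with N tokens yields N - n + 1 n-grams, so on a big corpus the counts pile up quickly, which is where the processing power goes.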

Let’s peep into some of the cool tools you can use in your development. In most cases, you don’t want the hassle of developing from scratch. There are plenty of APIs and libraries that you can directly integrate with your system.

If you think you wanna go from scratch and do some enhancements, there’s space for you too. 😊

Text Analytics APIs

Microsoft Text Analytics APIs are a set of web services built with Azure Machine Learning. Many major natural language processing tasks are exposed as web services through them. The API can be used to analyze unstructured text for tasks such as sentiment analysis, key phrase extraction, language detection and topic detection. No hard rules or training loads. Just call the API from your C# or Python code. Refer to the link below for more info.

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-apps-text-analytics

Process natural language from scratch!

Python! Yeah, that’s it! Among the many programming languages, Python comes in handy with many pre-built packages specifically built for natural language processing.

Obviously, Python works well on Unix systems. But now the best IDE in town, Visual Studio, comes with a toolset for Python that enables you to edit, debug and run Python scripts using your existing IDE. You need Visual Studio 2015 (Community, Professional or Enterprise edition) to install the Python tools. (https://www.visualstudio.com/vs/python/)

Here I’ve used NLTK (Natural Language Toolkit) for the task. One of the main advantages of NLTK is that it comes with dozens of built-in corpora and trained models.

These are the language processing tasks and the corresponding NLTK modules, with examples of the functionality that comes with each.


Source – http://www.nltk.org/book/ch00.html

When running Python NLTK for the first time, you may need to download the nltk_data. Open the Python interactive console and install the required data from the NLTK downloader that pops up. (Use nltk.download() for this task.)


Here’s a little simulation of natural language processing tasks done using NLTK. Code snippets are commented for easy reading. 😊

import nltk
from nltk.corpus import stopwords

#Sample sentence used for processing
sentence = """John came to office at eight o'clock on Monday morning & left the office with Arthur bit early."""

#Tokenizing the sentence into words
word_tokens = nltk.word_tokenize(sentence)
#Tagging each word with its part of speech
tagged_words = nltk.pos_tag(word_tokens)
#Identifying named entities
named_entities = nltk.chunk.ne_chunk(tagged_words)

#Removing the stopwords from the text - Predefined stopwords in English have been used.
stop_words = set(stopwords.words('english'))
filtered_sentence = [w for w in word_tokens if w not in stop_words]

print('Sentence - ' + sentence)
print('Word tokens - ')
print(word_tokens)
print('Tagged words - ')
print(tagged_words)
print('Named entities - ')
print(named_entities)
print('Filtered sentence - ')
print(filtered_sentence)

 

The output after executing the script should look like this.

You can improve on these basics to build a Named Entity Recognizer and much more…

Try processing the language you read and speak… 😉

Jupyter Notebook on AzureML

If you are fond of playing with data to dig out the relationships in it and to plot interesting visualizations, Python is the language you should speak.

Over the years, with strong community support, Python has gained dedicated libraries for data analysis and predictive modeling such as scikit-learn, TensorFlow and Theano. Even the ultimate IDE in town, Visual Studio, has started supporting Python! So, no hesitation. Python is a great choice to make.

You can use many IDEs or even a simple text editor to write your Python files. But the Python ecosystem also has a handy web application, Jupyter Notebook, that you can use to write your code. And even run it!

Jupyter was born in 2014 as a spin-off project of IPython, a command shell for interactive computing in multiple programming languages, originally developed for Python.

Why Jupyter?

Jupyter Notebook is a very popular tool among data scientists. It’s a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. “Jupyter” is a loose acronym meaning Julia, Python and R. One of the most prominent benefits of using Jupyter Notebook is the ability to share the data transformation and visualization steps with your peers.

If you want to run Jupyter Notebook on your local machine, refer to the link below. With a few easy steps, you can have Jupyter Notebook up and running on your machine.

http://jupyter.readthedocs.io/en/latest/install.html

One of the easiest ways to use Jupyter is to run the notebook on Azure. There’s no need to have Python or its dependencies installed on your local machine. You can create, edit and share Jupyter notebooks using Azure Machine Learning Studio. All the execution happens in the cloud.

Let’s get started!

Access your notebooks from the “Notebooks” tab of AzureML Studio. When creating a new notebook, you can select which language and version you want to use. Python 2, Python 3 and R are the supported languages right now.

Just as with a Jupyter notebook running on your local machine, you get the same IPython interface in your browser.

On the notebook menu bar, you can find the ‘Help’ menu, which contains a brief user interface tour as well as a list of keyboard shortcuts that you can use to drive the notebook.

Here’s a little data mashup I’ve done using the famous ‘Iris dataset’ included in Python’s sklearn. The .ipynb file is available on my GitHub repo. Feel free to download it and play with it. A static HTML page created from the notebook output is also included in the repo.

Azure is coming up with the Azure Notebooks preview feature. Here’s the Iris visualization hosted on Azure Notebooks.

https://notebooks.azure.com/library/Python%20Visualizations/html/Iris+Data+Visualization.ipynb

No Machine learning algorithms or complex code snippets here. Just a data visualization & data transformation. 🙂

 

 

 

Time Series Forecasting with Azure ML

When we have a series of data points indexed in time order, we can define that as a “time series”. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Monthly rainfall data and temperature data for a certain place are some examples of time series.

In the field of predictive analytics, there are many cases where you need to analyze time series data and forecast its future values based on the previous values. Think of a scenario where you have to do a time series prediction for your business data, or a case where part of your predictive experiment contains a time series field whose future data points need to be predicted… There are many algorithms and machine learning models that you can use for forecasting time series values.

Multi-layer perceptrons, Bayesian neural networks, radial basis functions, generalized regression neural networks (also called kernel regression), K-nearest neighbor regression, CART regression trees, support vector regression, and Gaussian processes are some machine learning algorithms that can be used for time series forecasting.

See here for more about these methods

Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA and exponential smoothing (ETS) are some algorithms that are widely used for this kind of time series analysis. I’m not going to dig deep into the algorithms, trend analysis and all the numbers & characteristics bound up with time series. I’m just going to demonstrate a simple way to do time series analysis in your deployments using Azure ML Studio.

After adding a dataset that contains time series data to AzureML Studio, you can perform the time series analysis and predictions using Python or R scripts. In addition to that, ML Studio offers a pre-built module for anomaly detection in time series datasets. It can learn the normal characteristics of the provided time series and detect deviations from the normal pattern.

Here I’ve used the forecast R package to write code snippets that enable AzureML Studio to do time series forecasting using popular algorithms, namely ARIMA, Seasonal ARIMA and ETS.

ARIMA seasonal & ARIMA non-seasonal

#ARIMA Seasonal / ARIMA non-seasonal 
library(forecast)
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

#Enter the seasonality of the timeseries here
#For non-seasonal model use '1' as the seasonality
seasonality<-12
labels <- as.numeric(dataset1$data)
timeseries <- ts(labels,frequency=seasonality)
model <- auto.arima(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date,forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");

 

ETS seasonal & ETS non-seasonal

#ETS seasonal / ETS non-seasonal 
library(forecast)
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

#Add the seasonality here
#Use '1' as the seasonality for non-seasonal ETS
seasonality<-12
labels <- as.numeric(dataset1$data)
timeseries <- ts(labels,frequency=seasonality)
model <- ets(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date,forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");
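If you’re curious what the exponential smoothing family does under the hood, here’s a sketch of its simplest member in plain Python: simple exponential smoothing, with no trend and no seasonality. This is not what ets() in the forecast package runs as-is. That function estimates the smoothing parameters and picks the model form for you, while here alpha is fixed by hand and the series is made up.

```python
def smooth_level(series, alpha):
    """Run simple exponential smoothing and return the final level.

    Each new observation pulls the level towards itself by a factor alpha
    (0 < alpha <= 1); older observations fade out exponentially.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast_flat(series, alpha, horizon):
    """Forecast `horizon` future periods; without trend the forecast is flat."""
    return [smooth_level(series, alpha)] * horizon


# Made-up series
series = [10.0, 12.0, 14.0]
print(forecast_flat(series, alpha=0.5, horizon=3))
```

A larger alpha reacts faster to recent changes, and a smaller alpha gives a smoother, slower-moving forecast.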

 

The advantage of using an R script for the prediction is the ability to customize the script as you want. But if you are looking for an instant solution for time series prediction, there’s a custom module in the Cortana Intelligence Gallery to do time series forecasting.

https://gallery.cortanaintelligence.com/Experiment/Time-Series-Forecasting-using-Custom-Modules-1

You just have to open that in your studio and re-use the built modules in your experiment. See what’s happening to your sales in next December! 🙂

Competing in Kaggle with Azure Machine Learning

Data science is one of the most trending buzzwords in the industry today. Obviously, you need a hell of a lot of experience with data analytics and an understanding of different data science problems and their solutions to become a good data scientist.

Kaggle (www.kaggle.com) is a place where you can explore the possibilities of data science, machine learning and related stuff. Kaggle is also known as “the home of data science” because of its rich content and the wide community behind it. You can find hundreds of interesting datasets uploaded by data science enthusiasts from all around the world on Kaggle. The most fascinating thing you can find on Kaggle is competitions! Some competitions come with exciting prizes, while others offer wonderful job opportunities when you score a top rank.

As we discussed in previous posts, Azure Machine Learning enables you to deploy and test predictive analytics experiments easily. Sometimes you don’t need to write a single line of code to develop a machine learning model. So let’s start our journey on Kaggle with Azure Machine Learning.

01. Sign up for Kaggle – Go to kaggle.com & sign up using your Facebook/Google or LinkedIn account. It’s totally free! 🙂

Kaggle landing page


02. Register for a Kaggle competition – Under the competition section, you can find many competitions. We’ll start with a simple experiment that doesn’t come with any prize or job offer but is well worth trying as your first experience on Kaggle.

Can you classify monsters?


03. Ghouls, Goblins, and Ghosts… Boo! Search for this competition, categorized under the ‘Knowledge’ section of the competitions. The task you have to do in the competition is described precisely under ‘Competition Details’.

04. Get the data – After accepting the terms and conditions of Kaggle, you can download the training dataset, test dataset and the sample submission in .csv format. Make sure to take a close look at the features and understand whether you need some kind of data preprocessing before jumping into the task 😉

05. Understand the problem – You can easily figure out that this is a multi-class classification machine learning problem. So let’s handle it that way!

06. Get the data to your Studio – Here comes Azure Machine Learning! Go to AML Studio (setting up Azure Machine Learning is discussed here) and upload the data files through the ‘Add Files’ option.

07. Build the classifier experiment – It’s the same as building a normal AML experiment. Here I’ve split the training dataset to evaluate the model. The model with the highest accuracy was chosen to do the predictions. ‘Tune Model Hyperparameters’ was used to find the optimal model parameters.

Classifier Experiment

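Conceptually, the split-and-evaluate part of step 07 boils down to something like the sketch below, written in plain Python rather than with Studio modules. The function names, the 75/25 split and the sample predictions are all hypothetical, chosen only to illustrate the idea of holding out part of the training data and keeping the model that scores best on it.

```python
import random


def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows and split them into training and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of rows where the predicted class matches the real one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def best_model(scores):
    """Name of the model with the highest validation accuracy."""
    return max(scores, key=scores.get)


train, validation = split_rows(list(range(8)))
print(len(train), len(validation))

# Made-up validation results for two hypothetical models
actual = ["Ghost", "Ghoul", "Ghost", "Goblin"]
scores = {
    "model_a": accuracy(["Ghost", "Ghoul", "Goblin", "Goblin"], actual),
    "model_b": accuracy(["Ghost", "Ghoul", "Ghost", "Goblin"], actual),
}
print(scores, best_model(scores))
```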

08. Do the prediction – Now it’s time to use the trained model to predict the type of the ghost using the data in the test dataset. You can download the predicted output using the ‘Convert to CSV’ module.

Predicting with the trained model


09. Submission – Make sure to create the output according to the sample submission.
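If the downloaded predictions need reshaping, a few lines of Python can write a file in the expected layout. This is only a sketch: the column names `id` and `type` are my assumption about the sample submission, so check the sample file you downloaded, and the predictions below are made up. It writes to an in-memory buffer; swap in an open file to save it to disk.

```python
import csv
import io


def write_submission(predictions, out_file, id_column="id", label_column="type"):
    """Write (id, predicted label) pairs as a two-column CSV with a header."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow([id_column, label_column])
    for row_id, label in predictions:
        writer.writerow([row_id, label])


# Made-up predictions
predictions = [(3, "Ghoul"), (6, "Goblin"), (9, "Ghost")]

buffer = io.StringIO()
write_submission(predictions, buffer)
print(buffer.getvalue())

# To save to disk instead:
# with open("submission.csv", "w", newline="") as f:
#     write_submission(predictions, f)
```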

10. Upload the submission to Kaggle –  You can compete as a team or individual. See where you are in the list!

Here I’m the 278th! 🙂

That’s it! You’ve just completed your first Kaggle competition. This might not lift you to the top of the leaderboard, but it shows that it’s quite possible to use Azure Machine Learning for real-world machine learning problem solving.