Time Series Forecasting with Azure ML

When we have a series of data points indexed in time order, we can call that a “time series”. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Monthly rainfall figures or the temperature readings of a particular place are some examples of time series.

In the field of predictive analytics, there are many cases where we need to analyze time series data and forecast future values based on previous ones. Think of a scenario where you have to do a time series prediction for your business data, or where part of your predictive experiment contains a time series field whose future data points need to be predicted. There are many algorithms and machine learning models that you can use for forecasting time series values.

Multi-layer perceptrons, Bayesian neural networks, radial basis functions, generalized regression neural networks (also called kernel regression), k-nearest neighbor regression, CART regression trees, support vector regression, and Gaussian processes are some machine learning algorithms that can be used for time series forecasting.

See here for more about these methods

Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA and exponential smoothing (ETS) are some algorithms that are widely used for this kind of time series analysis. I’m not going to dig deep into the algorithms, trend analysis and all the numbers and characteristics bound to time series. I’m just going to demonstrate a simple way to do time series analysis in your deployments using Azure ML Studio.
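To give a feel for what exponential smoothing does, here is a tiny pure-Python sketch of simple exponential smoothing, the most basic member of the ETS family. It is an illustration of the idea only, not the state-space ETS implementation the forecast package uses:

```python
# Toy sketch of simple exponential smoothing: blend each new observation
# into a running "level" and use the final level as the flat forecast.

def ses_forecast(series, alpha=0.5, h=3):
    """Smooth the series with factor alpha; repeat the last level h times."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * h

history = [10.0, 12.0, 11.0, 13.0]
print(ses_forecast(history, alpha=0.5, h=2))  # → [12.0, 12.0]
```

Lower alpha values give smoother forecasts that react slowly to recent observations; the forecast package estimates such parameters for you.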

After adding a dataset that contains time series data to Azure ML Studio, you can perform time series analysis and predictions by using Python or R scripts. In addition, ML Studio offers a pre-built module for anomaly detection in time series datasets. It can learn the normal characteristics of the provided time series and detect deviations from the normal pattern.

Here I’ve used the forecast R package to write code snippets enabling Azure ML Studio to do time series forecasting with popular algorithms, namely ARIMA, Seasonal ARIMA and ETS.

ARIMA seasonal & ARIMA non-seasonal

#ARIMA Seasonal / ARIMA non-seasonal
library(forecast)

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

# Enter the seasonality of the time series here
# For a non-seasonal model use 1 as the seasonality
seasonality <- 12 # e.g. 12 for monthly data

labels <- as.numeric(dataset1$data)
timeseries <- ts(labels, frequency=seasonality)
model <- auto.arima(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date, forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")


ETS seasonal & ETS non-seasonal

#ETS seasonal / ETS non-seasonal
library(forecast)

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

# Enter the seasonality here
# Use 1 as the seasonality for a non-seasonal ETS model
seasonality <- 12 # e.g. 12 for monthly data

labels <- as.numeric(dataset1$data)
timeseries <- ts(labels, frequency=seasonality)
model <- ets(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date, forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")


The advantage of using an R script for the prediction is the ability to customize the script as you want. But if you are looking for an instant solution for time series prediction, there’s a custom module in the Cortana Intelligence Gallery for time series forecasting.


You just have to open it in your studio and re-use the built modules in your experiment. See what’s happening to your sales next December! 🙂

Competing in Kaggle with Azure Machine Learning

Data science is one of the most trending buzzwords in the industry today. Obviously, you have to have a lot of experience with data analytics and an understanding of different data science problems and their solutions to become a good data scientist.

Kaggle (www.kaggle.com) is a place where you can explore the possibilities of data science, machine learning and related fields. Kaggle is also known as “the home of data science” because of its rich content and the wide community behind it. You can find hundreds of interesting datasets uploaded by data science enthusiasts all around the world on Kaggle. The most fascinating things you can find on Kaggle are the competitions! Some competitions come with exciting prizes, while others offer wonderful job opportunities when you score a top rank.

As we discussed in previous posts, Azure Machine Learning enables you to deploy and test predictive analytics experiments easily. Sometimes you don’t need to write a single line of code to develop a machine learning model. So let’s start our journey on Kaggle with Azure Machine Learning.

01. Sign up for Kaggle – Go to kaggle.com & sign up using your Facebook/Google or LinkedIn account. It’s totally free! 🙂

Kaggle landing page


02. Register for a Kaggle competition – Under the competitions section, you can find many competitions. We’ll start with a simple one that doesn’t come with a prize tag or job offer, but is worth trying out as your first experience on Kaggle.

Can you classify monsters?


03. Ghouls, Goblins, and Ghosts… Boo! – Search for this competition, categorized under the ‘Knowledge’ section of the competitions. The task you have to perform is described precisely under ‘Competition Details’.

04. Get the data – After accepting Kaggle’s terms and conditions, you can download the training dataset, test dataset and a sample submission in .csv format. Make sure to take a close look at the features and figure out whether you need some kind of data preprocessing before jumping into the task 😉

05. Understand the problem – You can easily figure out that this is a multi-class classification machine learning problem. So let’s handle it that way!

06. Get the data into your studio – Here comes Azure Machine Learning! Go to AML Studio (setting up Azure Machine Learning is discussed here) and upload the data files through the ‘Add Files’ option.

07. Build the classifier experiment – It’s the same as building a normal AML experiment. Here I’ve split the training dataset to evaluate the model. The model with the highest accuracy was chosen to do the predictions, and ‘Tune Model Hyperparameters’ was used to find the optimal model parameters.

Classifier Experiment


08. Do the prediction – Now it’s time to use the trained model to predict the type of each ghost from the data in the test dataset. You can download the predicted output using the ‘Convert to CSV’ module.

Predicting with the trained model


09. Submission – Make sure to create the output according to the sample submission.
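A submission file like that can be assembled with a few lines of Python. The column names ("id", "type") below are my assumption based on this competition’s sample file, so check them against the actual sample_submission.csv before uploading:

```python
# Hypothetical sketch: write predictions in the sample-submission layout.
import csv
import io

def write_submission(ids, predictions, out):
    """Write one header row, then one (id, predicted label) row per sample."""
    writer = csv.writer(out)
    writer.writerow(["id", "type"])
    for row_id, label in zip(ids, predictions):
        writer.writerow([row_id, label])

buffer = io.StringIO()
write_submission([3, 6], ["Ghoul", "Goblin"], buffer)
print(buffer.getvalue())
```

In a real run you would pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.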

10. Upload the submission to Kaggle – You can compete as a team or as an individual. See where you are on the list!


Here I am, ranked 278th! 🙂

That’s it! You’ve just completed your first Kaggle competition. This might not lift you to the top of the leaderboard, but it shows that Azure Machine Learning can be used for solving real-world machine learning problems.


Azure ML Web Services gets a new look

There’s a huge buzz going on around machine learning. What for? Building intelligent apps is one of the dominant uses of machine learning. A web service is a “language” software developers understand. If data scientists can provide a web service for the devs, they’ll be super excited, because then they only have to deal with JSON; not regression algorithms or neural networks! 😀

Azure ML Studio gives you the power to deploy web services easily, with an interface a software developer can understand. Consuming a web service built with Azure Machine Learning is pretty easy, because it even provides you with code samples and the sample JSON that is transferred in and out.
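As an illustration, this is roughly the request body shape a classic Azure ML request-response service consumes. The input name "input1" and the columns here are placeholders; this sketch only builds the JSON, and you would still POST it to the service URI with your API key in the Authorization header:

```python
# Build the JSON body for an Azure ML Studio (classic) scoring request.
import json

def build_scoring_request(column_names, rows):
    """Wrap column names and row values in the Inputs/GlobalParameters shape."""
    return json.dumps({
        "Inputs": {
            "input1": {                # placeholder input port name
                "ColumnNames": column_names,
                "Values": rows,        # one inner list per row to score
            }
        },
        "GlobalParameters": {},
    })

body = build_scoring_request(["date", "value"], [["2016-08-01", "42"]])
print(body)
```

The exact payload for your own service is shown on its API help page in the studio, so copy the column list from there.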




Recently Azure ML Studio has come out with a new interface for managing web services. Now it’s pretty easy to manage and monitor the behavior of your web services.

Go to your ML Studio. In the web services section, you’ll find a new link directing you to the “New web services experience”. Currently it’s in preview.


New web services dashboard


The dashboard shows the performance of the web service you built, including the average execution time. You can even get a glimpse of the monetary costs attached to consuming the web service.

Testing the web services can be done through the new portal. If you want to build a web application that consumes the web service you built, you can use the Azure web app template that is pre-built for consuming ML web services.

Take a look at http://services.azureml.net and you’ll get used to it! 😀



Building a News Classifier with Azure ML

Classification is one of the most popular applications of machine learning. Classifying spam mails, classifying pictures and classifying news articles into categories are some well-known examples where machine learning classification algorithms are used.

This sample demonstrates how to use multiclass classifiers and feature hashing in Azure ML Studio to classify the BBC news dataset into the appropriate news categories.

The popular 2004-2005 BBC news dataset has been used for this experiment. The dataset consists of 2,225 documents from the BBC news website corresponding to stories in five topical areas from 2004-2005. The news is classified into five classes: Business, Entertainment, Politics, Sport and Tech.

The original dataset was downloaded from “Insight Resources”. It consisted of five directories, each containing text files with the news articles of a particular category.

The data was converted to a CSV file that fits ML Studio by running a C# console application.

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Specify the directory location
            string dir = @"D:\Document_Classification\bbc full text\bbc";
            var dirs = Directory.EnumerateDirectories(dir);

            // The using block flushes and closes the CSV file when done
            using (StreamWriter sw = new StreamWriter(dir + @"\BBCNews.csv"))
            {
                int index = 1;
                foreach (var d in dirs)
                {
                    foreach (var file in Directory.EnumerateFiles(d))
                    {
                        // Flatten commas and line breaks so each article fits on one CSV row
                        string content = File.ReadAllText(file)
                            .Replace(',', ' ').Replace('\n', ' ').Replace('\r', ' ');
                        // Row format: index, article text, category (directory name)
                        sw.WriteLine((index++) + "," + content + "," + new DirectoryInfo(d).Name);
                    }
                }
            }
        }
    }
}
The names of the categories have been used as the class label, or attribute to predict. The CSV file was uploaded to Azure ML Studio for the experiment.

Data Preparation –

The dummy column headings were replaced with meaningful column names using the Metadata Editor. Missing values were cleared by removing the entire row containing the missing value.

The term frequency–inverse document frequency (TF-IDF) of each unigram was calculated. A bit size of 15 was specified to extract 2^15 = 32,768 hashing features, and the top 5,000 related features were selected for this experiment.

Feature Engineering –
I used the Feature Hashing module to convert the plain text of the articles to integers and used the integer values as input features to the model.
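As a rough illustration of what the hashing trick does with the 15-bit setting mentioned above, here is a toy Python version. The actual module uses a different hash function, so the bucket indices will not match; only the mechanics are the same:

```python
# Toy hashing trick: map unigrams into 2**15 buckets by hashing each token.
import hashlib

N_BITS = 15
N_BUCKETS = 2 ** N_BITS  # 32,768 features

def hash_features(text):
    """Count token occurrences per hash bucket (a sparse feature vector)."""
    counts = {}
    for token in text.lower().split():
        # md5 keeps the bucket assignment deterministic across runs
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % N_BUCKETS
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts

features = hash_features("shares rise as markets rise")
print(sum(features.values()))  # → 5 (five tokens hashed in total)
```

The appeal of hashing is that no vocabulary needs to be stored: any unseen word still maps to some bucket, at the cost of occasional collisions.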


BBC classifier model

Predictive Experiment built on Azure ML Studio


The Multiclass Neural Network module with default parameters was used for training the model, and the parameters were tuned using the ‘Tune Model Hyperparameters’ module.

R script for creating word vocabulary –

# Map 1-based optional input ports to variables
dataset <- maml.mapInputPort(1) # class: data.frame
input.dictionary <- maml.mapInputPort(2) # class: data.frame
# Determine the following input parameters:-
# minimum length of a word to be included into the dictionary. 
# Exclude any word if its length is less than *minWordLen* characters.
minWordLen <- 3

# maximum length of a word to be included into the dictionary. 
# Exclude any word if its length is greater than *maxWordLen* characters.
maxWordLen <- 25

# we assume that the text is the first column in the input data frame
label_column <- dataset[[2]]
text_column <- dataset[[1]]

# Contents of optional Zip port are in ./src/
data.set <- calculate.TFIDF(text_column, input.dictionary, 
	minWordLen, maxWordLen)
data.set <- cbind(label_column, data.set)

# Select the document unigrams TF-IDF matrix to be sent to the output Dataset port
maml.mapOutputPort("data.set")

R Script for text preprocessing

# Map 1-based optional input ports to variables
dataset <- maml.mapInputPort(1) # class: data.frame
# Determine the following input parameters:-
# minimum length of a word to be included into the dictionary. 
# Exclude any word if its length is less than *minWordLen* characters.
minWordLen <- 3

# maximum length of a word to be included into the dictionary. 
# Exclude any word if its length is greater than *maxWordLen* characters.
maxWordLen <- 25

# minimum document frequency of a word to be included into the dictionary. 
# Exclude any word if it appears in less than *minDF* documents.
minDF <- 9

# maximum document frequency of a word to be included into the dictionary. 
# Exclude any word if it appears in greater than *maxDF* documents.
maxDF <- Inf
# we assume that the text is the first column in the input data frame
text_column <- dataset[[1]]

# Contents of optional Zip port are in ./src/

# the output dictionary includes each word, its DF and its IDF
input.voc <- create.vocabulary(text_column, minWordLen, 
	maxWordLen, minDF, maxDF)
# the output dictionary includes each word, its DF and its IDF 
data.set <- calculate.IDF (input.voc, minDF, maxDF)

# Select the dictionary to be sent to the output Dataset port
maml.mapOutputPort("data.set")
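The TF-IDF arithmetic that the helper functions above perform (they ship in the Zip bundle attached to the script module) can be sketched in a few lines of Python. This uses one common IDF variant; the helpers’ exact weighting may differ:

```python
# Sketch of TF-IDF: term frequency per document, scaled by the log of
# (number of documents / number of documents containing the term).
import math

def tf_idf(documents):
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term appears
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    scores = []
    for tokens in tokenized:
        row = {}
        for term in tokens:
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / df[term])
            row[term] = tf * idf
        scores.append(row)
    return scores

scores = tf_idf(["market shares rise", "election results announced"])
print(scores[0])
```

Terms that appear in every document get an IDF of log(1) = 0, which is exactly why stop words carry no weight under this scheme.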

Results –
All accuracy values were computed using the Evaluate Model module.

This sample can be deployed as a web service and consumed by a news classification application. But make sure that you are training the model with the appropriate training data.

Here’s the confusion matrix that came as the output. Seems pretty good!
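For reference, a confusion matrix like this is tallied simply by counting (actual class, predicted class) pairs; here is a minimal Python sketch with made-up labels:

```python
# Tally a confusion matrix: rows are actual classes, columns are predictions.

def confusion_matrix(actual, predicted, classes):
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

classes = ["business", "politics", "tech"]
actual = ["business", "business", "politics", "tech"]
predicted = ["business", "politics", "politics", "tech"]

cm = confusion_matrix(actual, predicted, classes)
# Overall accuracy is the diagonal (correct counts) over the total
accuracy = sum(cm[c][c] for c in classes) / len(actual)
print(accuracy)  # → 0.75
```

Off-diagonal cells show which pairs of categories the model confuses, which is often more informative than the single accuracy number.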


Azure Machine Learning provides you with the power of the cloud to make complex, time-consuming machine learning problems easier to compute. Build your own predictive module using AML Studio and see how easy it is. 🙂

You can check out the built experiment in Cortana Intelligence Gallery here! 🙂


Citation for the dataset –
D. Greene and P. Cunningham. “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering”, Proc. ICML 2006.

Modules & Capabilities of Azure Machine Learning – Azure ML Part 03

Through the journey of getting familiar with Azure Machine Learning, Microsoft’s cloud-based machine learning platform, we discussed the very first steps of getting started.
When you open up the online studio in your favorite web browser, you’ll be directed to create a blank experiment. Let’s start with it.

Blank Experiment in Azure ML Studio

On the left-hand side of the studio, you can see the pre-built modules that you can use to develop your experiments. If they are not enough for your case, you can use R or Python scripts in your experiment.
With Azure ML Studio, you get the ability to deploy models for almost all machine learning problem types. The algorithms you can use for classification, regression and clustering are in the AML cheat sheet, which you can download from here. (http://download.microsoft.com/download/A/6/1/A613E11E-8F9C-424A-B99D-65344785C288/microsoft-machine-learning-algorithm-cheat-sheet-v6.pdf)

Let’s take a look at the sections the modules are categorized into. If you want to find a specific module, just search for it in the search box.

Saved Datasets – You can find a set of sample datasets to use in experiments. Most of the popular machine learning datasets, like the iris dataset, are available here. If you want your own dataset in the studio, you can upload it here.

Trained Models – These are the models you get as output after training the data using an appropriate algorithm and methodology. They can be used for building another experiment or a web service later.

Data Format Conversions – The data coming into and going out of the experiment can be converted to a desired format using the modules in this section. If you wish to convert the output of your experiment to ARFF format (which is supported in Weka) or to a CSV file, you can use the modules here.

Data Input & Output – Azure ML has the ability to get data from various sources directly. You can use an Azure SQL database, Azure Blob storage or a Hive query to get the data. Fetching data from an on-premises SQL Server is still in preview (August 2016).

Data Transformation – Data transformation tasks like normalization, clipping, etc. can be done using the modules listed in this section. You can also use SQL queries to do the data transformations if you want.

Feature Selection – Appropriate feature selection can increase the accuracy of your machine learning model drastically. There are three different methods, Filter Based Feature Selection, Fisher Linear Discriminant Analysis and Permutation Feature Importance, that you can use according to your requirement.
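The filtering idea behind the first of those methods can be sketched in plain Python: score each feature against the label (here with Pearson correlation, one of the metrics the module offers) and keep the top-k scorers. The feature names and values below are made up for illustration:

```python
# Toy filter-based feature selection: rank features by |Pearson correlation|
# with the label and keep the k best.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top_k_features(features, label, k):
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], label)))
    return ranked[:k]

features = {
    "useful": [1.0, 2.0, 3.0, 4.0],   # tracks the label closely
    "noise": [5.0, 1.0, 4.0, 2.0],    # weakly related to the label
}
label = [1.0, 2.0, 3.0, 4.0]
print(top_k_features(features, label, k=1))  # → ['useful']
```

Filter methods like this score each feature independently of the model, which makes them fast but blind to feature interactions.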

Machine Learning – In this section you can find the modules built for training machine learning models, evaluating accuracy and so on. Most of the popular machine learning algorithms used for classification, clustering and regression problems are listed here as modules. The parameters of each module can be changed, or you can use the Tune Model Hyperparameters module to tune up the experiment and get the optimal output.

OpenCV Library Modules – ML is widely used in image recognition. In Azure ML there’s a pre-trained Cascade Image Classification module that is trained to identify images with front-facing human faces.

Python Language Modules – Python is one of the most widely used languages in data mining and machine learning applications. With Azure ML Studio, you have the ability to execute your own Python script using this module. 200+ common Python libraries are supported in Azure ML right now.

R Language Modules – Like Python, R is one of the favorite statistical languages among data scientists. You can use your favorite R scripts and train models with R using these modules. Most R packages are supported in Azure ML, and if a package is not there, you can import it for the experiment. (Unfortunately there are some limitations: some R packages like rJava and openNLP are not yet supported in Azure ML – Aug. 2016.)

Statistical Functions – If you want to apply mathematical functions to the data or perform statistical operations, you can find the modules for that here. A basic descriptive statistical analysis of the dataset can also be performed using these modules.

Text Analytics – Machine learning models can be used for text analytics. There are modules included in Azure ML Studio for text preprocessing (omitting stop words, punctuation marks, white space, etc.), named entity recognition (a pre-trained module) and many more. The Vowpal Wabbit learning system library is also included in these modules.

Web Service – One of the most notable advantages of Azure ML is the ability to deploy experiments as web services. Here are the web service input and output modules that can be used in the built experiments.

Deprecated – Assigning data to clusters, binning, quantizing data and cleansing missing data can be done using these older modules.

Building Azure ML experiments and deploying web applications using them are not that hard.

This is one of the best step-by-step guides for that task, from MSDN.

In the coming posts we’ll discuss interesting applications and Azure ML hacks to build your predictive models.
Play with the tool and leave your experience as comments below. 🙂


Behind the Scene – Azure ML Part 02

With the power of the cloud, we’re going to play with data now! 🙂

Machine learning is a niche part of predictive analytics. Predictive analytics gets its power from tools and techniques like mathematics, statistics, data mining, machine learning and so on. Predictive analytics doesn’t refer only to predicting future events; detecting fraudulent credit card transactions in real time also falls under predictive analytics.

I’m not going to discuss the uses of machine learning and what you can do with machine learning methods. Let’s see the benefits you get by using Azure ML Studio for your analysis.

Fully managed, scalable cloud service – You have to deal with thousands, often millions, of data records when doing your analysis. The computation power of a local machine may not be sufficient for those kinds of mammoth tasks. Make use of Azure’s scalable and efficient cloud; it’ll make your predictions super-fast.

Ability to develop & deploy – Want to deploy an application that gets its intelligence from an ML backend? Azure ML Studio is the best solution then. It gives you the ability to easily deploy a web service from your built ML model and use it in your application. REST will do the rest. 🙂

Friendly user interface for the data science workflow – I’m pretty sure dragging and dropping is your ‘thing’, right? So AML Studio suits you! 😀 From data loading to deployment of the web service, you get a friendly UI where you can mostly just drag and drop the modules into the workspace without bothering about their underlying complex algorithms.

Wide range of built-in ML algorithms – No need to start from scratch. There are plenty of ML algorithms pre-built as modules in AML Studio. You can use them right away for building models.

R & Python integration – For data scientists, R and Python are like lifeblood. If you wish to integrate your own scripts into the model, with AML Studio you have the chance. You can choose either R, Python or both; AML Studio takes care of it.

Support for R libraries – The R language has a vibrant user community and a rich set of libraries. With AML Studio you get access to most of the R libraries, and you can add more libraries if you want to.


Azure Machine Learning Process

Let’s go through the process. It all starts with defining the objective. Before jumping into the problem, you should have a clear idea of what you are going to do. Whether it’s classification, linear regression or recommendation, you should be able to figure it out by skimming through the data sources and the problem definition.


Then the data! The data may be a set of sales records in your enterprise cloud or in your local storage. Identify the relevant data fields and components that you want for building the model. If the dataset exceeds 10 GB, it’s better to store the data in an Azure SQL database first and get the data through the ‘Import Data’ module. You can use data stored in HDInsight via Hive queries too.

Pay attention to data quality. Normally, real-world data is noisy: full of outliers, erroneous values, missing values and so on. So data preprocessing should be done first. Make sure the data fields are of the appropriate type (numerical, categorical, etc.). In Azure ML there are plenty of modules with which you can perform data preprocessing tasks.
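As a tiny illustration of that preprocessing step, here is a hypothetical Python sketch that coerces a numeric field to the right type and drops rows with missing or garbage values (the field names are made up):

```python
# Coerce a numeric field and drop rows whose value is missing or unparseable.

def clean_rows(rows, numeric_field):
    cleaned = []
    for row in rows:
        value = row.get(numeric_field)
        if value in (None, ""):
            continue  # drop rows with a missing value
        try:
            row = dict(row, **{numeric_field: float(value)})
        except ValueError:
            continue  # drop rows with an unparseable value
        cleaned.append(row)
    return cleaned

raw = [
    {"region": "west", "sales": "120.5"},
    {"region": "east", "sales": ""},       # missing -> dropped
    {"region": "north", "sales": "oops"},  # garbage -> dropped
]
print(clean_rows(raw, "sales"))
```

Dropping rows is only one strategy; in the studio you can also substitute a mean or a mode instead of discarding the record.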

Model development! Here’s the fun part. You can use the ML algorithms that come with the studio, or go with your own scripts in R or Python. If you are familiar with ML model development platforms like Weka, RapidMiner or Orange, you will find this is not so different. You have to put the right module in the right place and use the right algorithm to make the right decision.

After developing the model, we normally have to train it. For that you can use the past data that you have. You must always keep a portion of your dataset for testing the model too.

Is it over after training the model? No, there’s more to the process. You should score and evaluate the model you built. It is useless if the predictions you make with the model have a high error rate: you may not have used the appropriate algorithm, or the correct and optimal parameters. Using the ‘Score Model’ and ‘Evaluate Model’ modules, you can compare different algorithms for the particular task and pick the best one.
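The score-and-evaluate loop can be sketched with two deliberately trivial "models" in Python; the point is only the mechanics of holding out data, scoring and comparing accuracies:

```python
# Hold out part of the labels, score two trivial baselines on the holdout,
# and keep whichever achieves the higher accuracy.

def accuracy(actual, predicted):
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

labels = ["ham", "ham", "spam", "ham", "ham", "ham"]
train, holdout = labels[:4], labels[4:]

# Model A always predicts the majority class of the training split.
majority = max(set(train), key=train.count)
preds_a = [majority] * len(holdout)

# Model B always predicts "spam" (a deliberately bad baseline).
preds_b = ["spam"] * len(holdout)

scores = {"majority": accuracy(holdout, preds_a),
          "always_spam": accuracy(holdout, preds_b)}
best = max(scores, key=scores.get)
print(best, scores[best])  # → majority 1.0
```

Real experiments compare real learners the same way, just with richer metrics (precision, recall, AUC) instead of plain accuracy.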

It’s obvious that ML algorithms are not always 100% accurate, but the model you build should have an accuracy better than wild guessing.

After building your predictive magic box, you can publish it as a web service. This allows you to consume it from a custom application, Microsoft Excel or a similar tool.

For more accuracy, normally this process goes in an iterative manner.

Enough theory; let’s get our hands dirty with our experiments!

Simply put, there are three steps to start working with Azure ML:

  1. Navigate to AzureML and choose your subscription plan
  2. Create a Machine Learning workspace in Azure Portal
  3. Sign in to ML Studio

Step 01 – Go to http://www.azure.com and navigate to Products -> Analytics -> Machine Learning.

You can use Azure ML absolutely for free, but if you want to deploy a web service and play with serious tasks, you have to go for an appropriate subscription. If you have an MSDN subscription, you can use it here 🙂


Azure ML subscriptions

Step 02 – You need an Azure account here. If you don’t have one, go for the 3-month free trial.

In the portal, go to New -> Data + Analytics -> Machine Learning.

From there you can create your workspace to do the machine learning tasks.

Step 03 –Sign in to the Azure ML Studio from https://studio.azureml.net

Now you are there! Click New -> Blank Experiment.

We are ready to start now.

The GUI of AML Studio is pretty clear and easy to understand. Try to find the way to upload datasets, and the modules that contain the ML algorithms, in the pane on the left-hand side.

We’ll explore some cool capabilities of Azure ML in the coming posts. Here’s a video for your motivation.

Part 01

Let’s Jump In! – Azure ML Part 01


“In the world of intelligent applications, data will be king!” Regardless of the way they make their revenue, data has become the main asset of every company. Sales and distribution data, customer data repositories, employee records: all sorts of structured and unstructured data have become the lifeblood of a company’s business processes, because accurate and relevant data is vital for making correct business decisions and relevant business predictions.

Digital data and cloud storage follow Moore’s law: the world’s data doubles every two years, while the cost of storing that data declines at roughly the same rate.


This abundance of data enables more features and tasks, and better machine learning models and methodologies need to be created for predictive analytics.

When the data is widely available in the cloud, and when processing and analyzing data repositories needs large computation power and infrastructure, the best move is the cloud!

Machine learning (ML) is starting to move to the cloud, where a scalable web service is an API call away. Data scientists will no longer need to manage infrastructure or implement custom code. The systems will scale for them, generating new models on the fly, and delivering faster, more accurate results.

What is Machine Learning?

Simply put, machine learning is teaching the silicon chips to think! 😀 To use a general definition: “Machine learning is the systematic study of algorithms and systems that improve their knowledge or performance with experience.”

When going through the theories behind machine learning, you may find it is closely related to computational statistics, where computers are used in prediction making. Machine learning covers a range of computing tasks for solving problems where designing and programming explicit algorithms is unfeasible.

All of these things mean it’s possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results – even on a very large scale. The result? High-value predictions that can guide better decisions and smart actions in real time without human intervention.

Where the hell is ML used?

Did you notice that eBay pushes you to buy a protective glass after you buy a fancy case for your iPhone? That Netflix suggests movies for you? Siri or Cortana’s speech recognition? All these tiny miracles are made possible by the power of machine learning. Spam filtering for your emails, speech recognition and recommender systems in electronic commerce are some famous applications of machine learning.

So… how are we going to do it?

If you google or do a Bing search on machine learning, you’ll find hundreds of ways of applying machine learning techniques in practical applications, and tools that we can use to create machine learning models.


Here’s a glimpse of the Intelligent App Stack

In my post series, I’m mainly going to take you on a journey with Azure Machine Learning Studio, which comes under the Cortana Intelligence Suite.

Why AzureML?


With advanced capabilities, free access, strong support for R, cloud hosting benefits, drag-and-drop development and many more features, Azure ML is ready to take the consumerization of ML to the next level.

It’s as easy as ABC and powerful enough to handle petabytes of data with the power of Azure.


Basics of computing and statistics will be useful going forward. It’s fantastic if you have a rough idea about machine learning algorithms, data preparation methods and that kind of stuff. If not, don’t worry; here’s a book to read! 🙂

So we’ll take the first step into Azure ML in the coming post.

Part 02