Democratizing Machine Learning with Cloud

We have already passed the era of gigabytes when it comes to data. The world is talking about terabytes of unstructured data and millions of data points generated from IoT devices and sensors every second. To analyze these heaps of data we obviously need large computation power and massive storage. Building workhorse machines to handle such tremendous workloads would cost a lot. This is where the cloud computing paradigm comes in handy: the resourcefulness and scalability of the public cloud can be used to perform the heavy calculations that machine learning algorithms demand.

Almost all the major public cloud providers in the market offer machine learning services. Google Cloud Platform provides modern machine learning services with pre-trained models and a service to generate your own tailored models. Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. IBM Analytics offers a machine learning platform with its cloud data services. Azure Machine Learning Studio is a GUI-based integrated development environment for constructing and operationalizing machine learning workflows on Azure. We discussed Azure Machine Learning and its application in practical scenarios at length in previous posts.

All the mentioned platforms provide machine learning as a service, and most offer pre-built ML algorithms in packages. Simple drag-and-drop interactions and easy deployment have attracted many developers to these tools.

But what if you want to start from scratch, or want to harness the power of Graphics Processing Units (GPUs) to run ML algorithms in parallel? Cloud-based virtual machines specifically optimized for computation are one of the best solutions you can consume.

Azure Data Science Virtual Machine (DSVM) –

DSVM in Azure Portal

If you have already used Azure virtual machines for your computation, hosting or storage tasks, this will not be a new concept for you. The Azure DSVM is specifically optimized for large computations and comes in two flavors: one with Windows and the other with Linux. You can choose the hardware configuration as you wish. Many development environments, programming IDEs and languages come pre-installed in the VM instances.

My personal favorite here is the Linux DSVM instance. Here I’ve created a Linux DSVM with the basic configuration. To access the VM you can use any tool that can make an SSH connection; what I normally do is access the VM using Ubuntu Bash on Windows 10.

GPUs for machine learning –

Configurations of the Linux VM with Nvidia GPU

Many of the machine learning algorithms available today can be executed in parallel; large parts of their execution are embarrassingly parallel. With parallel execution you can reduce the running time of these algorithms drastically. Data scientists in both industry and academia have been using GPUs for machine learning to make groundbreaking improvements across a variety of applications, including image classification, video analytics, speech recognition and natural language processing.
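
To get a feel for why this matters, here’s a toy Python sketch (mine, not from any of the platforms above) comparing a plain serial loop with the same computation handed to a vectorized math library. CUDA-backed libraries apply the same "do many multiply-adds at once" idea across thousands of GPU cores:

# Toy comparison: serial Python loop vs. vectorized NumPy for a dot product.
# GPU libraries push the same parallel multiply-add idea much further.
import time
import numpy as np

a = np.random.rand(2000000)
b = np.random.rand(2000000)

start = time.time()
total = 0.0
for i in range(len(a)):        # serial: one multiply-add at a time
    total += a[i] * b[i]
print('Python loop: %.3f s' % (time.time() - start))

start = time.time()
total = np.dot(a, b)           # vectorized: BLAS can use many cores at once
print('np.dot:      %.3f s' % (time.time() - start))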

GPUs vs. CPU computing

Especially in deep learning, parallel processing using GPUs can decrease computation time drastically. Purchasing a deep learning dream machine powered by a CUDA-enabled high-end GPU such as the Nvidia Tesla K80 would cost nearly 6,000 dollars! Rather than spending that much on such a machine, the more feasible plan is to provision a virtual machine with the specifications we need and pay as we consume.

VM instance price plans

The N-series is a family of Azure Virtual Machines with GPU capabilities that you can use for these kinds of tasks. The N-series features the NVIDIA Tesla accelerated platform as well as NVIDIA GRID 2.0 technology, providing the highest-end graphics support available in the cloud today. Through the Azure portal, you can choose the price plan and configuration your tasks demand when provisioning the VM.

Here’s my Azure VM, specifically configured for deep learning exercises. The machine is powered by a Tesla K80 GPU, which packs 4,992 cores! I installed Anaconda on it and do the computations using Jupyter notebooks.

Just a hint: stop your VM instance when you are not using it for computation to avoid getting huge unnecessary bills. 😉

No need for a huge wallet! The wise decision is to apply cloud technologies to your machine learning workloads.

Simple Linear Regression with Azure ML + Python

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables: one variable, denoted x, is regarded as the predictor, explanatory, or independent variable; the other, denoted y, is regarded as the response, outcome, or dependent variable.
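
For reference, the underlying model is just a straight line plus a noise term (the standard textbook form):

y = \beta_0 + \beta_1 x + \varepsilon

where \beta_0 is the intercept, \beta_1 the slope, and \varepsilon the random error.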

Typically, when doing regression analysis, we consider the correlation coefficients of the input variables. Correlation analysis measures the extent to which two variables vary together, including the strength and direction of their relationship.

The linear correlation coefficient (also called the Pearson product-moment correlation coefficient) measures the strength and direction of a linear association between two random variables.
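
For sample data, the coefficient has the standard definition:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}

A value near +1 or -1 indicates a strong linear association; a value near 0 indicates a weak one.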

I used the Istanbul Stock Exchange dataset to demonstrate the steps of a simple linear regression prediction. An Azure Machine Learning experiment has been built (get the experiment from here) for constructing the regression model, using the built-in Bayesian Linear Regression algorithm.

The most interesting part comes with Python! 🙂

I’ve used a Jupyter Notebook, fetching the data into that workspace to visualize the dataset and to calculate the correlation coefficient between each pair of variables. The pearsonr method in the scipy library has been used for that.
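
As a minimal sketch of that correlation check (the file name and the column names 'ISE' and 'SP' are placeholders here, so swap in the actual headers of your copy of the dataset):

# Minimal sketch: Pearson correlation between two columns of the dataset.
# 'istanbul_stock_exchange.csv' and the column names are placeholders.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv('istanbul_stock_exchange.csv')

# pearsonr returns the correlation coefficient and the two-tailed p-value
r, p_value = pearsonr(df['ISE'], df['SP'])
print('Pearson r = %.4f (p = %.4f)' % (r, p_value))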

Refer to the iPython notebook on Azure Notebooks for the complete Python script and the visualizations.

https://notebooks.azure.com/library/Python%20Visualizations/html/Istanbul%20Stock%20Python%203%20notebook.ipynb

Do run the code on your own. You’ll get it for sure!

Jupyter Notebook on AzureML

If you are fond of playing with data to dig out its relationships and to plot interesting visualizations, Python is the language you should speak.

Over the years, with strong community support, the Python language has gained dedicated libraries for data analysis and predictive modeling such as scikit-learn, TensorFlow and Theano. Even the ultimate IDE in town, Visual Studio, has started supporting Python! So, no hesitation: Python is a great choice to make.

You can use many IDEs or even a simple text editor to write your Python files. But Python also comes with a handy web application, the Jupyter notebook, that you can use to write your code and even run it!

Jupyter was born in 2014 as a spin-off project of IPython, a command shell for interactive computing in multiple programming languages, originally developed for Python.

Why Jupyter?

The Jupyter notebook is a very popular tool among data scientists: a web application that allows you to create and share documents containing live code, equations, visualizations and explanatory text. “Jupyter” is a loose acronym meaning Julia, Python and R. One of the most prominent benefits of using a Jupyter notebook is the ability to share your data transformation and visualization steps with your peers.

If you want to run a Jupyter notebook on your local machine, refer to the link below. With a few easy steps, you can have Jupyter up and running on your machine.

http://jupyter.readthedocs.io/en/latest/install.html

One of the easiest ways to use Jupyter is to run the notebook on Azure. There is no need to have Python or its dependencies installed on your local machine: you can create, edit and share Jupyter notes using Azure Machine Learning Studio, and all the execution happens in the cloud.

Let’s get started!

Access your notebooks from the “Notebooks” tab of AzureML Studio. When creating a new notebook, you can select which language and version you want; Python 2, Python 3 and R are the supported languages right now.

Just as with a Jupyter notebook running on your local machine, you get the same IPython interface in your browser.

On the notebook menu bar, you can find the ‘Help’ menu, which contains a brief user interface tour as well as a list of keyboard shortcuts that you can use to drive the notebook.

Here’s a little data mashup I’ve done using the famous Iris dataset included in Python’s sklearn. The .ipynb file is available on my GitHub repo; feel free to download it and play with it. A static HTML page created from the notebook output is also included in the repo.
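
If you want a quick taste before opening the notebook, a minimal sketch along the same lines (not the exact notebook code) looks like this:

# Minimal sketch of an Iris visualization, assuming matplotlib and
# scikit-learn are available in the notebook environment.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Scatter sepal length vs. sepal width, colored by species
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Iris dataset: sepal length vs. sepal width')
plt.show()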

Azure is coming up with the Azure Notebooks preview feature. Here’s the Iris visualization hosted on Azure Notebooks:

https://notebooks.azure.com/library/Python%20Visualizations/html/Iris+Data+Visualization.ipynb

No machine learning algorithms or complex code snippets here, just data visualization and transformation. 🙂

Time Series Forecasting with Azure ML

When we have a series of data points indexed in time order, we can define that as a “time series”. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Monthly rainfall figures and the temperature readings of a certain place are some examples of time series.

In the field of predictive analytics, there are many cases where we need to analyze time series data and forecast future values based on the previous ones. Think of a scenario where you have to do a time series prediction for your business data, or an experiment where part of your predictive pipeline contains a time series field whose future data points need to be predicted. There are many algorithms and machine learning models that you can use for forecasting time series values.

Multi-layer perceptrons, Bayesian neural networks, radial basis functions, generalized regression neural networks (also called kernel regression), k-nearest neighbor regression, CART regression trees, support vector regression, and Gaussian processes are some machine learning methods that can be used for time series forecasting.

See here for more about these methods.

Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA and exponential smoothing (ETS) are some of the algorithms widely used for this kind of time series analysis. I’m not going to dig deep into the algorithms, trend analysis, or all the numbers and characteristics bound up with time series; I’m just going to demonstrate a simple way you can do time series analysis in your deployments using Azure ML Studio.

After adding a dataset that contains time series data into AzureML Studio, you can perform time series analysis and predictions using Python or R scripts. In addition, ML Studio offers a pre-built module for anomaly detection on time series datasets. It can learn the normal characteristics of the provided time series and detect deviations from the normal pattern.

Here I’ve used the forecast R package to write code snippets enabling AzureML Studio to do time series forecasting with popular algorithms, namely ARIMA, Seasonal ARIMA and ETS.

ARIMA seasonal & ARIMA non-seasonal

#ARIMA Seasonal / ARIMA non-seasonal 
library(forecast)
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

#Enter the seasonality of the timeseries here
#For non-seasonal model use '1' as the seasonality
seasonality<-12
labels <- as.numeric(dataset1$data)
timeseries <- ts(labels,frequency=seasonality)
model <- auto.arima(timeseries)
# Forecast horizon: the gap between the last period in the scoring data
# and the last period in the training data
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date,forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");

ETS seasonal & ETS non-seasonal

#ETS seasonal / ETS non-seasonal 
library(forecast)
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

#Add the seasonality here
#For non-seasonal ETS use '1' as the seasonality
seasonality<-12
labels <- as.numeric(dataset1$data)
timeseries <- ts(labels,frequency=seasonality)
model <- ets(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date,forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");

The advantage of using an R script for the prediction is the ability to customize the script as you want. But if you are looking for an instant solution for time series prediction, there’s a custom module in the Cortana Intelligence Gallery that does time series forecasting.

https://gallery.cortanaintelligence.com/Experiment/Time-Series-Forecasting-using-Custom-Modules-1

You just have to open it in your studio and re-use the pre-built modules in your experiment. See what’s happening to your sales next December! 🙂

Competing in Kaggle with Azure Machine Learning

Data science is one of the most trending buzzwords in the industry today. Obviously, you have to have a whole lot of experience with data analytics and an understanding of different data science problems and their solutions to become a good data scientist.

Kaggle (www.kaggle.com) is a place where you can explore the possibilities of data science, machine learning and related fields. Kaggle is also known as “the home of data science” because of its rich content and the wide community behind it. You can find hundreds of interesting datasets uploaded by data science enthusiasts from all around the world. The most fascinating things you can find on Kaggle are the competitions! Some come with exciting prize tags, while others offer wonderful job opportunities if you score a top rank.

As we discussed in previous posts, Azure Machine Learning enables you to deploy and test predictive analytics experiments easily; sometimes you don’t need to code a single line to develop a machine learning model. So let’s start our journey on Kaggle with Azure Machine Learning.

01. Sign up for Kaggle – Go to kaggle.com and sign up using your Facebook, Google or LinkedIn account. It’s totally free! 🙂

Kaggle landing page

02. Register for a Kaggle competition – Under the Competitions section, you’ll find many competitions. We’ll start with a simple one that doesn’t carry a prize tag or a job offer, but is well worth trying out as your first experience on Kaggle.

Can you classify monsters?

03. Ghouls, Goblins, and Ghosts… Boo! – Search for this competition, categorized under the ‘Knowledge’ sector of the competitions. The task you have to perform is described precisely under ‘Competition Details’.

04. Get the data – After accepting Kaggle’s terms and conditions, you can download the training dataset, test dataset and sample submission in .csv format. Make sure to take a close look at the features and understand whether you need some kind of data preprocessing before jumping into the task 😉

05. Understand the problem – You can easily figure out that this is a multi-class classification machine learning problem. So let’s handle it that way!

06. Get the data into your Studio – Here comes Azure Machine Learning! Go to AML Studio (setting up Azure Machine Learning is discussed here) and upload the data files through the ‘Add Files’ option.

07. Build the classifier experiment – Same as building a normal AML experiment. Here I’ve split the training dataset to evaluate the model; the model with the highest accuracy has been chosen for the predictions, and ‘Tune Model Hyperparameters’ has been used to find the optimal model parameters. (A rough code equivalent of this step is sketched after the screenshot.)

Classifier Experiment
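
For the code-minded, here’s a rough scikit-learn equivalent of that drag-and-drop flow: split the training data, tune hyperparameters, and evaluate. The column names ('id', 'type', 'color') follow the competition’s data description, but treat the whole thing as a sketch under those assumptions, not as the experiment itself:

# Rough code equivalent of the Studio experiment: split, tune, evaluate.
# Column names follow the competition's data page; verify against your copy.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

train = pd.read_csv('train.csv')                 # Kaggle training file
X = train.drop(['id', 'type'], axis=1)           # 'type' is the ghost label
X = pd.get_dummies(X)                            # one-hot encode 'color'
y = train['type']

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Grid search plays the role of the 'Tune Model Hyperparameters' module
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {'C': [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

print('Validation accuracy:', accuracy_score(y_val, grid.predict(X_val)))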

08. Do the prediction – Now it’s time to use the trained model to predict the type of ghost for the records in the test dataset. You can download the predicted output using the ‘Convert to CSV’ module.

Predicting with the trained model

09. Submission – Make sure to format the output according to the sample submission.

10. Upload the submission to Kaggle – You can compete as a team or as an individual. See where you land on the leaderboard!

Here I am, the 278th! 🙂

That’s it! You’ve just completed your first Kaggle competition. This might not lift you to the top of the leaderboard, but it shows it’s entirely possible to use Azure Machine Learning for solving real-world machine learning problems.

SQL support in R tools for Visual Studio

If you have any kind of interest in data science or machine learning, you’ve probably found that the R language is the ultimate survivor. If you are a developer familiar with Visual Studio, you don’t have to adapt to RStudio: you can code R inside VS!

R Tools for Visual Studio (RTVS) recently released version 0.5. One useful feature that comes with the new version is SQL integration. With it, you can directly import data from your SQL database into an R environment; SQL queries help you fetch exactly the data you want, and you can then easily play with that data in R.

First, you need Visual Studio 2015 with Update 3 (the Visual Studio 2015 Community edition is freely available to download). Update your VS if you haven’t done so yet, then download RTVS 0.5 from the link below and install it.
https://aka.ms/rtvs-current

In your R project you can add a SQL Query item (right-click in Solution Explorer and choose “Add New Item”), which is created as a *.sql file.

At the top of the panel you can connect to the database using the “Connect” icon. There you configure the server name, server authentication and the database details.

Inside the .sql file you can execute typical SQL queries to fetch data from the SQL database. One main advantage here is that, by enabling the execution plan, you can analyze and optimize the SQL queries you have written.

Adding a database connection for the R project –

Go to R Tools -> Data -> Add Database Connection.
Provide the authentication details of the database you want to access, then test the connection using the “Test Connection” button. After clicking “OK”, you can see the database connection string automatically generated inside the settings.R file. Within your R code you can then access the data inside that particular database, as shown in the following example code.

The str() output is shown in the R console

The example shows the code used for accessing the data in the ‘Iris Data’ table inside the ‘DMDatasets’ database placed on the local SQL Server. Make sure to install the “RODBC” R package to use the database-related functions inside R.

#Need the RODBC package to establish the ODBC database interface
install.packages("RODBC")
require("RODBC")

#Auto-generated Settings.R file should be added as a source
#The connection string contains in this file  
source("Settings.R")
conn <- odbcDriverConnect(connection = dbConnection)

#To get the tables of a particular database
tbls <- sqlTables(conn, tableType = "TABLE")
print(tbls)

#The SQL query is used to fetch data from the table 
sql <- "SELECT * FROM [dbo].[Iris Data]"
df <- sqlQuery(conn, sql)
str(df)
#plotting the dataset
plot(df) 

No need to switch developer environments to handle your coding as well as your data analytics tasks. Just keep Visual Studio as your default IDE! 🙂

Azure ML Web Services gets a new look

There’s a huge buzz going on around machine learning. What for? Building intelligent apps is one of the dominant uses of machine learning. A web service is a “language” every software developer understands. If the data scientists can provide a web service to the devs, they’ll be super excited, because then they only have to deal with JSON, not regression algorithms or neural networks! 😀

Azure ML Studio gives you the power to deploy web services easily, with a nice interface that a software developer can understand. Consuming a web service built with Azure Machine Learning is pretty easy too, because it even provides code samples and the sample JSON that is transferred in and out.
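
As a rough sketch, calling a classic Azure ML request/response endpoint from Python looks something like the following; the URL, API key and input schema are placeholders that you’d copy from your own service’s consumption page:

# Minimal sketch of calling a classic Azure ML request/response web service.
# The URL, API key and input schema below are placeholders; copy the real
# values from the web service's consumption page.
import json
import urllib.request

url = 'https://ussouthcentral.services.azureml.net/workspaces/<id>/services/<id>/execute?api-version=2.0'
api_key = '<your-api-key>'

payload = {
    'Inputs': {
        'input1': {
            'ColumnNames': ['feature1', 'feature2'],
            'Values': [['1.0', '2.0']]
        }
    },
    'GlobalParameters': {}
}

headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer ' + api_key}

req = urllib.request.Request(url, json.dumps(payload).encode('utf-8'), headers)
response = urllib.request.urlopen(req)
print(json.loads(response.read().decode('utf-8')))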

services.azureml.net

Recently, AzureML Studio came out with a new interface for managing web services. Now it’s pretty easy to manage and monitor the behavior of your web services.

Go to your ML Studio. In the web services section, you’ll find a new link directing you to the “New web services experience”. Currently it’s in preview.

New web services dashboard

The dashboard shows the performance of the web service you built, including the average execution time. You can even get a glimpse of the monetary cost attached to consuming the web service.

Testing the web services can also be done through the new portal. If you want to build a web application that consumes the web service you built, you can go straight to the Azure web app template that is pre-built for consuming ML web services.

Take a look at http://services.azureml.net and you’ll get used to it! 😀