The Story of Deep Pan Pizza: AI Explained for Dummies

Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning….

These are probably the most widely used and widely discussed buzzwords today. Big companies even use them to make their products appear more futuristic and "market candy" (one 'tech giant' recently introduced something called a 'neural engine')!

Although AI and related buzzwords are hugely popular, there are still plenty of misconceptions about what they actually mean. One thing you should know up front: AI, machine learning and deep learning are quite distinct from the field called "Big Data". It's true that some ML & DL experiments use big data for training… but keep in mind that storing and processing big data is a separate discipline of its own.

So, what is Artificial Intelligence?

“Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.” – Wikipedia

Simple as that. If a system has been built to perform tasks that normally need human intelligence, such as visual perception, speech recognition or decision making, it can be called an intelligent system, or so-called AI!

The famous "Turing Test", developed by Alan Turing (yes, the Enigma guy from The Imitation Game movie!), proposed a way to evaluate the intelligent behavior of an AI system.


Turing Test

There are two closed rooms, say A and B. In room A we have a human, while in room B we have a computer system. The interrogator, person C, is given the task of identifying which room the human is in. C is limited to written questions to make the determination. If C fails to do so, the computer in room B can be said to show intelligent behavior! Though this test isn't a great yardstick for the intelligent systems we have today, it gives a basic idea of what AI is.

Then Machine Learning?

Machine learning is a sub-field of AI consisting of methods and algorithms that allow computer systems to statistically learn patterns from data. Isn't that just statistics? Not quite. Machine learning doesn't rely on rule-based programming (which means an if-else ladder is not ML 😀 ), whereas statistical modeling is mostly about formulating relationships between data in the form of mathematical equations.

There are many machine learning algorithms out there: SVMs, decision trees, unsupervised methods like K-means clustering, and of course neural networks.

That’s ma boy! Artificial Neural Networks?

Inspired by the biological neural networks we all have inside our bodies, artificial neural networks "learn" to perform tasks by considering many examples. Simply put, we show a thousand images of cute cats to an ANN, and the next time it sees a cat it's going to yell, "Hey, that looks like a cat!".

If you wanna know all the math and magic behind that… just Google! Tons of resources there.

Alright… then Deep Learning?

Yes! That's deep! Imagine the typical vanilla neural network as a thin crust pizza… It has the input layer (the crust), one or two hidden layers (the soft thin part in the middle) and the output layer (the topping). When it comes to Deep Learning or deep neural networks, that's DEEP PAN PIZZA!


DNNs are just like Deep Pan Pizzas

Deep Neural Networks consist of many hidden layers between the input layer and the output layer. Not only the typical propagation operations, but also plenty of add-ins (like pineapple) in the middle: pooling layers, activation functions… MANY!

So, the CNNs… RNNs…

You can have many flavors of Deep Pan Pizza! Some are good for spicy lovers… some are good for meat lovers. Same with Deep Neural Networks. Many good researchers have found interesting ways of connecting the hidden layers (or baking the yummy middle) of DNNs. Some of them are very good at image interpretation, while others are good at predicting values that involve time or state. Convolutional Neural Networks and Recurrent Neural Networks are the most famous flavors of these deep pan pizzas!

These deep pan pizzas have proven able to perform some tasks with close-to-human accuracy, and sometimes even with higher accuracy than humans!

Don't panic! Robots won't be invading the world any time soon…

 

Image Courtesy : DataScienceCentral | Wikipedia


One-Hot Encoding in Practice

Data is king in machine learning. In the process of building machine learning models, data is used as the input features.

Input features come in all shapes and sizes. To build a predictive model with good accuracy, we should understand the data as well as the logic behind the algorithm we're going to use to fit the model.

Data Understanding, the second step of CRISP-DM, guides us in understanding the types of data we have and the way they are represented. We can distinguish three main kinds of data features.

  1. Quantitative features – Data with a numerical scale (the age of a person in years, the price of a house in dollars, etc.)
  2. Ordinal features – Data without a scale but with an ordering (ordered sets: first, second, third, etc.)
  3. Categorical features – Data with neither a numerical scale nor an ordering. These features don't allow any statistical summary. (Car manufacturer categories, civil status, n-grams in NLP, etc.)

Most machine learning algorithms, such as linear regression, logistic regression, neural networks and support vector machines, work better with numerical features.

Quantitative features come with a numerical value and can be used directly as input features of ML algorithms (sometimes after data preprocessing such as normalization).

Ordinal features can easily be represented as numbers (e.g. first = 1, second = 2, third = 3 …). This is called integer encoding. Representing ordinal features with numbers makes sense because the ordering between the values can be expressed numerically.
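Just as a minimal sketch (in R, with a made-up 'size' feature; the values and their ordering are only illustrative), integer encoding could look like this:

# Integer encoding of a hypothetical ordinal feature
sizes <- c("small", "large", "medium", "small", "large")

# Define the ordering explicitly, then convert each value to its rank
size_factor <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
size_encoded <- as.integer(size_factor)

print(size_encoded)   # 1 3 2 1 3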

Some algorithms can deal directly with joint discrete distributions, such as Markov chains, Naive Bayes, Bayesian networks and tree-based methods. These algorithms can work with categorical data without any encoding, while for other ML algorithms we should encode the categorical features numerically before using them as input features. In other words, it's better to convert categorical features to numerical ones most of the time 😊

There are some special cases too. For example, while naïve Bayes classification only really handles categorical features, many geometric models go in the other direction by only handling quantitative features.

How to convert categorical data to numerical data?

There are a few ways to convert categorical data to numerical data.

  • Dummy encoding
  • One-hot encoding / one-of-K scheme

are the most prominent among them.

One-hot encoding is the process of converting categorical features into numerical ones by "binarizing" each category and including it as a separate feature to train the model.

In mathematics, we can define one-hot encoding as…

One hot encoding transforms:

a single variable with n observations and d distinct values,

to

d binary variables with n observations each, where each observation indicates the presence (1) or absence (0) of the corresponding distinct value.

Let's make this clear with an example. Suppose you have a 'flower' feature which can take the values 'daffodil', 'lily' and 'rose'. One-hot encoding converts the 'flower' feature into three binary features: 'is_daffodil', 'is_lily' and 'is_rose'.
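Just to make it concrete, here's a minimal sketch in R using base R's model.matrix (the data and column names are only illustrative):

# One-hot encoding the 'flower' feature from the example above
flowers <- data.frame(flower = factor(c("daffodil", "lily", "rose", "lily")))

# model.matrix builds one binary indicator column per distinct value;
# the "- 1" drops the intercept so every category gets its own column
one_hot <- model.matrix(~ flower - 1, data = flowers)
colnames(one_hot) <- c("is_daffodil", "is_lily", "is_rose")

print(one_hot)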

A common application of OHE is in Natural Language Processing (NLP), where it can easily turn words into vectors. Here comes a drawback of OHE: the vector size can get very large with respect to the number of distinct values in the feature column. Also, if a feature has only two distinct categories, there's no need to construct additional columns; you can simply replace the feature column with a single Boolean column.


OHE in word vector representation

You can easily perform one-hot encoding in AzureML Studio by using the 'Convert to Indicator Values' module. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model, which is exactly what OHE does. Let's look at performing one-hot encoding using Python in the next article.

Mission Plan for building a Predictive model

When it comes to a machine learning or data science problem, the most difficult part is often finding the best approach to tackle the task, or simply figuring out where to start!

Cross-industry standard process for data mining, commonly known by its acronym CRISP-DM, is a data mining process model that describes the approaches commonly used by data mining experts to tackle problems. This process can easily be adapted for developing machine learning based predictive models as well.


CRISP-DM

It doesn't matter which tools/IDEs/languages you use for the process; you can adapt your tooling to the requirements you have.

Let’s walk through each step of the CRISP-DM model to see how it can be adopted for building machine learning models.

Business Understanding –

This is the step where you need technical know-how as well as a little knowledge about the problem domain. You should have a clear idea of what you are going to build and what the functional value of the model's predictions would be. You can use Decision Model & Notation (https://en.wikipedia.org/wiki/Decision_Model_and_Notation) to describe the business need of the predictive model. Sometimes the business need you have might be solvable using simple statistics rather than a machine learning model.

Identifying the data sources is also a task for this step. You should check whether the data sources are reliable, legal and ethical to use in your application.

Data Understanding –

I would suggest the following steps to get to know your data better.

  1. Data definition – A detailed description of each data field in the data source. The notation of the data points and the units in which they were measured are the kinds of things you should consider.
  2. Data visualization – Hundreds or thousands of raw numbers may not give you a clear idea of what the data is about or what shape it has. You may be able to find interesting subsets of your data after visualizing it, and it's much easier to see clustering patterns or trends in a plot.
  3. Statistical analysis – Starting from simple statistics such as the mean and median, you can calculate the correlations between data fields, which gives you a good idea of the data distribution. Descriptive statistics are also a great asset for the feature engineering that increases the accuracy of the machine learning model (a small R sketch follows this list).
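Here's a minimal sketch of that kind of quick exploration in R (assuming a data frame df; the column names 'price' and 'area' are only examples):

# Per-column summary: min/median/mean/max and missing value counts
summary(df)

# Correlation matrix of the numeric columns only
num_cols <- sapply(df, is.numeric)
cor(df[, num_cols], use = "complete.obs")

# Simple visual checks
hist(df$price)            # distribution of a single field
plot(df$area, df$price)   # relationship between two fields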

For data understanding, the Interactive Data Exploration, Analysis and Reporting tool (IDEAR) can be used without the hassle of writing all the code from scratch. (I'll discuss IDEAR in a later post.)

Data Preparation –

Data preparation typically takes roughly 80% of the time in the process, which tells you it's the most vital part of building predictive models.

This is the phase where you convert the raw data you got from the data sources into the final datasets you use for building the ML models. Most data coming from raw sources like IoT sensors is filled with outliers, missing values and other disruptions. In the data preparation phase, you perform data preprocessing tasks to make those data fields usable for modeling.
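As a rough sketch of what those preprocessing tasks can look like in R (assuming a data frame df with a numeric column 'reading'; the column name and thresholds are only illustrative):

# Impute missing values with the column median
df$reading[is.na(df$reading)] <- median(df$reading, na.rm = TRUE)

# Cap extreme outliers at the 1st and 99th percentiles
limits <- quantile(df$reading, probs = c(0.01, 0.99))
df$reading <- pmin(pmax(df$reading, limits[1]), limits[2])

# Rescale the column to the [0, 1] range
df$reading <- (df$reading - min(df$reading)) / (max(df$reading) - min(df$reading))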

Modeling –

Modeling is the part where the algorithms come onto the scene. You train and fit a particular predictive model on your data to perform the desired prediction. Sometimes you may need to check the math behind the algorithms to select the one that won't overfit or underfit the model.

Different modeling methods may need data in different forms, so you may need to go back to the data preparation phase.
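A minimal sketch of the train-and-predict idea in R (assuming prepared data frames train and test with a 0/1 label column; logistic regression is just an example choice):

# Fit a logistic regression model on the training set
model <- glm(label ~ ., data = train, family = binomial)

# Predict probabilities for the test set and turn them into classes
pred_prob  <- predict(model, newdata = test, type = "response")
pred_class <- ifelse(pred_prob > 0.5, 1, 0)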

Evaluation –

Evaluation is a must before deploying a model. The objective of evaluating the model is to see whether it meets the business objectives we figured out at the beginning. The evaluation can be done with many measures, such as accuracy, AUC, etc.

Evaluation may lead you to adjust the parameters of the model, or you might have to choose another algorithm that performs better. Don't expect the machine learning model to be 100% accurate; if it is, it's most probably an overfitted case.
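A minimal sketch of computing such measures in R (reusing the hypothetical pred_class and pred_prob from the modeling sketch above; the AUC part assumes the pROC package is installed):

# Accuracy: fraction of correct predictions
mean(pred_class == test$label)

# Confusion matrix of predicted vs. actual classes
table(predicted = pred_class, actual = test$label)

# Area under the ROC curve with the pROC package
library(pROC)
auc(roc(test$label, pred_prob))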

Deployment –

Deployment is the phase where the client or the end user actually consumes the machine learning model. In most cases, the predictive model is part of an intelligent application, acting as a service that takes a set of inputs and gives a prediction as output.

I would suggest deploying the model as a single component, so that it's easy to scale as well as to maintain. APIs and Docker environments are some cool technologies you can adopt for deploying machine learning models.
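Just as a rough sketch of the API idea (in R, assuming the plumber and jsonlite packages and a model previously saved with saveRDS; the endpoint name and file paths are illustrative):

# plumber.R - exposes a saved model as a prediction endpoint
model <- readRDS("model.rds")   # hypothetical previously trained model

#* Return a prediction for the posted feature values
#* @post /predict
function(req) {
  newdata <- as.data.frame(jsonlite::fromJSON(req$postBody))
  list(prediction = predict(model, newdata = newdata, type = "response"))
}

# Run the API (e.g. inside a Docker container):
# plumber::plumb("plumber.R")$run(port = 8000)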

CRISP-DM won't magically give you a perfect model as the output, but it will definitely help you avoid ending up at a dead end.

Artificial Neural Networks with Net# in Azure ML Studio

The ideas for neural networks go back to the 1940s. The essential concept is that a network of artificial neurons built out of interconnected threshold switches can learn to recognize patterns in the same way that an animal brain and nervous system does.

Though the name "neural network" suggests a 'black box' type of predictive operation, an ANN is simply a set of mathematical operations.


As the name implies, a neural network is structurally a 'network'. Its nodes are organized in layers and connected to each other with edges. The edges are directed and weighted.

Azure Machine Learning Studio comes with pre-built neural network modules that can easily be used for predictive analytics.


Pre-built neural networks in AML Studio  

Multiclass Neural Network Module –

Used for multiclass classification problems. The number of hidden nodes, the learning rate, the number of learning iterations and many other parameters can be changed easily through the module properties.

Two-Class Neural Network –

Ideal for binary classification problems. As with the Multiclass Neural Network module, the network can be configured through the module properties.

Neural Network regression –

This is a supervised machine learning method that can be used to predict a numerical value.

These simple pre-built modules can be added to your ML experiment with just a drag and drop, and their parameters changed through the module properties. But what are you going to do if you want to implement a more complex neural network architecture, or create a deep neural network with more hidden layers?

AzureML Studio comes in handy here by providing the ability to define the hidden layer(s) of the ANN with a script. The Net# scripting language lets you define almost any neural network architecture in an easy-to-read format.

With the Net# scripting language you can:

  • Create hidden layers and control the number of nodes in each layer.
  • Specify how layers are to be connected to each other.
  • Define special connectivity structures, such as convolutions and weight sharing bundles.
  • Specify different activation functions.

In Azure Machine Learning, you can add a Net# script by choosing 'Custom definition script' in the 'Hidden layer specification' property. By default, it is set to the fully-connected case.


Net# syntax is similar to C#. The structure of a Net# script has four main sections.

  1. Constant declaration (Optional) – Define values used elsewhere in the neural network definition
  2. Layer declaration – The input, hidden and output layers are defined with the layer dimensions. The layer declaration for hidden or output layer can include the output function.
  3. Connection declaration – You can define connection bundles (Full, Filtered, Convolutional, Pooling, Response normalization) – Full connection bundle is the default configuration.
  4. Share declaration (Optional) – Defining multiple bundles with shared weights.

Below is a simple neural network defined by a Net# script to perform binary classification. You can customize the number of hidden neurons and the activation functions and see how the accuracy of the model varies.


//A simple neural network definition
//auto keyword allows the ANN to automatically include all feature columns in the input examples
//input layer named Data
input Data auto;

//Hidden layer named "H" including 200 nodes
hidden H [200] from Data all;

//output layer named "Out" including 2 nodes (binary classification problem) 
//Sigmoid activation function has been used.
output Out [2] sigmoid from H all;

For more insights, here's the reference – https://docs.microsoft.com/en-us/azure/machine-learning/studio/azure-ml-netsharp-reference-guide#overview

Evaluating AzureML Experiments

Azure Machine Learning Studio allows you to build and deploy predictive machine learning experiments easily with a few drags and drops (technically 😉).

The performance of machine learning models can be evaluated based on a number of metrics commonly used in machine learning and statistics, all available through the Studio. Evaluation of supervised machine learning problems such as regression, binary classification and multi-class classification can be done in two ways.

  1. Train-test split evaluation
  2. Cross validation

Train-test evaluation –

In AzureML Studio you can perform train-test evaluation with a simple experiment setup. The 'Score Model' module makes predictions for a portion of the original dataset. Normally the dataset is divided into two parts: the majority is used for training, while the rest is used for testing the trained model.


Train-test split

You can use the 'Split Data' module to split the data, and choose whether you want a randomized split or not. In most cases a randomized split works better, but if the dataset has a time-dependent ordering, for example time series data, NEVER use a randomized split; use the regular split instead.

A stratified split allows you to split the dataset according to the values in a key column, which makes the test set less biased.
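Outside the Studio, the same idea takes only a couple of lines of code. A minimal sketch in R (assuming a data frame df; the 70/30 ratio is just an example):

# Randomized 70/30 train-test split
set.seed(42)   # for reproducibility
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]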

  • Pros-
    • Easy to implement and interpret
    • Less time-consuming to execute
  • Cons-
    • If the dataset is small, keeping a portion aside for testing can decrease the accuracy of the predictive model.
    • If the split is not random, the evaluation metrics will be inaccurate.
    • Can lead to over-fitted predictive models.

Cross Validation –

To overcome the above pitfalls of train-test split evaluation, cross validation comes in handy for evaluating machine learning methods. In cross validation, instead of using only a portion of the dataset for generating the evaluation metrics, the whole dataset is used to assess the accuracy of the model.


k-fold cross validation

We split our data into k subsets and train on k−1 of them, holding out the remaining subset for testing. We repeat this for each of the k subsets, so every data point gets used for testing exactly once. This is called k-fold cross validation.
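A minimal sketch of k-fold cross validation in R (assuming a data frame df with a 0/1 label column; logistic regression and k = 5 are just example choices):

# 5-fold cross validation of a logistic regression
set.seed(42)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(df)))   # random fold assignment

accuracies <- sapply(1:k, function(i) {
  train <- df[folds != i, ]
  test  <- df[folds == i, ]
  model <- glm(label ~ ., data = train, family = binomial)
  pred  <- ifelse(predict(model, newdata = test, type = "response") > 0.5, 1, 0)
  mean(pred == test$label)   # accuracy on the held-out fold
})

mean(accuracies)   # mean cross-validated accuracy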

  • Pros –
    • More realistic evaluation metrics can be generated.
    • Reduces the risk of over-fitting the model.
  • Cons –
    • Evaluation may take more time because more computation has to be done.

Cross-validation with a parameter sweep –

I would say using the 'Tune Model Hyperparameters' module is the easiest way to identify the best predictive model, and then using 'Cross Validate Model' to check its reliability.

Here in my sample experiment I've used the breast cancer dataset available in AzureML Studio, which is normally used for binary classification.

The dataset consists of 683 rows. I used train-test split evaluation as well as cross validation to generate the evaluation metrics. Note that the whole dataset is used to train the model in the cross validation case, while the train-test split only uses 70% of the dataset for training the predictive model.

A two-class neural network has been used as the binary classification algorithm. The parameters are swept to get the optimal predictive model.

Looking at the outputs, cross-validation reports that the model trained on the whole dataset gives a mean accuracy of 0.9736, while the train-test evaluation reports an accuracy of 0.985! So, does that mean training with less data has increased the accuracy? Hell no! The cross-validation evaluation gives more realistic metrics for the trained model by testing it on the maximum number of data points.

Take-away – Always try to use cross-validation for evaluating predictive models rather than going for a simple train-test split.

You can access the experiment in the Cortana Intelligence Gallery through this link –

https://gallery.cortanaintelligence.com/Experiment/Breast-Cancer-data-Cross-Validation

Copying & Migrating AzureML experiments

Major advantages of using cloud-based machine learning platforms are the ability to work on collaborative projects, easy sharing and easy migration. Within AzureML Studio you can share or migrate experiments using several approaches.

01. Share AzureML workspace

If you want to share all the experiments in your workspace with another user, this is the best option to go with. All your built experiments, trained models and datasets will be shared with the invited users.

  1. Click SETTINGS in the left pane
  2. Click the USERS tab
  3. Click INVITE MORE USERS at the bottom of the page

The users you invite should have a Microsoft account or a work/school account from Azure Active Directory. Two user access levels can be assigned: "Users" and "Owners".

02. Copy experiment to an AzureML workspace

If you want to migrate an experiment from the current workspace to another, you can go to the experiments pane and click "Copy to workspace". Note that you can only copy experiments to workspaces in the same Azure region. This is handy if you want to move your experiment from a free tier workspace to a paid standard tier.

You won't be able to copy multiple experiments with a single click. If you have that kind of scenario, use PowerShell scripts as instructed in this descriptive post.

03. Publish to Gallery

For me this is one of the most useful options, and you can use it in two ways: make the experiment public, or make it accessible only through a shared link. If you share the experiment publicly, it will be listed in the Cortana Intelligence Gallery.

If you want to share an experiment only with your peer group, publishing it as an 'unlisted' experiment is the best way. Users can open the experiment in their own AzureML Studio. This option can also be used to migrate your experiment between different workspaces as well as between different Azure regions. Only the users who have the link you shared can view or use the experiment.

Time Series Forecasting with Azure ML

When we have a series of data points indexed in time order, we call it a "time series". Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Monthly rainfall data or temperature data for a certain place are some examples of time series.

In the field of predictive analytics, there are many cases where you need to analyze time series data and forecast future values based on previous values. Think of a scenario where you have to do a time series prediction for your business data, or where part of your predictive experiment contains a time series field whose future data points need to be predicted… There are many algorithms and machine learning models you can use for forecasting time series values.

Multi-layer perceptrons, Bayesian neural networks, radial basis functions, generalized regression neural networks (also called kernel regression), K-nearest neighbor regression, CART regression trees, support vector regression, and Gaussian processes are some machine learning algorithms that can be used for time series forecasting.

See here for more about these methods

Autoregressive Integrated Moving Average (ARIMA), seasonal ARIMA, and exponential smoothing (ETS) are some algorithms widely used for this kind of time series analysis. I'm not going to dig deep into the algorithms, trend analysis and all the numbers & characteristics bound up with time series; I'm just going to demonstrate a simple way you can do time series analysis in your deployments using Azure ML Studio.

After adding a dataset that contains time series data into AzureML Studio, you can perform time series analysis and predictions using Python or R scripts. In addition, ML Studio offers a pre-built module for anomaly detection on time series datasets; it can learn the normal characteristics of the provided time series and detect deviations from the normal pattern.

Here I've used the forecast R package to write code snippets enabling AzureML Studio to do time series forecasting using the popular algorithms mentioned above, namely ARIMA, seasonal ARIMA and ETS.

ARIMA seasonal & ARIMA non-seasonal

#ARIMA Seasonal / ARIMA non-seasonal 
library(forecast)
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

#Enter the seasonality of the timeseries here
#For non-seasonal model use '1' as the seasonality
seasonality<-12
labels <- as.numeric(dataset1$data)
timeseries <- ts(labels,frequency=seasonality)
model <- auto.arima(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date,forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");

 

ETS seasonal & ETS non-seasonal

#ETS seasonal / ETS non-seasonal 
library(forecast)
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

#Add the seasonality here
#For a non-seasonal model use '1' as the seasonality
seasonality<-12
labels <- as.numeric(dataset1$data)
timeseries <- ts(labels,frequency=seasonality)
model <- ets(timeseries)
numPeriodsToForecast <- ceiling(max(dataset2$date)) - ceiling(max(dataset1$date))
numPeriodsToForecast <- max(numPeriodsToForecast, 0)
forecastedData <- forecast(model, h=numPeriodsToForecast)
forecastedData <- as.numeric(forecastedData$mean)

output <- data.frame(date=dataset2$date,forecast=forecastedData)
data.set <- output

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");

 

The advantage of using an R script for the prediction is the ability to customize the script as you want. But if you're looking for an instant solution for time series prediction, there's a custom module in the Cortana Intelligence Gallery for time series forecasting.

https://gallery.cortanaintelligence.com/Experiment/Time-Series-Forecasting-using-Custom-Modules-1

You just have to open it in your Studio and re-use the built modules in your experiment. See what's happening to your sales next December! 🙂