Open Neural Network Exchange (ONNX)

In the current AI landscape, there are plenty of programming languages, frameworks, runtime environments and hardware devices used by practitioners for developing and deploying their machine learning and deep learning models. This technology stack gets even wider when it comes to integrating these machine learning models into software development processes.

From experience with software development, we know that handling platform dependencies and getting all the components to work smoothly is one of the biggest headaches developers face. The machine learning space is no different.

To address the problem of moving models between different machine learning development frameworks, the industry is now adopting the “Open Neural Network Exchange” (ONNX).

What is ONNX?

ONNX acts as the open standard for representing ML/DL models

ONNX is an open format for representing both deep learning and traditional machine learning models. It increases the interoperability of models without tying them to a particular runtime environment or development tool.

In simple words, you can build your neural network in a deep learning framework like PyTorch and then run inference on it in a TensorFlow environment by converting it into an ONNX model!

ONNX is widely supported by most frameworks, tools and hardware (since it’s evolving rapidly, I’m pretty sure many more frameworks will come under ONNX in the near future).

Since ONNX is backed by the big players in the AI space such as Facebook, Microsoft, AWS and Google, you can easily use your familiar frameworks with ONNX.

Why ONNX?

Let’s take a scenario where you have built a deep learning based classification model for classifying grocery items, using PyTorch as your deep learning framework. At a later stage of development you need to use that model in an iOS mobile application, where machine learning operations are based on Core ML. You can export the PyTorch model as an ONNX model and then use it on the Core ML runtime for inference.
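
As a rough illustration (not the exact model from this scenario), exporting a PyTorch model to ONNX boils down to a single torch.onnx.export call. Here’s a minimal sketch; the ResNet-18 stand-in, the input shape and the file name are placeholders of my own:

# Minimal sketch of exporting a PyTorch model to ONNX.
# The model, input shape and file name are placeholders.
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)   # stand-in for the grocery classifier
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # example input shape the model expects
torch.onnx.export(
    model,
    dummy_input,
    "grocery_classifier.onnx",
    input_names=["input"],
    output_names=["output"],
)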

ONNX has proven its worth in scenarios where we have to deploy deep learning based models on IoT devices with limited computation power, and has shown a noticeable improvement in inference times.

With ONNX, you don’t need to package the various platform dependencies on the deployment target. You just need the ONNX Runtime.
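
On the consuming side, a minimal ONNX Runtime inference sketch could look like this (the model file and input shape follow the placeholder export sketch above):

# Minimal sketch of running inference with ONNX Runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("grocery_classifier.onnx")
input_name = session.get_inputs()[0].name

dummy_batch = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_batch})
print(outputs[0].shape)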

You can find the list of ONNX-supported tools and frameworks through this link.

In the coming posts, I’m going to discuss my experiences with setting up the ONNX Runtime and using it with my favourite deep learning framework, PyTorch!

Happy coding 🙂

Connecting Azure SQL server with Azure Machine Learning

Accessing data in different data sources is one of the main tasks in the machine learning model development life cycle. Let’s discuss one of the most common data access scenarios.

Scenario :

We have a set of relational data points stored in an Azure SQL server, and we want to develop a machine learning model using Azure Machine Learning. Let’s see how to leverage the data stored in an Azure SQL database in an Azure Machine Learning experiment.

The process contains three main steps.

  1. Set access permissions of Azure SQL database
  2. Connect Azure SQL database to an Azure ML datastore
  3. Register the data in datastore as an Azure ML dataset.

1. Set access permissions of Azure SQL database

Allow Azure services and resources to access this server

By default, Azure SQL databases are protected with a firewall that limits outside access to the data. Since we are going to allow traffic from IPs belonging to Azure resources and services, make sure you enable ‘Allow Azure services and resources to access this server’ on your SQL server.

2. Connect Azure SQL database to an Azure ML datastore

Azure ML datastores can be defined as the abstraction of data sources for the ML workspace, or as the interconnection between the data resource and the AzureML workspace.

Go to Azure Machine Learning Studio (ml.azure.com) and click ‘New datastore’. Provide a datastore name and select ‘Azure SQL database’ as the datastore type. Make sure to authenticate the access with the Azure SQL server’s user ID and password.

Register a new datastore
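
If you prefer doing the same registration in code, the AzureML Python SDK exposes Datastore.register_azure_sql_database. Here’s a minimal sketch with SQL authentication; the server, database and credential values are placeholders:

# Sketch of registering the Azure SQL datastore through the SDK instead of the Studio UI.
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

sql_datastore = Datastore.register_azure_sql_database(
    workspace=ws,
    datastore_name="sql_datastore",
    server_name="my-sql-server",     # Azure SQL server name (placeholder)
    database_name="my-database",     # database name (placeholder)
    username="sql_user",             # SQL authentication user ID (placeholder)
    password="sql_password",         # SQL authentication password (placeholder)
)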

3. Register the data in datastore as an Azure ML dataset.

AzureML supports two types of datasets (take a look here to get an overview of the difference between them). Since we are dealing with relational data, a Tabular dataset is the option we have to use when creating the dataset.

Create dataset from datastore

Select ‘Create dataset’ from the ‘Datasets’ tab in AML Studio and choose the ‘From datastore’ option.

Select the datastore we created in the previous step, which establishes the connection between the AML workspace and the data source.

Provide the SQL query that selects the required data from the SQL server. Make sure to validate the data before configuring the schema.

Preview dataset

All done! Now you have access to the data in your Azure SQL database from the AzureML workspace. You can easily refer to this dataset in your experiments.

Validate dataset

In cases where your database gets updated from time to time, all you have to do is refresh the dataset to fetch the newest data points returned by the SQL query.
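
If you’d rather script this step too, a rough SDK sketch of querying the datastore, building the tabular dataset and registering it could look like this (the datastore name, dataset name and SQL query are placeholders):

# Sketch of creating and registering a tabular dataset from the SQL datastore.
from azureml.core import Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath

ws = Workspace.from_config()
sql_datastore = Datastore.get(ws, "sql_datastore")

query = DataPath(sql_datastore, "SELECT * FROM SalesData")  # placeholder query
tabular_ds = Dataset.Tabular.from_sql_query(query)

# Registering with create_new_version lets you re-run the query later to pick up fresh rows
tabular_ds = tabular_ds.register(
    workspace=ws,
    name="sales-dataset",
    create_new_version=True,
)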

How to Streamline Machine Learning/ Data Science Projects?

CRISP-DM (Image from wikipedia)

When it comes to designing, developing and implementing a project related to data mining/ machine learning or deep learning, it is always better to follow a framework for streamlining the project flow.

It is OK to adapt a software development framework such as Scrum or the waterfall method to manage an ML related project, but I feel that having a more streamlined process which pays attention to data would be an advantage for the success of such a project.

In my understanding, there are two variations of ML related projects.

  1. Solely machine learning/ data science based projects
  2. Software development projects where ML related services are a sub component of the main project.

The step-by-step process I’m explaining can be used in both of these variations, with your own additions and modifications.

Basically, this is what I do when an ML related project lands in my hands.

I follow the steps of a good old standard process known as the Cross-Industry Standard Process for Data Mining (CRISP-DM) to streamline the project flow. Let’s go step by step.

Step 1 : Business understanding

First you have to identify the problem you are going to address with the project. Then you have to be open-minded and answer the following questions.

  1. What is the current situation? (Is a conventional algorithm already being used to solve this problem, etc.)
  2. Do we really need to use machine learning to solve this problem? (Using ML or deep learning for some problems may be over-engineering. Check whether ML is essential for the project.)
  3. What is the benefit of implementing the project? (ML projects are quite expensive and resource-hungry. Make sure the implementation gives you sufficient RoI.)
  4. What are the constraints, limitations and risks? (It’s always better to do a risk assessment prior to the project. The data you have to use may have compliance issues. Look into those aspects for sure!)
  5. What tools and techniques am I going to use? (It may be a bit hard to determine the full tech stack before dipping your feet into the project, but it’s good to have at least a rough idea of the tools, platforms and services you are going to use for development and implementation. DON’T forget the implementation phase. You may end up with a pretty cool development that is hard to integrate with the desired application. So make sure you know your tool-set first.)

Tip: If you feel you don’t have enough experience with this phase, never hesitate to discuss it with peers and experts in the field. They may come up with easy shortcuts and techniques to make your project a success.

Step 2 : Data understanding

Data is the most vital part of any data science/ ML related project. When it comes to understanding the data, I prefer answering these questions.

  1. How big or small is the data? (Training deep learning models may need a lot of annotated data, which is hard to find.)
  2. How credible and accurate is the data?
  3. What is the distribution of the data?
  4. What are the key attributes, and which attributes are not so important?
  5. How has the data been stored? (Data may come as CSVs, JSONs, flat files, etc.)
  6. What does a simple statistical analysis of the data show?

Before digging into the main problem, you can save a lot of time by taking a closer look at the data you have or the data you are going to get.

Step 3 : Data preparation

To be honest, this step takes 80% of the total project time most of the time. Data we find in the real world is rarely clean or in perfect shape. Properly cleaned and pre-processed data will save a lot of time in later stages. Make sure you follow the correct methodologies for data cleansing. This step may include tasks such as writing data loaders for your data. Make sure to document the data preparation steps you applied to the original dataset; otherwise you may get confused in later stages.

Step 4 : Modelling

This is the step where you actually make use of machine learning algorithms and related approaches. What I normally do is access the data and try some simple modelling techniques to interpret what I have. For example, let’s say I have a set of images to be classified using an artificial neural network based classifier. I’d first use a simple neural network with one or two hidden layers and see if the problem formulation and modelling strategy make any sense. If that’s successful, I’ll move on to more complex approaches.
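
To make the idea concrete, here’s a minimal sketch of such a “start simple” PyTorch baseline; the layer sizes and the fake batch are placeholders, not from any real project:

# A one-hidden-layer classifier used as a sanity check before deeper architectures.
import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = SimpleClassifier()
dummy_images = torch.randn(8, 1, 28, 28)   # a fake batch of 28x28 grayscale images
logits = model(dummy_images)
print(logits.shape)  # torch.Size([8, 10])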

Tip: NEVER forget documentation! Your project may grow exponentially to thousands of lines of code, and you may try hundreds of modelling techniques to get the best accuracy. So keep clear documentation of what you did, to make sure you can roll back and see what you have done before.

Step 5 : Evaluation

Evaluating the models we developed is essential to determine whether we have done the right thing. As with software review processes, I prefer having a set framework to evaluate ML projects. Make sure to select appropriate evaluation metrics; some may not reflect the real behaviour of the models you build.

When performing an ML model evaluation, I plan ahead and define a fixed structure for the evaluation report. It makes it easy to compare results across different parameter changes of a single model.

In most cases, we neglect the execution or inference time when evaluating ML models. These can be vital factors in some applications, so plan your evaluation wisely.
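
As a small illustration of what such a fixed report structure could look like in code, here’s a sketch using scikit-learn metrics plus a wall-clock inference timing; the labels and the metric choices are placeholders:

# A repeatable evaluation report: the same metrics computed the same way for every run.
import time
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])   # fake ground-truth labels
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1])   # fake model predictions

start = time.perf_counter()
# model.predict(X_test) would go here; we reuse y_pred for the sketch
inference_time = time.perf_counter() - start

report = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "inference_time_s": inference_time,
}
print(report)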

Step 6 : Deployment & Maintenance

Deployment is everything! If the deployment fails in production, all the model development work you did has no value.

You should select the technologies and approaches for delivering the ML services (REST web services, Kubernetes, container instances, etc.). I personally prefer containerising since it’s neat and clean. The deployed models should be monitored regularly: predictions can drift over time, and sometimes the data distribution changes. Make sure you create a robust monitoring plan beforehand.

Tip: What about the health of the published web endpoints or the capacity of the inference clusters you are using? Yep! Make sure you monitor the infrastructure too.

https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview

This is just a high-level guideline that you can follow for streamlining data science/machine learning related tasks. It is an iterative process, and there are no hard and fast rules saying you MUST follow these steps. Microsoft has introduced the Team Data Science Process (TDSP), adapting and improving this concept with their own tool-sets.

Key takeaway: Please don’t resort to cowboy coding for machine learning/data science projects! Having a streamlined process is always better! 🙂

Different Computation Options on Azure Machine Learning

In an earlier article we discussed the different data storage methods we can use with Azure Machine Learning. In this article I’m gonna briefly discuss the different computation options we have with Azure ML.

Since computation power is one of the key advantages we get from cloud based machine learning, choosing the correct computation resource for our machine learning experiments is important.

AzureML offers 4 main compute types.

01. Compute instances –

If you don’t wanna spend time setting up your local computer for ML experiments, or you wanna leverage GPUs or powerful CPUs for your experiments, Azure compute instances offer fully managed virtual machines loaded with most of the essential frameworks and libraries for performing machine learning and data science experiments. When you use AzureML notebooks (the Jupyter notebook instance attached to AzureML), the compute instance is where the Jupyter notebook is running.

Different methods can be used to access compute instances

You can access compute instances using different methods. Accessing them through Jupyter notebooks and JupyterLab is the all-time favourite of most data scientists. If you are an R folk, you can use RStudio with the compute instances. Accessing the compute instance through SSH is really useful (you may have to enable SSH access when creating the compute instance) on occasions where you have to install custom packages and such on the compute instance. (The machine is Ubuntu based, and you can run all your bash scripts there!)

Basically, a compute instance can be defined as a virtual machine fully loaded with data science and machine learning essentials, which you can use right out of the box.

02. Compute clusters –

Compute clusters differ from compute instances in their ability to have one or more compute nodes. These compute nodes can be created with our desired hardware configurations.

Why have more than one node? That gives you the ability to use parallel processing for computations. If you are going to do hyperparameter tuning, GPU based complex computations or several machine learning runs at once, you may have to create a compute cluster.

If you are running an Automated Machine Learning experiment with AzureML, you must have a compute cluster to perform the computations.

When selecting node configurations, you can go with either CPU based nodes or GPU based nodes. GPU based nodes (NC type, etc.) are a bit pricey. If you are not using GPU based computing, don’t waste your dollars by creating a compute cluster with some fancy configs.

One other key setting is ‘Virtual machine priority’. If you are OK with pushing your experiment to the cloud and getting the result without a hurry, you can go with low-priority nodes, which will save you a lot of dollars compared to dedicated VMs. No harm is gonna come to the experiment’s accuracy and such.
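
For reference, here’s a rough sketch of how a low-priority CPU cluster like that could be provisioned with the AzureML Python SDK; the cluster name, VM size and node counts are placeholders:

# Sketch of creating a low-priority compute cluster with the SDK.
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",   # CPU-based node size (placeholder)
    vm_priority="lowpriority",   # cheaper than dedicated VMs
    min_nodes=0,                 # scale to zero when idle, so you only pay while running
    max_nodes=4,
)

cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)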

03. Inference clusters –

There are two options for deploying Azure Machine Learning web services as REST endpoints: 1) use ACI (Azure Container Instances), or 2) use AKS (Azure Kubernetes Service).

Deploying the REST web service on ACI is good for testing and development, while AKS would be the go-to for production level, large-scale deployments. You can configure the AKS cluster according to your needs through AzureML as well as from the Azure portal. These AKS clusters are pretty much similar to the AKS clusters you have worked with in any other Azure based deployment.

04. Attached compute –

Azure Machine Learning is not limited to doing computations on compute clusters. You can attach Azure Databricks, Data Lake Analytics, HDInsight or an existing VM as a compute target for your workspace. Keep in mind that Azure Machine Learning only supports virtual machines running Ubuntu. These compute targets are not managed by Azure Machine Learning itself, so you may have to perform some additional steps to make sure they are compatible with your experiments.

Choosing the correct compute resource is a key factor in the success of your machine learning experiments. On the other hand, bad compute choices may leave you with huge Azure bills! 😀

There are no hard and fast rules on selecting compute options for your machine learning life cycle. Just make sure you use the right tool at the right time.

Why Do We Need GPUs for Deep Learning?

“If you wanna do deep learning experiments you should have a GPU”

This is one of the most common statements you will have heard over the years, even from me. Do we really need a GPU for these experiments? If so, why? Let’s dig in.

As we all know, deep learning is all about deep neural networks. Training a deep neural network (DNN) is one of the most computationally expensive tasks in computer science. Since a DNN contains huge numerical representations as its inputs as well as the weights and biases inside the network, it can be seen mathematically as a set of multidimensional matrices.

A simplified illustration of an ANN

Now we have a set of huge multidimensional matrices… what next? In order to train a neural network to perform a particular task (let’s say classifying a bunch of images), we commonly use the backpropagation algorithm, which adjusts the weights and biases of the DNN according to the training set we provide. In the forward pass of training, the input is passed through the neural network and, after processing, an output is generated. In the backward pass, we update the weights of the neural network on the basis of the error from the forward pass. These operations involve a huge set of matrix calculations (multiplications and summations). If you consider a single mathematical operation inside these passes, it’s as simple as multiplying two numbers. But when it comes to a DNN such as VGG16, which contains 16 weight layers (VGG16 is a convolutional neural network specifically designed for computer vision tasks), there are ~140 million parameters, a.k.a. weights and biases! So the number of calculations required to adjust these weights and biases is tremendous!

Alright, we now have millions and millions of calculations to be done. But we have very fast computers, so what’s the issue? There’s no real issue; the problem is with the notion of speed we are used to from our daily computations. The Central Processing Unit (CPU) is the key component doing the computation in our computers. Typically, a modern high-end CPU has 4–8 physical cores (an Intel Core i7 10th gen processor has 8 physical cores). If you remember your early computer science lessons, a CPU core essentially executes one operation at a time. So performing those millions and millions of calculations sequentially is gonna take ages! (Literally, it may take years to train a complex DNN!) You may think of having a cluster with thousands of CPUs working in parallel. Yes, it’s possible, but it’s going to cost millions and consume a huge amount of power.

Since the individual computational operations in DNN training are not complex, Graphics Processing Units (GPUs), which have hundreds or thousands of simple cores, are the best match for this computational task. (A typical NVIDIA 940MX GPU sitting in your laptop has 384 CUDA cores.)

The table below will give you a better idea of GPU vs CPU. Also watch the video by the Mythbusters demonstrating the power of thousands of small processors.

Source : https://www.slideshare.net/AlessioVillardita/ca-1st-presentation-final-published

Though the hype about GPUs is recent, the use of general-purpose GPUs for scientific computing started in the early 2000s. In 2006, Nvidia came out with CUDA, which allows GPUs to be programmed using high-level languages. Now all the major deep learning frameworks can access the GPU with just a few lines of code. You can easily train a DNN model within minutes using the GPU on your laptop, compared to training it on the CPU, which may take hours or days!
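
As an illustration of those “few lines of code”, here’s a minimal PyTorch sketch that moves a toy model and batch onto the GPU when one is available (the model and sizes are placeholders):

# Pick the device, then move the model and the data onto it.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(1000, 10).to(device)     # toy model, placeholder sizes
batch = torch.randn(64, 1000).to(device)   # toy batch

output = model(batch)                      # the matrix multiplications now run on the GPU
print(output.device)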

Alright.. alright…Now we need a GPU!

There are two main ways of leveraging the GPU power for your computations

1. Use a physical GPU fitted to your workstation/ laptop

This is quite straightforward. I would strongly recommend using an Nvidia GPU, since the cuDNN and CUDA packages have great support and good documentation across most of the deep learning libraries. AMD also has ROCm, which offers a similar capability to CUDA (https://rocmdocs.amd.com/en/latest/Current_Release_Notes/Current-Release-Notes.html), but I haven’t used it myself. GPUs may be a bit expensive, so plan ahead when choosing a GPU for your workstation.

2. Use a GPU on the cloud

This is pretty cool. If you don’t have a GPU in your laptop but still want to train DNNs, you can easily spin up a GPU in the cloud and use it for your computations.

Almost all cloud providers (Google Cloud Platform, AWS, Microsoft Azure) provide GPU based computing services. Some of these services are free to some extent, and you may have to pay if you are going to run your training for a long time.

Kaggle, one of the main data science platforms, provides free access to GPUs within a quota (https://www.kaggle.com/dansbecker/running-kaggle-kernels-with-a-gpu). Take a look there; it may fit your need too.

So, that’s it! Don’t wait for the CPU to do the computation. Let the GPU train your model.

What’s Your Workhorse?

I’m going to break the flow of the Azure Machine Learning blog series and write a bit of a descriptive answer to a frequently asked question.

Here’s that golden question!

What is the development rig and tools you use for deep learning/machine learning development?

We all know that machine learning experiments come with a cost in computation power, and deep learning in particular is very computationally expensive. As a researcher who does most of my experiments in computer vision and deep learning, this is the setup I use to get my jobs done (as of September 2020).

Disclaimer :

Please note that everything I mention here is my personal preference, not any kind of standard. None of the manufacturers or software vendors have sponsored me for this post 😀 (most of the software tools I’m mentioning here are FOSS).

Hardware

Choosing the right set of hardware is the most vital part of building a DL workstation. It may be quite expensive to go for the best setup that satisfies your needs, but I see it as an investment. Make sure you don’t spend way too much just for the name or the brand; make sure it’s enough to do your job.

The main specifications to look at for a DL workstation are the processor, RAM, storage and GPU (if you wanna do computer vision or NLP kinda experiments, this is a must!). I’m using a desktop with a 4-core Intel Xeon processor and 16GB of RAM. When it comes to storage, I have an SSD as the primary drive, where all my software is installed, plus a fairly big (6TB) hard drive. Why such big storage? Since I have to deal with somewhat large image and video datasets, big storage is a necessity.

GPU! This is the pinnacle of your build, plus probably the most expensive part. I have an NVIDIA GTX 1080, which has 8GB of GPU memory and supports CUDA based processing. It comes in handy for most of the experiments. (When it’s not enough, I use a remote cluster with two NVIDIA 2080 Tis for big computations.)

Make sure you have proper cooling for the machine! Believe me, otherwise these computations are gonna generate a lot of heat and ruin your hardware.

Laptop or a desktop?

This is a hard choice to make. If you have to move from place to place to do your job, then you may have to choose a laptop (many gaming laptops come with GPUs that can be used for CUDA based processing). I’m more of a desktop person; you can easily build a more powerful rig for the same amount of dollars you’d spend on a good enough laptop. (I don’t prefer MacBooks since they don’t come with NVIDIA GPUs. I just use a light laptop for presentations and to carry around for meetings.)

Alongside these beasts, a two-monitor setup, a mechanical keyboard with a wireless mouse, a RODE USB mic for video conferences and recordings, and a Bose noise-cancelling headset for complete isolation are there to ease things up.

Software and Tools

This is the most interesting part. Even if you have the right hardware, the wrong choice of tools will make your job hard. (Again, this is completely my choice of software; you may have different preferences.)

I use Ubuntu Linux as my operating system (still on 18.10 😀). Why Ubuntu? It’s so easy to set up and work with, so that’s my choice. Since Python is my primary programming language, it works like magic with all the Python environment dependencies and all. (Plus OpenCV 😀)

So I said Python… what else? Yeah, along with most of the machine learning frameworks (yes, I have an Anaconda based setup), PyTorch has become my go-to deep learning framework. Easy debugging, Pythonic syntax and wide support made me take this choice.

IDE

VSCode with installed extensions

No programmer is complete without their own IDE choice and customisations. Microsoft VSCode (which is open source and free) is the IDE I use. It supports Python, Spark, Scala and almost all the languages I use. I’ve added a few extensions that come in very handy with experiments. Here are some of the extensions I use; see if they suit you.

  • Python – makes sure I get all the indentation and tooltips right.
  • Remote Explorer – since I use a remote cluster to train my models and sometimes a NAS for storage, this comes in handy to manage my SSH connections. Pretty convenient even for remote debugging.
  • Docker – makes it pretty easy to manage your Docker environments.
  • Excel Viewer – just to view CSV files in style.
  • LaTeX Workshop and Code Spell Checker – this is all because I use LaTeX for scientific writing. Believe me, VSCode is a nice place to do word processing too 😀

With all these tweaks I use the dark theme 😀

That’s all about the hardware and software setup I use for my experiments. In a later post, I’ll talk about some MLOps tips and tricks I use within experiments.

AzureML Python SDK – Installation & Configuration

In the last blog post, we discussed developing a machine learning classifier with Azure Machine Learning service. As mentioned there, we are going to utilize our familiar development tools and frameworks for model development.

Key areas of the SDK include:

  • Explore, prepare and manage the lifecycle of your datasets used in machine learning experiments.
  • Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
  • Train models either locally or by using cloud resources, including GPU-accelerated model training.
  • Use automated machine learning, which accepts configuration parameters and training data. It automatically iterates through algorithms and hyperparameter settings to find the best model for running predictions.
  • Deploy web services to convert your trained models into RESTful services that can be consumed in any application.

~ Ref : https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py

The AzureML Python SDK acts as the connector between all the resources on the cloud and the dev environment.

Installing Python SDK –

The AzureML SDK can be easily installed on your local computer through pip. Refer to this guide for the installation process. I’d suggest going with the default installation, since it’s enough for most of the operations we use in the experiment. It’s also a good idea to upgrade the SDK before running an experiment, since the package is updated rapidly.

Download config file –

To connect to the AzureML workspace, we need the Azure subscription ID, the resource group in which the workspace has been created, and the workspace name. The easiest way to grab these details is to download the config.json file from the Azure portal. Place this file inside the experiment directory.

Downloading config.json from Azure portal

Connect AzureML Workspace –

Connecting to the AzureML workspace and listing the resources can be done with a few simple Python calls from the AzureML SDK (sample code is provided below). Refer to the Python SDK documentation to modify the resources of the AML service.

#!pip install --upgrade azureml-sdk[notebooks]
import azureml.core
from azureml.core import Workspace
from azureml.core import ComputeTarget, Datastore, Dataset

print("Ready to use Azure ML", azureml.core.VERSION)
ws = Workspace.from_config()
print(ws.name, "loaded")

#View resources in the workspace 
print("Compute Targets:")
for compute_name in ws.compute_targets:
    compute = ws.compute_targets[compute_name]
    print("\t", compute.name, ":", compute.type)
    
print("Datastores:")
for datastore in ws.datastores:
    print("\t", datastore)
    
print("Datasets:")
for dataset in ws.datasets:
    print("\t", dataset)

print("Web Services:")
for webservice_name in ws.webservices:
    webservice = ws.webservices[webservice_name]
    print("\t", webservice.name)

In the next blog article, we’ll discuss data loading and experiment creation.

Build a Machine Learning Classifier with Azure Machine Learning Service

Azure Machine Learning service is becoming the one-stop place for managing all ML related workloads in the Azure cloud. There are two main advantages of using Azure Machine Learning service for your ML and data science experiments.

#1 – You can manage the whole machine learning workflow in a single environment. From data wrangling to machine learning service deployment, everything is managed in the cloud with its reliable, scalable and efficient services.

#2 – You can use your familiar open source toolset, languages and frameworks in model development. Being an ML engineer or a data scientist, you may be using Python or R as your main development languages. Azure Machine Learning allows you to use any of those languages and frameworks to develop your experiments.

Pima Indians Diabetes classification is one of the most famous machine learning exercises. It’s a binary classification problem which uses a .CSV based tabular dataset as the input. I’ll walk you through the process I followed to perform my experiment.

Scenario:

  • Diabetes dataset is available as a .CSV file in your local file system.
  • I have to build a binary classifier trained with the dataset and deploy it as a web service with a REST endpoint.

Solution:

As shown in the diagram, I used the services and tools in AMLS together with my typical development environment to build the solution.

  • Step 1: Since the experiment is going to be built on the Azure cloud, I transferred my dataset into Azure blob storage. I used Azure Storage Explorer to upload the dataset to the cloud. (For better performance, make sure the dataset is in a storage blob in the same region as the AMLS experiment.)
  • Step 2: In order to access the data stored in blob storage, it’s registered inside AMLS as a datastore.
  • Step 3: AMLS supports two types of datasets. Since the .CSV file contains tabular data, it’s registered as a tabular dataset. (You can perform basic statistical operations and visualizations after registering it as a tabular dataset.)
  • Step 4: Now it’s time for the real job. Since I’m more familiar with Python and scikit-learn, I used that language and library to develop my model. The whole coding part was done on a Linux machine using my favorite IDE, VSCode. 😉 You may wonder how I’m going to connect the code base on my local machine with the cloud… This is where the AzureML Python SDK comes to the rescue.
  • Step 5: I don’t have enough computation power to do the model training on my machine, so I used an Azure compute cluster to perform the computation. (In my experiment I did hyperparameter tuning to select the best parameters; using the compute cluster allowed me to perform parallel training.)
  • Step 6: After training the model and getting the desired inference accuracy, I needed to expose the binary classification model as a web service. For that, I used Azure Container Instances (ACI), since this is going to be a small testing experiment. (I may have to go for Azure Kubernetes Service (AKS) if I want a massive global deployment.) A rough deployment sketch follows this list.
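
Here’s a rough sketch of what that ACI deployment step could look like with the AzureML Python SDK; the model name, scoring script and conda file are placeholder assumptions, not the actual artifacts from this experiment:

# Sketch of exposing a registered model as a REST endpoint on ACI.
from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="diabetes-classifier")   # model registered after training (placeholder)

env = Environment.from_conda_specification(
    name="sklearn-env",
    file_path="environment.yml",                # conda spec listing scikit-learn etc. (placeholder)
)
inference_config = InferenceConfig(entry_script="score.py", environment=env)

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "diabetes-service", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)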

Yep! It’s just a simple 6-step process. Sounds complex? Don’t worry, I’m going to walk you through the whole process with code snippets in the upcoming blog posts. Stay tuned, and let’s start a real experiment with Azure Machine Learning service.

Zero-code Predictive Model Development with AutomatedML on Azure Machine Learning

Designing and implementing predictive experiments requires an understanding of the problem domain as well as knowledge of machine learning algorithms and methodologies. Extensive programming knowledge is also a necessity when it comes to real-world machine learning model training and implementation.

Automated machine learning is capable of training and tuning a machine learning model for a given dataset and specified target metric by selecting the appropriate algorithms and parameters on its own. Azure Machine Learning offers a user-friendly, wizard-like Automated ML feature for training and implementing predictive models without putting the burden of algorithm and hyperparameter selection on you.

Azure Automated ML comes in handy when you want to implement a complete machine learning pipeline without a single line of code. It saves time and compute resources, since the model tuning is done following data science best practices.

Azure Machine Learning currently supports three types of machine learning use cases in its AutomatedML pipeline.

1. Classification – To predict one of several categories in the target column

2. Time series forecasting – To predict values based on time

3. Regression – To predict continuous numeric values

Let’s go through the step by step process of developing a machine learning experiment pipeline with Azure Automated ML.

01. CREATE AN AZURE MACHINE LEARNING WORKSPACE

An Azure Machine Learning workspace is the resource you create on Azure to perform all machine learning related activities in the cloud. The steps are straightforward, the same as creating any other Azure resource. Make sure you create the workspace with the ‘Enterprise’ edition, since AutoML is not available in the Basic edition.

02. CREATE AUTOMATEDML EXPERIMENT

Create AutomatedML experiment

The ml.azure.com web interface is the one-stop portal for accessing all the tools and services related to machine learning on Azure. Create a new Automated ML run by selecting Automated ML in the Author section of the left pane.

03. SELECT DATASET

Select dataset from the source

As of now, AutomatedML supports tabular data formats only. You can upload your dataset from local storage, import it from a registered datastore, fetch it from a web file, or retrieve it from Azure Open Datasets.

04. CONFIGURE RUN

Configuring the Automated ML run

In this section you have to specify the target column of the experiment. If it’s a classification task, this should be the column that indicates the class values; if it’s regression, it’s the column with the numerical value to be predicted. Then select a training cluster where the experiment is going to run. Make sure you select a cluster that is powerful enough for the complexity of the dataset you provided.

05. SELECT TASK TYPE AND SETTINGS

Select the task type

Select the task type that is appropriate for the dataset you selected. If you have textual data in your dataset, you can enable deep learning (which is in preview) to extract the features.

In the settings of the run, you can specify the evaluation metric, any algorithms that you don’t want to use, the validation type, the exit criterion, etc. for the experiment. If you wish to use only a specific set of features from the provided dataset, you can configure that through the settings as well.

Configuring the evaluation metrics, algorithms to block, validation type, exit criterion
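
For completeness, here’s a hedged sketch of roughly the same configuration expressed through the AzureML Python SDK’s AutoMLConfig, in case you ever want to reproduce a wizard run in code; the dataset, cluster and column names are placeholders:

# Rough SDK equivalent of the wizard settings above.
from azureml.core import Workspace, Dataset, Experiment, ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
training_data = Dataset.get_by_name(ws, "my-tabular-dataset")   # placeholder dataset
cluster = ComputeTarget(ws, "cpu-cluster")                      # placeholder training cluster

automl_config = AutoMLConfig(
    task="classification",              # or "regression" / "forecasting"
    training_data=training_data,
    label_column_name="target",         # the target column (placeholder)
    primary_metric="AUC_weighted",      # evaluation metric to optimise
    compute_target=cluster,
    n_cross_validations=5,              # validation type
    experiment_timeout_hours=1,         # exit criterion
)

run = Experiment(ws, "automl-demo").submit(automl_config, show_output=True)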

Running the experiment may take some time, depending on the complexity of the dataset, the algorithms tried and the exit criterion you set.

When the run is completed, AzureML provides a summary indicating the best performing algorithm. You can directly deploy the best performing model or download it as a .pkl file from the portal.

Details of the run after the completion

The deployment comes as a REST API running on Azure Kubernetes Service (AKS) or Azure Container Instances (ACI).

AutomatedML comes in handy when you need fast prototyping for a specific set of data, and it supports an agile process of intelligent application development. We’ll look at the other tools and features in the Azure AI stack in the coming articles.

Reference : https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml

Microsoft Build 2020 : AI Announcements

The global pandemic has changed the world’s view of technological interventions and innovations. Carrying their corporate mission of “empowering every person and every organization on the planet to achieve more”, the tech giant Microsoft held their annual developer conference “Build 2020” as a 48-hour virtual event.

Going virtual didn’t dampen the excitement the conference creates among the developer community as well as within enterprises. I blogged about a few exciting AI-related announcements Microsoft made at Build 2020 on the Kodez blog. Let’s discuss some of the interesting use cases of these announcements in future posts.

Read my post on Kodez here.