Simple Linear Regression with Azure ML + Python

Simple linear regression is a statistical method that lets us summarize and study the relationship between two continuous (quantitative) variables. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other, denoted y, is regarded as the response, outcome, or dependent variable.
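To make the idea concrete, here is a minimal sketch of fitting a straight line y = intercept + slope * x by ordinary least squares in plain Python. Note that this is not the Bayesian Linear Regression module used in the Azure ML experiment; the function name fit_line and the toy numbers are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (illustrative only): the points lie exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(xs, ys)
print("intercept =", intercept, "slope =", slope)
```

With real data the points will not sit exactly on a line, and the fitted slope then describes the average change in y for a unit change in x.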

Typically, when doing regression analysis, we look at the correlation coefficient between the input variables. Correlation analysis measures the extent to which two variables vary together, including the strength and direction of their relationship.

The linear correlation coefficient (also called the Pearson product-moment correlation coefficient) is a measure of the strength and direction of the linear association between two random variables.
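For reference, the sample version of this coefficient for n paired observations is usually written as follows, where the barred symbols are the sample means of x and y. The value always lies between -1 (perfect negative linear relationship) and +1 (perfect positive linear relationship), with 0 meaning no linear relationship.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```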

I used the Istanbul Stock Exchange dataset to demonstrate the steps in a simple linear regression prediction. An Azure Machine Learning experiment has been built for the regression model (get the experiment from here), using the built-in Bayesian Linear Regression algorithm.

The most interesting part comes with Python! 🙂

I used a Jupyter Notebook and fetched the data into that workspace to visualize the dataset and to calculate the correlation coefficient between each pair of variables. The pearsonr function in the scipy library was used for that.
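As a rough sketch of that step, the snippet below calls scipy.stats.pearsonr on every pair of columns. The column names and values are made-up placeholders, not the actual Istanbul Stock Exchange data, and this is not the code from the linked notebook.

```python
from itertools import combinations

from scipy.stats import pearsonr

# Hypothetical toy columns standing in for the real dataset columns
columns = {
    "series_a": [0.010, -0.020, 0.015, 0.030, -0.010, 0.005],
    "series_b": [0.008, -0.015, 0.010, 0.020, -0.012, 0.002],
    "series_c": [-0.004, 0.011, -0.007, -0.015, 0.009, 0.001],
}

correlations = {}
for name_1, name_2 in combinations(columns, 2):
    # pearsonr returns the coefficient and a two-sided p-value
    r, p_value = pearsonr(columns[name_1], columns[name_2])
    correlations[(name_1, name_2)] = r
    print(f"{name_1} vs {name_2}: r = {r:.3f} (p = {p_value:.3f})")
```

Pairs with r close to +1 or -1 move together strongly; pairs with r near 0 have little linear relationship.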

Refer to the IPython notebook on Azure Notebooks for the complete Python script and the visualizations.

https://notebooks.azure.com/library/Python%20Visualizations/html/Istanbul%20Stock%20Python%203%20notebook.ipynb

Do run the code on your own. You’ll get it for sure!

 

Natural Language Processing with Python + Visual Studio

Human language is one of the most complicated phenomena for machines to interpret. Compared to artificial languages such as programming languages and mathematical notations, natural languages are hard to describe with explicit rules. Natural Language Processing, AKA Computational Linguistics, enables computers to derive meaning from human or natural language input.

When it comes to natural language processing, text analysis plays a major role. One of the major problems we face when processing natural language is computation power. Working with a big corpus and chunking the textual data into n-grams needs a lot of processing power. The almighty cloud, the ultimate savior, comes in handy in this application too.
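If n-grams are new to you, an n-gram is simply a run of n consecutive tokens. Here is a small plain-Python sketch (the helper name ngrams and the sample phrase are my own, just for illustration) showing why the amount of data grows quickly on a big corpus: every position in the text starts a new n-gram.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "natural language processing with python".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

A text of N tokens yields N - n + 1 n-grams, so counting them over millions of sentences quickly becomes a job for serious hardware.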

Let’s peek at some of the cool tools you can use in your development. In most cases, you don’t want the hassle of developing from scratch. There are plenty of APIs and libraries that you can integrate directly with your system.

If you think you want to go from scratch and make some enhancements, there’s space for you too. 😊

Text Analytics APIs

The Microsoft Text Analytics APIs are a set of web services built with Azure Machine Learning. Many major natural language processing tasks are exposed as web services through them. The API can be used to analyze unstructured text for tasks such as sentiment analysis, key phrase extraction, language detection and topic detection. No hard rules or training loads. Just call the API from your C# or Python code. Refer to the link below for more info.

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-apps-text-analytics

Process natural language from scratch!

Python! Yeah, that’s it! Among the many languages used for programming, Python comes in handy with many pre-built packages made specifically for natural language processing.

Obviously, Python works well with Unix systems. But now the best IDE in town, Visual Studio, comes with a toolset for Python that enables you to edit, debug and run Python scripts using your existing IDE. You need Visual Studio 2015 (Community, Professional or Enterprise edition) to install the Python tools. (https://www.visualstudio.com/vs/python/)

Here I’ve used NLTK (Natural Language Toolkit) for the task. One of the main advantages of NLTK is that it comes with dozens of built-in corpora and trained models.

These are the language processing tasks and the corresponding NLTK modules, with examples of the functionality each provides.

[Image: table of language processing tasks, NLTK modules and functionality]

Source – http://www.nltk.org/book/ch00.html

When running NLTK for the first time, you may need to download the nltk_data. Open the Python interactive console and install the required data from the NLTK downloader that pops up. (Use nltk.download() for this task.)


Here’s a little demonstration of natural language processing tasks done using NLTK. The code snippets are commented for easy reading. 😊

import nltk
from nltk.corpus import stopwords

# Sample sentence used for processing
sentence = """John came to office at eight o'clock on Monday morning & left the office with Arthur bit early."""

# Tokenize the sentence into words
word_tokens = nltk.word_tokenize(sentence)
# Tag each word with its part of speech
tagged_words = nltk.pos_tag(word_tokens)
# Identify named entities
named_entities = nltk.chunk.ne_chunk(tagged_words)

# Remove the stopwords from the text, using the predefined English stopword list
stop_words = set(stopwords.words('english'))
filtered_sentence = [w for w in word_tokens if w not in stop_words]

print('Sentence - ' + sentence)
print('Word tokens - ')
print(word_tokens)
print('Tagged words - ')
print(tagged_words)
print('Named entities - ')
print(named_entities)
print('Filtered sentence - ')
print(filtered_sentence)

 

The output after executing the script should be like this.

[Image: script output]

You can improve on these basics to build a Named Entity Recognizer and many more…

Try processing the language you read and speak… 😉

Jupyter Notebook on AzureML

If you are fond of playing with data, digging out the relationships in it and plotting interesting visualizations, Python is the language you should speak.

Over the years, with strong community support, Python has gained dedicated libraries for data analysis and predictive modeling, such as scikit-learn, TensorFlow and Theano. Even the ultimate IDE in town, Visual Studio, has started supporting Python! So, no hesitation. Python is a great choice to make.

You can use many IDEs or even a simple text editor to write your Python files. But there is also a handy web application, Jupyter Notebook, that you can use to write your code. And even run it!

Jupyter was born in 2014 as a spin-off project of IPython, a command shell for interactive computing in multiple programming languages that was originally developed for Python.

Why Jupyter?

Jupyter Notebook is a very popular tool among data scientists. It is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. “Jupyter” is a loose acronym meaning Julia, Python and R. One of the most prominent benefits of using Jupyter Notebook is the ability to share the data transformation and visualization steps with your peers.

If you want to run Jupyter Notebook on your local machine, refer to the link below. With a few easy steps, you can have Jupyter Notebook up and running on your machine.

http://jupyter.readthedocs.io/en/latest/install.html

One of the easiest ways to use Jupyter is to run the notebook on Azure. There is no need to have Python or its dependencies installed on your local machine. You can create, edit and share Jupyter notebooks using Azure Machine Learning Studio. All the execution happens in the cloud.

Let’s get started!

Access your notebooks from the “Notebooks” tab of AzureML Studio. When creating a new notebook, you can select which language and version you want to use. Python 2, Python 3 and R are the supported languages right now.

Just as with a Jupyter notebook running on the local machine, you get the same IPython interface in your browser.

On the notebook menu bar, you can find the ‘Help’ menu, which contains a brief user interface tour as well as a list of keyboard shortcuts that you can use to drive the notebook.

Here’s a little data mashup I’ve done using the famous ‘Iris dataset’ included in Python’s sklearn. The .ipynb file is available on my GitHub repo. Feel free to download it and play with it. A static HTML page created from the notebook output is also included in the repo.
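If you just want a feel for the data before opening the notebook, here is a minimal sketch (not the notebook’s actual code) that loads the Iris dataset bundled with scikit-learn and computes the average of each measurement per species. It assumes numpy and scikit-learn are installed; the variable names are my own.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # 150 flowers, 4 measurements each

# Average of every feature for each species (a simple data transformation)
species_means = {
    str(name): X[y == idx].mean(axis=0)
    for idx, name in enumerate(iris.target_names)
}

for name, means in species_means.items():
    summary = ", ".join(
        f"{feature} = {value:.2f}"
        for feature, value in zip(iris.feature_names, means)
    )
    print(f"{name}: {summary}")
```

From here it is a short step to plotting, for example a scatter plot of petal length against petal width colored by species.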

Azure is coming up with an Azure Notebooks preview feature. Here’s the Iris visualization hosted on Azure Notebooks:

https://notebooks.azure.com/library/Python%20Visualizations/html/Iris+Data+Visualization.ipynb

No machine learning algorithms or complex code snippets here. Just data visualization & data transformation. 🙂