Natural Language Processing with Python + Visual Studio

cap_4Human Language is one of the most complicated phenomena to interpret for machines. Comparing to artificial languages like programming languages and mathematical notations, natural languages are hard to notate with explicit rules. Natural Language Processing, AKA Computational Linguistics enable computers to derive meaning from human or natural language input.

When it comes to natural language processing, text analysis plays a major role. One of the major problems we have to face when processing natural language is the computation power. Working with big corpus and chunking the textual data into n-grams need a big processing power. All mighty cloud; the ultimate savior comes handy in this application too.

Let’s peep into some of the cool tools you can use in your developments. In most of the cases, you don’t want to get the hassle of developing from the scratch. There are plenty of APIs and libraries that you can directly integrate with your system.

If you think, you wanna go from scratch and do some enhancements, there’s the space for you too. 😊

Text Analytics APIs

Microsoft text analytics APIs are set of web services built with Azure Machine Learning. Many major tasks found in natural language processing are exposed as web services through this. The API can be used to analyze unstructured text for tasks such as sentiment analysis, key phrase extraction, language detection and topic detection. No hard rules are training loads. Just call the API from your C# or python code. Refer the link below for more info.

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-apps-text-analytics

Process natural language from the scratch!

Python! Yeah. that’s it! Among many languages used for programming, python comes handy with many pre-built packages specifically built for natural language processing.

Obviously, python works well with unix systems. But now the best IDE in town; Visual Studio comes with a toolset for python which enable you to edit, debug and compile python scripts using your existing IDE.  You should have Visual Studio 2015 (Community edition, professional or enterprise) for installing the python tools. (https://www.visualstudio.com/vs/python/)

Here I’ve used NLTK (Natural Language Tool Kit) for the task. One of the main advantage with NLTK is, it comes with dozens of built in corpora and trained models.

These are the Language processing tasks and corresponding NLTK modules with examples of functionality comes with that.

cap_1

Source – http://www.nltk.org/book/ch00.html

For running python NLTK for the first time you may need to download the nltk_data. Go for the python interactive console and install the required data from the popping up NLTK downloader. (Use nltk.download()  for this task)

cap_2

Here’s a little simulation of natural language processing tasks done using NLTK. Code snippets are commented for easy reading. 😊

import nltk
from nltk.corpus import treebank
from nltk.corpus import stopwords

#Sample sentence used for processing
sentence = """John came to office at eight o'clock on Monday morning & left the office with Arthur bit early."""

#Tokenizing the sentence into words 
word_tokens = nltk.word_tokenize(sentence)
#Tagging words
tagged_words = nltk.pos_tag(word_tokens)
#Identify named entities
named_entities = nltk.chunk.ne_chunk(tagged_words)

#Removing the stopwords from the text - Predefined stopwords in English have been used.
stop_words = set(stopwords.words('english'))
filtered_sentence = [w for w in word_tokens if not w in stop_words]

filtered_sentence = []

for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print('Sentence - ' + sentence)
print('Word tokens - ')
print(word_tokens)
print('Tagged words - ')
print(tagged_words)
print('Named entities - ')
print(named_entities)
print('Word tokens - ')
print(word_tokens)
print('Filtered sentence - ')
print(filtered_sentence)

 

The output after executing the script should be like this.

cap_3You can Improve these basics to build Named Entity Recognizer and many more…

Try processing the language you read and speak… 😉

SQL support in R tools for Visual Studio

If you have any kind of interest in data science or machine learning, you’ll probably found out that R language is the ultimate survivor. If you are a developer familiar with Visual Studio, you don’t have to adopt for RStudio again. You can code R inside VS!

R Tools for Visual Studio (RTVS) recently released the 0.5 version. One useful feature comes with the new version is SQL integration. With that you can directly import the data loads in your SQL database to a R environment. SQL queries can help you to fetch the data that you want. You can easily play with the data using R then.

First, you have to have Visual Studio 2015 with update 3. (Visual Studio 2015 Comunity edition is freely available to download) Update your VS if you haven’t done it yet and download RTVS 0.5 from here & install it.
https://aka.ms/rtvs-current

1
In your R project you can add SQL Query item (Right click on solution explorer and “Add new item”) which is created as a *.sql file.

2
On the top of the panel you can connect the database using “connect” icon. There you should configure the server name, server authentication and the database details.

3

Inside the .sql file you can execute the typical SQL queries to fetch data from the SQL database. One main advantage of this is, by enabling the execution plan you can analyze and optimize the SQL query you written.

4

Adding a database connection for the R project –

Go to R tools -> Data -> Add Database Connectionconnection-prop
Provide the authentication details of the database that you want to access. Then test the connection using “Test Connection” button. After clicking ‘ok’, you can see the database connection string is automatically generated inside settings.R file. Within the R code you can access for data inside the particular database as shown in the following example code.

final-screen

The str() output is shown in the R console

The example shows the code used for accessing the data in ‘Iris Data’ table inside ‘DMDatasets’ database placed in the local SQL server. Make sure to install “RODBC” R package to use the database related functions inside R.

#Need RODBC package to extablish the ODBC database iterface
install.packages("RODBC")
require("RODBC")

#Auto-generated Settings.R file should be added as a source
#The connection string contains in this file  
source("Settings.R")
conn <- odbcDriverConnect(connection = dbConnection)

#To get the tables of particualr database
tbls <- sqlTables(conn, tableType = "TABLE")
print(tbls)

#The SQL query is used to fetch data from the table 
sql <- "SELECT * FROM [dbo].[Iris Data]"
df <- sqlQuery(conn, sql)
str(df)
#plotting the dataset
plot(df) 

No need of switching developer environments to handle your coding as well as data analytics tasks. Just keep Visual Studio as your default IDE! 🙂