Human Language is one of the most complicated phenomena to interpret for machines. Comparing to artificial languages like programming languages and mathematical notations, natural languages are hard to notate with explicit rules. Natural Language Processing, AKA Computational Linguistics enable computers to derive meaning from human or natural language input.
When it comes to natural language processing, text analysis plays a major role. One of the major problems we have to face when processing natural language is the computation power. Working with big corpus and chunking the textual data into n-grams need a big processing power. All mighty cloud; the ultimate savior comes handy in this application too.
Let’s peep into some of the cool tools you can use in your developments. In most of the cases, you don’t want to get the hassle of developing from the scratch. There are plenty of APIs and libraries that you can directly integrate with your system.
If you think, you wanna go from scratch and do some enhancements, there’s the space for you too. 😊
Text Analytics APIs
Microsoft text analytics APIs are set of web services built with Azure Machine Learning. Many major tasks found in natural language processing are exposed as web services through this. The API can be used to analyze unstructured text for tasks such as sentiment analysis, key phrase extraction, language detection and topic detection. No hard rules are training loads. Just call the API from your C# or python code. Refer the link below for more info.
Process natural language from the scratch!
Python! Yeah. that’s it! Among many languages used for programming, python comes handy with many pre-built packages specifically built for natural language processing.
Obviously, python works well with unix systems. But now the best IDE in town; Visual Studio comes with a toolset for python which enable you to edit, debug and compile python scripts using your existing IDE. You should have Visual Studio 2015 (Community edition, professional or enterprise) for installing the python tools. (https://www.visualstudio.com/vs/python/)
Here I’ve used NLTK (Natural Language Tool Kit) for the task. One of the main advantage with NLTK is, it comes with dozens of built in corpora and trained models.
These are the Language processing tasks and corresponding NLTK modules with examples of functionality comes with that.
Source – http://www.nltk.org/book/ch00.html
For running python NLTK for the first time you may need to download the nltk_data. Go for the python interactive console and install the required data from the popping up NLTK downloader. (Use nltk.download() for this task)
Here’s a little simulation of natural language processing tasks done using NLTK. Code snippets are commented for easy reading. 😊
The output after executing the script should be like this.
You can Improve these basics to build Named Entity Recognizer and many more…
Try processing the language you read and speak… 😉