Connecting Azure SQL server with Azure Machine Learning

Accessing data in different data sources is one of the main tasks in machine learning model development life cycle. Let’s discuss one of the most common data accessing scenarios.

Scenario :

We have to set of relational data points stored in a Azure SQL server to develop a machine learning model using Azure Machine Learning. Let’s see how to leverage data stored in an Azure SQL database in an Azure Machine Learning experiment.

The process contains three main steps.

  1. Set access permissions of Azure SQL database
  2. Connect Azure SQL database to an Azure ML datastore
  3. Register the data in datastore as an Azure ML dataset.

1. Set access permissions of Azure SQL database

Allow Azure services and resources to access this server

By default Azure SQL databases are protected with a firewall which limits outside access for data. Since we going to provide access for the traffic from Ips belongs to Azure resources and services, make sure you allow Azure services to access your SQL server.

2. Connect Azure SQL database to an Azure ML datastore

Azure ML datastores can be defined as the abstraction of data sources for the ML workspace or as the interconnection between the data resource and AzureML workspace.

Go to your Azure Machine Learning Studio (ml.azure.com) and click ‘New datastore’. Provide a datastore name and select ‘Azure SQL database’ as the datastore type. Make sure to authenticate the access with Azure SQL server’s user ID and the password.

Register a new datastore

3. Register the data in datastore as an Azure ML dataset.

AzureML supports two types of datasets (Take a look here to get an overview on the difference between those). Since we are dealing with a set of relational data, Tabular dataset is the option we have to use for creating the dataset.

Create dataset from datastore

Select ‘Create dataset’ from ‘Datasets’ tab on AML Studio and prmopt to ‘From datastore’ option.

Select the datastore we created in the previous step which establish the connection between AML workspace and the data source.

Provide the required SQL query to select the required data from SQL server. Make sure to validate the data before configuring the schema.

Preview dataset

All done! Now you have the access to the data in your Azure SQL database from AzureML workspace. You can easily refer this in your experiments.

Validate dataset

In the cases where your database is getting updated time to time, what you have to do is refreshing the dataset to fetch the newest data points specified by the SQL query.

SQL support in R tools for Visual Studio

If you have any kind of interest in data science or machine learning, you’ll probably found out that R language is the ultimate survivor. If you are a developer familiar with Visual Studio, you don’t have to adopt for RStudio again. You can code R inside VS!

R Tools for Visual Studio (RTVS) recently released the 0.5 version. One useful feature comes with the new version is SQL integration. With that you can directly import the data loads in your SQL database to a R environment. SQL queries can help you to fetch the data that you want. You can easily play with the data using R then.

First, you have to have Visual Studio 2015 with update 3. (Visual Studio 2015 Comunity edition is freely available to download) Update your VS if you haven’t done it yet and download RTVS 0.5 from here & install it.
https://aka.ms/rtvs-current

1
In your R project you can add SQL Query item (Right click on solution explorer and “Add new item”) which is created as a *.sql file.

2
On the top of the panel you can connect the database using “connect” icon. There you should configure the server name, server authentication and the database details.

3

Inside the .sql file you can execute the typical SQL queries to fetch data from the SQL database. One main advantage of this is, by enabling the execution plan you can analyze and optimize the SQL query you written.

4

Adding a database connection for the R project –

Go to R tools -> Data -> Add Database Connectionconnection-prop
Provide the authentication details of the database that you want to access. Then test the connection using “Test Connection” button. After clicking ‘ok’, you can see the database connection string is automatically generated inside settings.R file. Within the R code you can access for data inside the particular database as shown in the following example code.

final-screen

The str() output is shown in the R console

The example shows the code used for accessing the data in ‘Iris Data’ table inside ‘DMDatasets’ database placed in the local SQL server. Make sure to install “RODBC” R package to use the database related functions inside R.

#Need RODBC package to extablish the ODBC database iterface
install.packages("RODBC")
require("RODBC")

#Auto-generated Settings.R file should be added as a source
#The connection string contains in this file  
source("Settings.R")
conn <- odbcDriverConnect(connection = dbConnection)

#To get the tables of particualr database
tbls <- sqlTables(conn, tableType = "TABLE")
print(tbls)

#The SQL query is used to fetch data from the table 
sql <- "SELECT * FROM [dbo].[Iris Data]"
df <- sqlQuery(conn, sql)
str(df)
#plotting the dataset
plot(df) 

No need of switching developer environments to handle your coding as well as data analytics tasks. Just keep Visual Studio as your default IDE! 🙂