Data Labeling in Azure ML Studio

One of the most time-consuming tasks in the machine learning model development pipeline is data labeling. For a computer vision task that uses deep learning, you may need thousands of labelled images to train your models.

Create a Data Labeling project on Azure ML Studio

Azure Machine Learning offers a new data labeling feature designed specifically for computer vision applications. Right now, AzureML supports three types of data labeling tasks.

  1. Image Classification multi-label – If a single image in your image set can carry more than one label, this is the task type to go with.
  2. Image Classification multi-class – This is the simple image classification task, where each image has exactly one label chosen from a set of several classes.
  3. Object Identification (Bounding Box) – If you need annotations to train an object detection model, you need a bounding box around each object in the image. This is the task type to choose in that case.
Selecting the data labeling task type
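To make the difference between the three task types concrete, here is a minimal sketch of what one labelled record could look like in each case. The class names, field names, and file names are all made up for illustration; this is not the format AzureML stores internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultiClassLabel:
    """Multi-class: exactly one label per image."""
    image: str
    label: str


@dataclass
class MultiLabel:
    """Multi-label: any number of labels per image."""
    image: str
    labels: List[str] = field(default_factory=list)


@dataclass
class BoundingBox:
    """One labelled rectangle: top-left corner plus width and height."""
    label: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class ObjectAnnotation:
    """Object identification: zero or more labelled boxes per image."""
    image: str
    boxes: List[BoundingBox] = field(default_factory=list)


# Hypothetical example records, one per task type.
examples = [
    MultiClassLabel("img_001.jpg", "cat"),
    MultiLabel("img_002.jpg", ["cat", "outdoor"]),
    ObjectAnnotation("img_003.jpg", [BoundingBox("cat", 10, 20, 100, 80)]),
]

for record in examples:
    print(record)
```

The point of the sketch is only the shape of the data: one string, a list of strings, or a list of labelled rectangles.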

The data labeling feature is available in both the Basic and Enterprise editions of AzureML. However, the Basic edition does not include ML assisted data labeling, where an ML model is automatically built to assist the labeling process. ML assisted data labeling needs a GPU based compute resource for model training and inferencing. (Obviously this comes with a cost 😉 )

There are two ways to add images to build the dataset.

  1. Upload the image files to an Azure blob storage container and register it as a Datastore in Azure ML (I’d recommend this since it may bypass the storage restrictions of the default storage)
  2. Upload directly to the default storage
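For the first option, registering the blob container can also be done from code. The following is a rough sketch that assumes the `azureml-core` Python SDK (v1) and its `Datastore.register_azure_blob_container` method; the datastore name, container name, account name, and key are placeholders I made up, and the helper functions are hypothetical.

```python
def blob_datastore_settings(datastore_name, container_name, account_name, account_key):
    """Collect the keyword arguments used to register a blob container."""
    return {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
        "account_key": account_key,
    }


def register_image_datastore(workspace, settings):
    """Register the blob container as a Datastore in the given workspace.

    Assumes the azureml-core (v1) SDK is installed. The import is done
    lazily so the rest of this file loads without the SDK.
    """
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(workspace=workspace, **settings)


# Placeholder values: replace with your own storage details.
settings = blob_datastore_settings(
    datastore_name="labeling_images",
    container_name="images",
    account_name="mystorageaccount",
    account_key="<storage-account-key>",
)

# With a configured workspace, the call would look like this:
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
#   register_image_datastore(ws, settings)
```

Once registered, the datastore should be selectable as the image source when creating the labeling project. In real code, keep the account key out of source files.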

One of the attractive features I’ve seen in the data labeling process is the ability to use keyboard shortcuts, which makes the process much more user friendly.

Data Labeling interface of a bounding box annotation task

The annotations can be exported as a JSON file that follows the COCO dataset format (this file is saved in the default blob storage of the experiment), or they can be registered as an Azure ML dataset. The progress of the data labeling project can be monitored through the dashboard on AzureML Studio.
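Once you have the exported JSON, a few lines of Python are enough to sanity-check it. The sketch below assumes the standard COCO layout (`categories`, `annotations`, and a `bbox` given as `[x, y, width, height]`); the inline sample is invented, not a real AzureML export, and you should check in your own file whether the box values are pixels or fractions of the image size.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Count how many annotations each category has."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    return dict(counts)


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Invented sample in COCO layout; a real export would be read with json.load().
sample = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "img_002.jpg", "width": 640, "height": 480}
  ],
  "categories": [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
    {"id": 2, "image_id": 1, "category_id": 2, "bbox": [100, 50, 60, 80]},
    {"id": 3, "image_id": 2, "category_id": 1, "bbox": [5, 5, 20, 20]}
  ]
}
""")

print(summarize_coco(sample))
print(bbox_to_corners(sample["annotations"][0]["bbox"]))
```

A per-category count like this is a quick way to spot class imbalance before you start training.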

It seems Microsoft has plans to develop this product further, and I hope there will be interesting additions in the future.

Official documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-labeling-projects