The global pandemic has changed the world’s view on technological interventions and innovations. Carrying its corporate mission of “Empowering every person and every organization on the planet to achieve more”, the tech giant Microsoft held its annual developer conference, Build 2020, as a 48-hour virtual event.
Going virtual didn’t dampen the excitement the conference creates among the developer community as well as within enterprises. I blogged about a few exciting AI-related announcements Microsoft made at Build 2020 on the Kodez blog. Let’s discuss some of the interesting use cases of these announcements in future posts.
One of the most time-consuming tasks in the machine learning model development pipeline is data labeling. When it comes to a computer vision task that uses deep learning methodologies, you may need thousands of labeled samples to train your models.
Azure Machine Learning offers a new feature for data labeling tasks, specifically designed for computer vision applications. Right now, AzureML supports three types of data labeling tasks.
Image Classification multi-label – If the images in the image set can carry more than one label each, this is the task type to go with.
Image Classification multi-class – This is the simple image classification task type, where each image has a single label and the dataset has multiple classes.
Object Identification (Bounding Box) – If you need to annotate a set of images to train an object detection model, you need bounding box annotations. This is the task type to choose for such tasks.
The data labeling feature is available in both Basic and Enterprise versions of AzureML, although the Basic version lacks ML-assisted data labeling, where an ML model is automatically built to assist the labeling process. ML-assisted data labeling needs a GPU-based compute resource to perform model training and inferencing. (Obviously this comes with a cost then 😉 )
There are two ways that you can add the images to build the dataset.
Upload the image files into an Azure blob storage and register it as a Datastore in Azure ML (I’d recommend this since it bypasses the storage restrictions of the default storage)
Upload directly to the default storage
One of the attractive features I’ve seen in the data labeling process is the ability to use keyboard shortcuts, which makes the process much more user friendly.
The annotation files can be exported as JSON following the COCO dataset format (this file is saved in the default blob storage of the experiment) or registered as an Azure ML dataset. The progress of the data labeling project can be monitored through the dashboard on AzureML Studio.
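To give a feel for the exported format, here is a minimal sketch of reading such a COCO-style annotation file with plain Python. The file content, names and IDs below are made up for illustration; a real export lives in the experiment’s default blob storage.

```python
import json
from collections import defaultdict

# A tiny stand-in for an exported COCO-format annotation file
coco_json = json.dumps({
    "images": [{"id": 1, "file_name": "cat_01.jpg"},
               {"id": 2, "file_name": "dog_01.jpg"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 1},
                    {"id": 11, "image_id": 2, "category_id": 2}],
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
})

def labels_per_image(coco_text):
    """Map each image file name to its list of label names."""
    coco = json.loads(coco_text)
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    images = {i["id"]: i["file_name"] for i in coco["images"]}
    labels = defaultdict(list)
    for ann in coco["annotations"]:
        labels[images[ann["image_id"]]].append(categories[ann["category_id"]])
    return dict(labels)

print(labels_per_image(coco_json))
# {'cat_01.jpg': ['cat'], 'dog_01.jpg': ['dog']}
```

The same grouping logic applies to a real export once you load the JSON text from the blob storage.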
It seems Microsoft has plans to develop this product further, and I hope there will be interesting additions in the near future.
Being one of the major public cloud providers, Microsoft Azure provides numerous products, services and tools for intelligent application development. This is a high-level guideline for selecting the appropriate product for your application development.
I have pinpointed only the most used tools and services here. These services can be interconnected with each other to develop applications for more complex use cases.
Early concepts of machine learning came out back in the 1950s, but people were not able to explore their full power since most machine learning algorithms are computationally expensive. The computing power of early systems was not enough to process large amounts of data with complex algorithms. Since cloud computing and GPUs opened the arena for complex computations, machine learning and deep learning got a boost and are now widely used in many real-world applications.
As we discussed in the previous posts, being a leading cloud-based machine learning platform, Azure Machine Learning solves three main burdens in machine learning model development and deployment process.
Setting up the development environment, including solving platform and software library dependencies.
Setting up the computing environments (parallel processing libraries such as CUDA, etc.)
Setting up and managing deployments
All three of these key areas in machine learning model development require some sort of computing resource to create and manage. The Azure Machine Learning service centralizes all the resources into an easily accessible portal, allowing the developer to select the most suitable resource for their need.
The Compute tab of the AML studio contains (as of the time of writing) four main compute categories for specific purposes. We’ll go through each of them and see where we can use them in our machine learning experiments.
Compute Instances –
No need to mess around with configuring CUDA and all the Python packages to set up a laptop for machine learning experiments or data visualization. You can just go through a few steps in a wizard to create a preconfigured compute instance on Azure. This is similar to creating a virtual machine on Azure. If you need GPU-based computing, you may have to select an N-series VM. (Make sure the region you selected has the required VM families.)
To run experiments on the compute instance you can use JupyterLab, Jupyter notebooks, RStudio, or an SSH connection. You have to create an SSH public key on Azure to access the compute instance through SSH.
GPU-based compute instances are costly. Create such instances only if you really need to run deep learning experiments.
Think of a complex deep learning scenario where data preprocessing needs a large amount of CPU processing time while model training should be done on GPUs. You can use two compute instances, where preprocessing happens on a CPU-based instance while the expensive GPU-based compute instance is used only for model training. (Connecting these two processes can be done using Azure Machine Learning pipelines.)
Make sure to deallocate the resources when you are not using them. (Else you’d better have a fat wallet in your pocket.)
Training clusters –
Training clusters in Azure ML are the savior when you have complex computations to perform. You can run tasks such as Automated ML and hyperparameter tuning on these preconfigured clusters. The maximum number of nodes can be configured according to your need. The underlying technology behind training clusters is Docker containers: simply put, you containerize your experiment and push it into the cluster for computation/training.
Inference clusters –
The end result of the experiments you perform sits on inference clusters. The web service endpoints you create can be deployed on these AKS-based inference clusters. You can go for a low-cost inference cluster with a few nodes for dev-test and a high-performing cluster with many nodes for the production environment. (Normally we use ACI for dev-test and AKS for production web service endpoints.)
Attached compute –
This is an interesting feature in Azure Machine Learning where you can push your machine learning workloads to external computing environments. Right now, AML supports Azure Databricks, Azure Data Lake Analytics, Azure HDInsight and remote virtual machines as attached compute targets.
Your VM should be running Ubuntu in order to be connected as an attached compute.
We’ll discuss how to use these computing resources in your machine learning experiments in future posts.
Being the fundamental and most vital factor in any machine learning experiment, the way you handle data in your experiments is crucial. Here we’re going to discuss different ways of managing your data sources inside Azure Machine Learning (AML).
Since the new Azure Machine Learning service is becoming the one-stop place for managing all ML-related workloads in Azure, these functions and methods can be created/managed using the web portal or the AzureML Python SDK (you may use the AzureML R SDK or the Azure CLI too).
Data comes in all shapes and sizes. To tackle these different data scenarios, AML offers different options for managing data. Let’s discuss these options one by one, with their usages, pros and cons.
A Datastore is the place where the data sits in an AML experiment. Your AML workspace can have one or more Datastores connected, according to your need.
AML is all about cloud-based machine learning, so I would recommend using some sort of Azure-based storage to store your data in the first place. Blob storage, File Share, Data Lake Storage, Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL and the Databricks file system are the currently supported storage types for creating Datastores. (Say your data is in an on-prem SQL database: you can use Azure Data Factory to migrate your data load onto Azure.)
You can see the Datastores registered to your workspace either through AML Studio (ml.azure.com) or through the Python SDK. When you create a workspace, two Datastores are created by default: workspacefilestore and workspaceblobstore.
workspaceblobstore acts as the default Datastore of experiments. You can change this at any time through the SDK. This is where all your code and the other files you put in the experiment sit.
Is it a good idea to keep your data on the workspaceblobstore?
Scenario #1: You are doing a toy experiment with a small dataset (e.g. a 2 MB CSV file). The dataset is static and there is no plan to update it during the experiment. Yes! It’s completely OK to keep the dataset inside workspaceblobstore.
Scenario #2: You are doing a deep learning experiment with 100,000 images. No! Never use workspaceblobstore to keep your data.
Why not the workspaceblobstore always?
workspaceblobstore has a file and storage limit (300 MB and/or 2,000 files), so it’s impossible to use it with a large dataset. It also directly affects the size of the Docker image (or the snapshot) created for the experiment, and bulky snapshots or Docker images are not a good thing. Always keep things simple and modularized: for anything beyond toy datasets, workspaceblobstore is a no-go!
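One common way to keep the snapshot small (assuming your bulky files live inside the project folder) is an .amlignore file in the experiment directory; it uses .gitignore-style patterns and keeps the listed paths out of the snapshot. A sketch:

```
# .amlignore (same syntax as .gitignore)
data/
outputs/
*.ckpt
*.mp4
```

The patterns above are just examples; list whatever large artifacts sit next to your training scripts.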
AML datasets are the high-level abstraction of the data you use in experiments. You may create an AML dataset from,
A local file / local files
A registered Datastore (from file(s) sitting on a Datastore)
Azure Open Datasets
The AML datasets we create belong to two main types:
Tabular datasets – If you have a file or files containing data in a tabular format (CSV, JSON Lines files, Parquet files, tabular data in SQL databases etc.), creating a tabular dataset is beneficial as it allows you to transform the data into a Pandas or Spark DataFrame.
File datasets – Refer to a single file or multiple files in your Datastore or on a public URL. File datasets come in handy in scenarios such as a dataset with 1K images.
AML datasets come with the advantages of versioning, tracking and monitoring. It’s easy to perform data drift detection or a simple statistical analysis on the data fields of a dataset with a few clicks.
Microsoft recommends always using AML datasets in experiments rather than pointing to the Datastore directly (which is totally possible). I’ve found pros and cons in both approaches.
AML datasets are easy to version and manage compared to Datastores.
If you have tabular data, I would always recommend going for AML tabular datasets.
It becomes tricky when you have files. If you use File datasets, you have to use the to_path() method to get the list of file paths defined by the dataset, and it comes back as a flat list! If you are not concerned about the directory structure of the data, this is totally fine. But if you wish to create custom data loaders (e.g. PyTorch custom data loaders which differentiate classes according to directories), this may not come in handy. (You can work around it by processing the file paths to determine the directory structure, though 😀 )
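The workaround mentioned above can be sketched in plain Python: given the flat list of paths that to_path() returns, recover the class of each file from its parent directory name. The paths and folder names below are invented for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Flat list, as returned by a hypothetical FileDataset.to_path() call
paths = ["/cats/img_001.jpg", "/cats/img_002.jpg", "/dogs/img_001.jpg"]

def group_by_class(flat_paths):
    """Recover a class -> files mapping from the parent directory names."""
    classes = defaultdict(list)
    for p in flat_paths:
        classes[PurePosixPath(p).parent.name].append(p)
    return dict(classes)

print(group_by_class(paths))
# {'cats': ['/cats/img_001.jpg', '/cats/img_002.jpg'], 'dogs': ['/dogs/img_001.jpg']}
```

A mapping like this can then feed a PyTorch custom dataset that needs class-per-directory semantics.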
Keep in mind that AML dataset mount() only works on Unix-like operating systems. If you wish to run your experiment on a Windows workstation, you may have to download() the dataset.
We’ll discuss using these datasets in different model training scenarios in future posts.
These are just some of the experiences I’ve had while playing around with the new Azure Machine Learning. The Microsoft Learning GitHub repo (https://github.com/MicrosoftLearning/DP100) for the DP-100 exam is a really nice place to find example code on using these functionalities. Let me know your findings and experiences with AML too 😊
Out of all the public cloud platforms, Microsoft Azure has brought almost all the steps of the machine learning life cycle into the cloud. Though the resources and abilities are there, finding the correct cloud-based product or service to adopt for your solution can sometimes be a problem.
Providing an answer for that issue, Azure has come up with the whole new Azure Machine Learning Studio, which is in preview at the time I’m blogging this! (Don’t get confused with the old AzureML Studio, the drag-and-drop interface we had before. This is a new thing – ml.azure.com.) There’s no framework dependency or restriction for using these services. You can easily adapt your open source machine learning code base (written with Python, scikit-learn, TensorFlow, PyTorch, Keras… anything).
To use this one-stop solution in Azure, you have to create an Azure Machine Learning service from the Azure portal, which will then direct you to the new interface. You can go for either the Enterprise pricing tier or the Basic tier. In the Basic tier you won’t get the visual designer and Automated ML features.
Let’s go through each tab in the side pane of the latest release and see what we can do with them.
These are fully managed Jupyter notebook instances in the cloud. The notebook servers run on top of a new VM type called a “notebookVM”. These notebookVMs are fully configured work environments for your machine learning and data science tasks. No need to worry about installing all the Python packages and their dependencies: everything is already there! You have the option to change the notebook VM size (yes, GPU-enabled VMs are there too) or install new packages through a Python package manager.
Automated ML –
Not available in the Basic tier, though. This is a process of selecting the best-suited algorithm for the dataset you have. Right now, it supports classification, regression and time series forecasting for tabular data formats; it is not supported for deep learning based computer vision applications, and deep learning based text analysis is still in preview. The Automated ML process runs a set of machine learning algorithms on top of your data and sees which one gives the best accuracy metric. Good for building prototypes, and in some cases it might even go to production.
This is the evolution of the old Azure ML Studio (the drag-and-drop thing). It seems Microsoft is going to end that product’s lifecycle and make the new Designer its replacement. Here you can build the complete ML workflow by dragging and dropping modules. If you want, you can integrate R or Python scripts into the experiment. The machine learning service endpoint can be exposed through an Azure Kubernetes Service (AKS) deployment.
The place to manage and version your datasets. Datasets can be either tabular or file based. Here you can profile your dataset by performing a basic statistical analysis of your data. If your dataset sits on a Datastore (which we’re going to discuss later), this acts as a high-level encapsulation of that data.
You may execute several runs of the same experiment with different configurations. This is the place where you can see all their log files and compare the runs with each other.
Don’t confuse Azure Machine Learning pipelines with Azure Pipelines. Azure ML pipelines are specifically designed for MLOps tasks. You can manage the whole experiment process up to production using ML pipelines. These pipelines are reusable and help with collaborative development of the solution.
You can register your trained ML models here. Versioning the models and managing which model goes to production are some use cases of this model registry. You can also register models that have been trained outside the particular Azure ML workspace.
The endpoint of an Azure Machine Learning experiment can be a web service or an IoT module endpoint. Managing endpoint keys etc. is performed in this section.
In most cases, you will use Azure for computation. In the Compute section you can create and manage the following compute resource types.
Notebook VMs – As we discussed in the Notebooks section, this is a fully managed ML development environment suited for development and prototyping purposes.
Training clusters – You can create either a CPU-based or a GPU-based cluster for running your experiments. Note that you are charged according to computation hours as well as the number of nodes you are using. The good thing is there’s no charge when you are not using the cluster for computation.
Inference clusters – These are AKS clusters where you can deploy your endpoints. You can even register an existing AKS cluster as an inference cluster.
Attached compute – If you’re working with Azure Databricks, Data Lake Analytics or HDInsight, you can configure that compute here. In an interesting use case, you can even attach your physical computer (which should be a workstation running Ubuntu) as a compute target through the AML service.
When it comes to machine learning experiments, it’s normal to have large amounts of data, which may sit in your Azure storage. A Datastore is the storage abstraction over an Azure storage account which you can then use inside your machine learning experiments.
Data labeling –
A cool new feature for data annotators. Right now, this supports image classification (multi-label / multi-class) and object identification (bounding box) annotations. The annotator doesn’t need to have an Azure subscription, so you can easily outsource your tedious annotation workload through this feature.
This is just an overview of the options we have with the new Azure Machine Learning Studio. It’s pretty clear that the Azure team is going to bring all the ML-related services under one umbrella. Let’s discuss some cool use cases and tips on using these services in the next blog posts.
Loading massive and complex datasets for training deep learning models has become normal practice in most deep learning experiments. Handling large datasets that contain multimedia such as images, video frames and sound clips can’t be done with simple file-open commands, which would drastically reduce model training efficiency.
Featuring a more Pythonic API, the PyTorch deep learning framework offers a GPU-friendly, efficient data generation scheme to load any data type and train deep learning models in an optimal manner.
Based on PyTorch’s Dataset class (torch.utils.data.Dataset), you can load pretty much every data format, in all shapes and sizes, by overriding two subclass functions:
__len__ – returns the size of the dataset
__getitem__ – returns a sample from the dataset given an index
Here’s a rough skeleton of the Dataset subclass which you can modify for your needs.
import torch
from torch.utils.data.dataset import Dataset

# If available, use GPU memory to load data
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

class MyCustomDataset(Dataset):
    def __init__(self, ...):
        # All the data preparation tasks can be defined here
        # - Deciding the dataset split (train/test/validate)
        # - Data transformation methods
        # - Reading annotation files (CSV/XML etc.)
        # - Preparing the data to be read by an index

    def __getitem__(self, index):
        # Returns data and labels
        # - Apply the initiated transformations to the data
        # - Push data to GPU memory
        # - Better to return the data points as a dictionary/tensor
        return (img, label)

    def __len__(self):
        return count  # of how many examples (images?) you have
These are some tips and tricks I follow when writing custom data loaders for PyTorch.
Datasets will expand with more and more samples and, therefore, we do not want to store too many tensors in memory at runtime in the Dataset object. Instead, we form the tensors as we iterate through the samples list. This approach may be a bit slower in processing but saves us from running out of memory.
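As a minimal sketch of that idea (using throwaway text files as stand-ins for images, so only the pattern matters), the dataset stores file paths at construction time and reads each file only when it is indexed:

```python
import os
import tempfile

class LazyFileDataset:
    """Store only file paths at init; read each file on access."""
    def __init__(self, file_paths):
        self.file_paths = file_paths        # cheap: just strings in memory
    def __len__(self):
        return len(self.file_paths)
    def __getitem__(self, index):
        # Load on demand instead of holding everything in memory
        with open(self.file_paths[index]) as f:
            return f.read()

# Demo with two throwaway files
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"sample_{i}.txt")
    with open(p, "w") as f:
        f.write(f"sample {i}")
    paths.append(p)

ds = LazyFileDataset(paths)
print(len(ds), ds[1])  # 2 sample 1
```

In a real image loader the open() call would be an image read plus the transforms, but the memory trade-off is the same.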
The __init__ function should be the place where all the initial data preparation and logic happens. Do the operations that read data annotation files (CSV/XML etc.) here.
If you have separate portions of the dataset for train/test and validation, make sure you define that logic inside the __init__ function. You can pass the desired data split as an argument to the function.
The __init__ function is also the place to define data transformations. For example, if you have image data to load and need to resize and normalize the images, you can use torchvision transforms here.
#Example transform for image data
#ToTensor is needed before Normalize, which operates on tensors
self.transform = transforms.Compose([transforms.Resize((224, 224)),
                  transforms.ToTensor(),
                  transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
Make sure you index your custom dataset in a relational structure when initiating it. Generating an array or a list of the data points is a good way to do it.
The __len__ function comes in handy to see how many data points have been loaded through __init__. The data length is normally the number of records loaded into the final list or array you created inside __init__.
The __getitem__ function should be lightweight. Avoid overly complex computations inside the __getitem__ function.
PyTorch DataLoaders just call __getitem__() and wrap the results up into a batch when performing training or inferencing. This function is called iteratively, so make sure you return one data point at a time.
Always try to return the values from __getitem__ as tensors.
If you have multiple components to return from the data loader, using a Python dictionary is a handy option: you can structure the components as key–value pairs in the dictionary.
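Here is a toy sketch of a dataset whose __getitem__ returns a dictionary with four values. The key names and the stand-in data are invented for illustration; in a real loader the "data" entry would be the transformed tensor.

```python
class ToyDictDataset:
    """Toy dataset whose __getitem__ returns a dictionary with four values."""
    def __init__(self):
        self.samples = [("img_0.png", 0), ("img_1.png", 1)]
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, index):
        file_name, label = self.samples[index]
        return {
            "data": [0.0, 0.0, 0.0],   # stand-in for the image tensor
            "label": label,
            "file_name": file_name,    # handy when debugging bad samples
            "index": index,
        }

item = ToyDictDataset()[1]
print(sorted(item))  # ['data', 'file_name', 'index', 'label']
```

The default DataLoader collate function understands dictionaries, so each key comes back batched on the training loop side.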
You create a CustomDataset object when you need to consume the data. This sample code snippet demonstrates how to access the data points through the custom data loader you created.
#Consuming the dataset
from torch.utils.data import DataLoader, random_split

#Creating the dataset object
dataset = MyCustomDataset(...)

#Randomly split the dataset into a train set and a validation set
train_data, val_data = random_split(dataset, [50000, 10000])

#Create DataLoader iterators
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_data, batch_size=64, shuffle=True, num_workers=2)

#Iterating through the data loader object
for i, batch in enumerate(train_loader):
    #training/evaluation step goes here
    ...
You may notice that the DataLoader iterator can batch and shuffle the data and load it using multiprocessing, just by changing the parameters of the function. Make sure you choose a batch size which fits your memory capacity. If you are loading the data to the GPU, it’s the GPU memory you should consider.
If you are using a multi-GPU setup with PyTorch DataLoaders, it tries to divide the data batches evenly among the GPUs. If the batch size is less than the number of GPUs you have, it won’t utilize all of them.
I would say the CustomDataset and DataLoader combo in PyTorch has become a lifesaver for me in most complex data loading scenarios. Would love to hear about your experiences with writing custom DataLoaders.
Training and evaluating deep learning models may take a lot of time, so sometimes it’s worth monitoring how well (or badly) the model is training in real time. It helps you understand, debug and optimize your models without waiting until training finishes to check the performance. The good old method of printing out training losses/accuracy for each epoch works, but it’s a bit hard to compare metrics that way.
A real-time graphical interface that can plot/visualize metrics while a model is training through epochs or iterations would be the best option. TensorBoard is the visualization tool that came out with TensorFlow, and I’m pretty sure almost all TF folks are using and getting the advantage of that cool tool.
So what about PyTorchians? Don’t panic. The official PyTorch repository recently added a TensorBoard utility in PyTorch 1.1.0. The code is still experimental, though, and it wasn’t working well for me.
Then I found this awesome open source project, tensorboardX. It’s pretty similar to what the official PyTorch repo has and is easy to work with. tensorboardX supports scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve and video summaries.
Five simple steps…
Import tensorboardX in your PyTorch code
Create a SummaryWriter object
Log your metrics while training (e.g. with the writer’s add_scalar calls)
Close the writer when training completes
Spin up the TensorBoard server on the log directory
I did a simple demo of this by adding TensorBoard logs to the famous PyTorch transfer learning tutorial. Here’s the GitHub repo. Just clone it and play around with it.
Note that in the experiment I’ve used two SummaryWriter objects to create two scalar graphs: one for the training phase and the other for the validation phase.
The log files will be created in the directory you specified when creating the SummaryWriter object. (You can change this directory to wherever you want.)
To view the TensorBoard, open a terminal inside the experiment folder. Assume your log files are inside ‘./logs/’. Use the following command to spin up the TensorBoard server on your local machine.
$ tensorboard --logdir ./logs/
Sometimes you may use a remote server or a VM (maybe an Azure DLVM) for training your deep learning models. Then how do you get this TensorBoard out from there?
SSH tunneling with port forwarding is a good option you can use for this. You just have to spin up the TensorBoard service on your remote machine, then tunnel the server back to your workstation with an ssh command.
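A typical port-forwarding command looks like the following (the user name and host are placeholders; 6006 is TensorBoard’s default port, and both port numbers are just examples):

```shell
# Forward localhost:6006 on your workstation to port 6006 on the remote machine
ssh -L 6006:localhost:6006 user@remote-host
```

With the tunnel open, browsing to http://localhost:6006 on your workstation shows the remote TensorBoard.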
When it comes to deep learning model development and training, for me personally, the majority of the time is spent on data pre-processing, followed by setting up the development environment. Cloud-based development environments such as Azure DLVM, Google Colab etc. are very good options when you don’t have much time to spend installing all the required packages on your workstation. But there are times when we want to do the development on our own machines and train/deploy somewhere else (maybe in the client’s environment, on a machine with a better GPU for faster training, or on a Kubernetes cluster). Docker comes in handy in these scenarios.
Docker provides both hardware and software encapsulation by allowing portable deployment. If you are a data scientist, machine learning engineer or deep learning developer, I strongly recommend you give Docker a try, and I’m pretty sure it’ll make your life so much easier.
Alright, that’s enough about Docker! Let’s assume you are now using Docker to deploy your deep learning applications, and you want to ship your deep learning model to a remote computer that has a powerful GPU, which allows you to use large mini-batch sizes and speed up your training process. Though Docker containers solve the problem of framework and platform dependencies, they are also hardware-agnostic, and that creates a problem!
Have you ever tried to access the GPU on the host computer from a program running inside a Docker container? Sadly, Docker does not natively support NVIDIA GPUs within containers.
The early workaround was installing the NVIDIA drivers inside the Docker container. It’s a bit of a hassle, as the driver version installed in the container has to match the driver on the host.
To make Docker images that use GPU resources more portable, NVIDIA has introduced nvidia-docker!
nvidia-docker is a wrapper around the docker command that mounts the GPU of the host machine into the Docker container. The only thing you should pay attention to is the CUDA version you want to use.
So, in which scenarios can you use this? In my case, nvidia-docker comes in handy when I’m running my experiments on a cluster with more GPU power. What I do is containerize all my code into a Docker image and run it on the remote machine with nvidia-docker. (Windows folks… nvidia-docker is still not available for Windows hosts, and I’m not sure whether it’s on the development timeline or not 😀 )
Here’s the official GitHub repo on nvidia-docker. Just install it, make sure to restart your Docker engine, and make nvidia-docker the default Docker runtime. The rest is the same as building and running a typical Docker image.
Here’s a simple Dockerfile I wrote for containerizing my PyTorch code. I’ve used CUDA 9.1. You can modify it for your needs.
# Start from an NVIDIA CUDA 9.1 base image
FROM nvidia/cuda:9.1-base-ubuntu16.04

# Install some basic utilities
RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    libx11-6 \
 && rm -rf /var/lib/apt/lists/*

# Create a working directory
RUN mkdir /app
WORKDIR /app

# Create a non-root user and switch to it
RUN adduser --disabled-password --gecos '' --shell /bin/bash user \
 && chown -R user:user /app
RUN echo "user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-user
USER user

# All users can use /home/user as their home directory
ENV HOME=/home/user
RUN chmod 777 /home/user

# Install Miniconda
RUN curl -so ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.1-Linux-x86_64.sh \
 && chmod +x ~/miniconda.sh \
 && ~/miniconda.sh -b -p ~/miniconda \
 && rm ~/miniconda.sh
ENV CONDA_AUTO_UPDATE_CONDA=false

# Create a Python 3.6 environment
RUN /home/user/miniconda/bin/conda install conda-build \
 && /home/user/miniconda/bin/conda create -y --name py36 python=3.6.5 \
 && /home/user/miniconda/bin/conda clean -ya
ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/home/user/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH

# Install PyTorch with CUDA 9.1 support
RUN conda install -y -c pytorch \
    cuda91=1.0 \
    magma-cuda91=2.3.0 \
    pytorch=0.4.0 \
    torchvision=0.2.1 \
 && conda clean -ya
RUN conda install opencv

# Install other dependencies from pip
# My requirements.txt just contains the packages I used for the code. Change this for your need:
#   numpy==1.14.3
#   torch==0.4.0
#   torchvision==0.2.1
#   matplotlib==2.2.2
#   tqdm==4.28.1
COPY requirements.txt .
RUN pip install -r requirements.txt

# Create /data directory so that a container can be run without volumes mounted
RUN sudo mkdir /data && sudo chown user:user /data

# Copy source code into the image
COPY --chown=user:user . /app

# Set the default command to python3
CMD ["python3"]
Here are the bash commands for building the image and running it using the NVIDIA runtime.
# 1. Build the image
docker build -t <dockerImage> .

# 2. Run the Docker image
docker run \
    --runtime=nvidia -it -d \
    --rm <dockerImage> python3 <yourCode.py>
Just try it and see how your deep learning life becomes easier! Happy coding! 🙂
I would say training a deep neural network model to achieve good accuracy is an art. The training process enables the model to learn the model parameters, such as the weights and the biases, from the training data. During training, model hyperparameters govern the process: they control the behavior of model training and have a significant impact on model accuracy and convergence.
Learning rate, number of epochs, hidden layers, hidden units, activation functions and momentum are the hyperparameters we can adjust to make neural network models perform well.
Adjusting the learning rate is a vital factor for convergence, because a small learning rate makes training very slow and can lead to overfitting, while if the learning rate is too large, training will diverge. The typical way of finding the optimum learning rate is performing a grid search or a random search, which can be computationally expensive and take a lot of time. Isn’t there a smarter way to find the optimal learning rate?
Here I’m going to connect some dots on a process I followed to choose a good learning rate for my model, and a way of training a DNN with a different learning rate policy.
I gave it a try using a simple transfer learning experiment. The dataset and the experiment I used are from the PyTorch documentation, which you can find here. These are the steps I followed during the experiment.
Run the LR range finder to find the maximum learning rate value to use in 1cycle learning.
Output from the LR finder
According to the graph, it is clear that a value around 5*1e-3 to 1e-2 can be used as the maximum learning rate for training. So I chose 7*1e-3, which is a bit before the loss minimum, as my maximum learning rate.
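For reference, the LR range finder simply sweeps the learning rate exponentially from a tiny value to a large one over the test iterations while recording the loss. A rough sketch of generating that sweep (the bounds and iteration count here are arbitrary):

```python
def lr_range_sweep(min_lr, max_lr, num_iters):
    """Exponentially spaced learning rates for an LR range test."""
    ratio = max_lr / min_lr
    return [min_lr * ratio ** (i / (num_iters - 1)) for i in range(num_iters)]

# One learning rate per mini-batch of the range test
lrs = lr_range_sweep(1e-7, 1e-1, 100)
print(lrs[0], lrs[-1])  # roughly 1e-07 and 0.1
```

During the test, each mini-batch is trained with the next rate in the list and its loss is recorded; the maximum usable learning rate sits just before the loss starts to blow up.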
Run the training using a fixed learning rate (note that learning rate decay has been used during training).
Run the training according to the 1cycle policy. (A cyclical momentum and a cyclical learning rate have been used. Note that the learning rate and the momentum change in each mini-batch, not epoch-wise.)
Compare the validation accuracy and validation loss of each method.
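To make the per-mini-batch schedule concrete, here is a rough sketch of a linear 1cycle schedule where the learning rate ramps up to the chosen maximum and back down while the momentum moves inversely. The numbers are illustrative, and recent PyTorch versions also ship torch.optim.lr_scheduler.OneCycleLR if you prefer a ready-made implementation.

```python
def one_cycle(step, total_steps, max_lr=7e-3, div_factor=10.0,
              max_mom=0.95, min_mom=0.85):
    """Linear one-cycle learning rate and momentum for a given mini-batch step."""
    base_lr = max_lr / div_factor
    half = total_steps // 2
    if step <= half:                      # first half: lr up, momentum down
        frac = step / half
    else:                                 # second half: lr down, momentum up
        frac = (total_steps - step) / (total_steps - half)
    lr = base_lr + (max_lr - base_lr) * frac
    mom = max_mom - (max_mom - min_mom) * frac
    return lr, mom

lr_mid, mom_mid = one_cycle(50, 100)
print(lr_mid, mom_mid)  # roughly (0.007, 0.85) at the peak of the cycle
```

In a training loop you would call this once per mini-batch and write the values into the optimizer’s parameter groups.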
Notice that the green line, which represents the experiment trained using the 1cycle policy, gives both better validation accuracy and better validation loss at convergence.
These are the best validation accuracies of the two experiments.
Fixed LR : 0.9411
1-cycle : 0.9607
Tip: Choose the batch size according to the computational capacity you have. The number of iterations in the 1cycle policy depends on the batch size, the number of epochs and the dataset size you are using for training.
Though this experiment is a simple one, it shows that the 1cycle policy does its job in increasing the accuracy of neural network models and helps with super-convergence. Give it a try and don’t forget to share your experiences here. 😊