Why Do We Need GPUs for Deep Learning?

“If you wanna do deep learning experiments you should have a GPU”

This is one of the most common statements you may have heard over the years, even from me. But do we really need a GPU for these experiments? If so, why? Let's dig in..

As we all know, deep learning is all about deep neural networks. Training a deep neural network (DNN) is one of the most computationally expensive tasks in computer science. Since a DNN takes huge numerical representations as inputs and holds millions of weights and biases inside the network, it can be mathematically seen as a set of multidimensional matrices.

A simplified illustration of an ANN

Now we have a set of huge multidimensional matrices.. what next? To train a neural network to perform a particular task (say, classifying a bunch of images), we commonly use the backpropagation algorithm, which adjusts the weights and biases of the DNN according to the training set we provide. In the forward pass, an input is passed through the neural network and, after processing, an output is generated. In the backward pass, we update the weights of the network based on the error obtained in the forward pass. These passes involve a huge number of matrix operations (multiplications and additions). Any single operation inside these passes is as simple as multiplying two numbers. But for a DNN such as VGG16 (a convolutional neural network designed for computer vision tasks, with 16 weight layers), there are roughly 140 million parameters, a.k.a. weights and biases! So the number of calculations needed to adjust all these weights and biases is tremendous.
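To make that concrete, here is a minimal NumPy sketch of a single fully connected layer, just to show that both passes boil down to matrix multiplications. The layer sizes, activation, loss, and learning rate are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 1000))            # a batch of 64 inputs, 1000 features each
W = rng.standard_normal((1000, 10)) * 0.01     # weights of one dense layer
b = np.zeros(10)                               # biases
y_true = np.eye(10)[rng.integers(0, 10, 64)]   # one-hot targets

# Forward pass: one big matrix multiplication plus a bias addition
z = x @ W + b
y_pred = 1.0 / (1.0 + np.exp(-z))              # sigmoid activation

# Backward pass: propagate the squared error back and nudge W and b
dz = (y_pred - y_true) * y_pred * (1.0 - y_pred)   # gradient at the layer output
grad_W = x.T @ dz / len(x)                          # another large matrix multiplication
grad_b = dz.mean(axis=0)
W -= 0.1 * grad_W
b -= 0.1 * grad_b
```

A real DNN repeats this for every layer, every batch, and every epoch, which is where the millions of operations come from.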

Alright.. we now have millions and millions of calculations to be done, and we have very fast computers, so what's the issue? In truth there is no issue as such; the problem is with the definition of "fast" we are used to in everyday computing. The Central Processing Unit (CPU) is the key component that performs computations in our computers. A modern high-end CPU typically has 4-8 physical cores (an Intel Core i7 10th-gen processor has 8 physical cores). If you remember your early computer science lessons, a CPU core can execute only one operation at a time, so performing those millions and millions of calculations sequentially is going to take ages (literally, it may take years to train a complex DNN!). You may think of building a cluster with thousands of CPUs working in parallel. Yes, that is possible, but it would cost millions and consume a huge amount of power.

Since each individual computation in DNN training is simple, Graphics Processing Units (GPUs), which have hundreds or thousands of simple cores, are a perfect match for this workload. (Even a typical Nvidia 940MX GPU found in laptops has 384 CUDA cores.)

The table below will give you a better idea of GPU vs. CPU. Also watch the Mythbusters video that demonstrates the power of thousands of small processors.

Source : https://www.slideshare.net/AlessioVillardita/ca-1st-presentation-final-published

Though GPUs are hyped today, the use of general-purpose GPUs for scientific computing started in the early 2000s. In 2006, Nvidia released CUDA, which allows GPUs to be programmed using high-level languages. Now all the major deep learning frameworks can access the GPU with just a few lines of code. Using the GPU in your laptop, you can train a DNN model within minutes, compared to the hours or days it may take on the CPU!
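As a rough idea of what those "few lines of code" look like, here is a small sketch using PyTorch (any framework with GPU support works similarly); the model and data below are dummies chosen just for illustration:

```python
import torch
import torch.nn as nn

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(1000, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
inputs = torch.randn(64, 1000, device=device)          # dummy batch placed on the GPU
targets = torch.randint(0, 10, (64,), device=device)   # dummy labels

loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()                                         # the backward pass also runs on the GPU
print(device, loss.item())
```

Moving the model and tensors to the device is essentially all that changes; the rest of the training loop stays the same.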

Alright, alright… Now we need a GPU!

There are two main ways of leveraging GPU power for your computations:

1. Use a physical GPU fitted to your workstation/ laptop

This is quite straightforward. I would strongly recommend an Nvidia GPU, since CUDA and cuDNN have great support and good documentation for most of the deep learning libraries. AMD also has ROCm (https://rocmdocs.amd.com/en/latest/Current_Release_Notes/Current-Release-Notes.html), which is an alternative to CUDA based on OpenCL, but I haven't used it myself. GPUs can be a bit expensive, so plan ahead when choosing one for your workstation. A quick way to verify such a setup is sketched after these two options.

2. Use a GPU on the cloud

This is pretty cool. If you don't have a GPU on your laptop but still want to train DNNs, you can easily spin up a GPU instance on the cloud and use it for your computations.

Almost all cloud providers (Google Cloud Platform, AWS, Microsoft Azure) provide GPU-based computing services. Some of these services are free to some extent, but you may have to pay if you are going to run your training for a long time.

Kaggle, one of the main data science platforms, provides free GPU access with a usage limit (https://www.kaggle.com/dansbecker/running-kaggle-kernels-with-a-gpu). Take a look; it may fit your needs too.
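Whichever of these two options you go with, a quick sanity check that your deep learning stack can actually see the GPU, and which CUDA/cuDNN versions it was built against, looks something like this (a small sketch assuming PyTorch is installed):

```python
import torch

print("CUDA available:", torch.cuda.is_available())              # True if a usable GPU is visible
print("CUDA version  :", torch.version.cuda)                     # CUDA toolkit PyTorch was built with
print("cuDNN version :", torch.backends.cudnn.version())         # cuDNN build used for DNN kernels
```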

So, that's it! Don't wait for the CPU to finish the computation. Let the GPU train your model.

Democratizing Machine Learning with Cloud

We have already passed the era of gigabytes when it comes to data. The world is talking about terabytes of unstructured data and millions of data points generated every second from IoT devices and sensors. To analyze these heaps of data, we obviously need large computation power and massive storage. Building workhorse machines to handle such workloads would definitely cost a lot. This is where the cloud computing paradigm comes in handy: the resourcefulness and scalability of the public cloud can be used to perform the large calculations required by machine learning algorithms.

Almost all the major public cloud providers in the market come with machine learning services. Google Cloud Platform provides modern machine learning services, with pre-trained models and a service to build your own tailored models. Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. IBM Analytics offers a machine learning platform with its cloud data services. Azure Machine Learning Studio is a GUI-based integrated development environment for constructing and operationalizing machine learning workflows on Azure. We discussed Azure Machine Learning and its use in practical scenarios at length in previous posts.

All the mentioned platforms provide machine learning as a service. Most of them offer pre-built ML algorithms in packages. Simple drag-and-drop interactions and easy deployment have attracted many developers to these tools.

But what if you want to start from scratch, or want to use the power of Graphics Processing Units (GPUs) to run ML algorithms in parallel? Cloud-based virtual machines specifically optimized for computation are one of the best solutions you can consume.

Azure Data Science Virtual Machine (DSVM) –


DSVM in Azure Portal

If you have already used Azure virtual machines for your computation, hosting, or storage tasks, this will not be a new concept for you. The Azure DSVM is specifically optimized for large computations and comes in two flavors: one with Windows and the other with Linux. You can choose the hardware configuration as you wish. Many development environments, programming IDEs, and languages come pre-installed in the VM instances.

My personal favorite here is the Linux DSVM instance. Here I've created a Linux DSVM with a basic configuration. To access the VM, you can use any tool that can make an SSH connection. What I normally do is access the VM using Ubuntu Bash on Windows 10.

GPUs for machine learning –


Configurations of the Linux VM with Nvidia GPU

Many machine learning algorithms currently available can be executed in parallel; large parts of their execution are embarrassingly parallel. With parallel programming, you can reduce the execution time of these algorithms drastically. Data scientists in both industry and academia have been using GPUs for machine learning to make groundbreaking improvements across a variety of applications, including image classification, video analytics, speech recognition, and natural language processing.


GPUs Vs. CPU computing

Especially in deep learning, parallel processing with GPUs can decrease computation time drastically. Purchasing a deep learning dream machine powered by a CUDA-enabled high-end GPU such as the Nvidia Tesla K80 would cost nearly 6,000 dollars! Rather than spending that much on a machine, the most feasible plan is to provision a virtual machine with the specifications we need and pay as we consume.
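To get a feel for that difference, here is a small benchmark sketch (PyTorch assumed; the matrix size and repeat count are arbitrary) that times the same matrix multiplication on the CPU and, if one is present, on the GPU:

```python
import time
import torch

def time_matmul(device, size=4096, repeats=5):
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    torch.matmul(a, b)                       # warm-up run
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()             # wait until the GPU has really finished
    return (time.perf_counter() - start) / repeats

print("CPU:", time_matmul(torch.device("cpu")), "s per multiplication")
if torch.cuda.is_available():
    print("GPU:", time_matmul(torch.device("cuda")), "s per multiplication")
```

On a GPU like the K80, the gap for large matrices is usually an order of magnitude or more, though the exact numbers depend on the hardware and the matrix size.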


VM instance price plans

The N-series is a family of Azure Virtual Machines with GPU capabilities that you can use for these kinds of tasks. The N-series features the NVIDIA Tesla accelerated platform as well as NVIDIA GRID 2.0 technology, providing the highest-end graphics support available in the cloud today. Through the Azure portal, you can choose a price plan with the configuration that suits your tasks when provisioning the VM.

Here's my Azure VM, specifically configured for deep learning exercises. The machine is powered by a Tesla K80 GPU, which has 4,992 CUDA cores in it!! I installed Anaconda on it and do my computations using Jupyter notebooks.
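From a notebook on the VM, a quick way to confirm that the GPU is visible looks like this (a sketch assuming PyTorch is installed in the Anaconda environment):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU   :", props.name)                                     # e.g. the Tesla K80 on this VM
    print("Memory:", round(props.total_memory / 1024**3, 1), "GB")   # usable GPU memory
else:
    print("No GPU visible - check the NVIDIA driver and CUDA installation")
```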

Just a hint: stop your VM instance when you are not using it for computation to avoid getting huge unnecessary bills. 😉

No need for huge wallets! The wise decision is to apply cloud technologies for machine learning.