I’m going to break the flow of the Azure Machine Learning blog series and write a bit of a descriptive answer to a frequently asked question.
Here’s that golden question!
What development rig and tools do you use for deep learning/machine learning development?
We all know that machine learning experiments come at a cost in computation power, and deep learning is especially computationally expensive. As a researcher who does most of my experiments in computer vision and deep learning, this is the setup I use to get my jobs done (as of September 2020).
Please note that everything I’m mentioning here is my personal preference and not any kind of standard. Also, none of the manufacturers or software vendors have sponsored me for this post 😀 (Most of the software tools I’m mentioning here are FOSS).
Choosing the right set of hardware is the most vital part of building a DL workstation. It may be quite expensive to go for the best set that satisfies your needs, but I see that as an investment. Make sure you don’t spend way too much just for the sake of the name or the brand. Check whether it’s enough to do your job.
The main specifications you need to look at on a DL workstation are the processor, RAM, storage and GPU (if you wanna do computer vision or NLP kind of experiments, a GPU is a must!). I’m using a desktop with a 4-core Intel Xeon processor and 16GB of RAM. When it comes to storage, I have an SSD as the primary drive with all my software installed, plus a fairly big (6TB) hard drive. Why such big storage? Since I have to deal with somewhat big image and video datasets, having plenty of storage is a need.
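If you want a quick look at what a machine already offers before buying anything, here is a minimal sketch using only the Python standard library. It reports the logical CPU core count and the disk capacity of the root drive. The function name `workstation_summary` and the dictionary keys are just my own picks for illustration, and RAM is left out because the standard library has no portable way to read it.

```python
import os
import shutil


def workstation_summary(path=os.path.abspath(os.sep)):
    """Collect the CPU core count and disk capacity of the drive holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gib = 1024 ** 3
    return {
        # Logical cores; os.cpu_count() can return None on unusual platforms.
        "cpu_cores": os.cpu_count(),
        "disk_total_gib": round(usage.total / gib, 1),
        "disk_free_gib": round(usage.free / gib, 1),
    }


if __name__ == "__main__":
    for key, value in workstation_summary().items():
        print(f"{key}: {value}")
```

Pass a different `path` (for example the mount point of a dataset drive) to check that drive instead of the root one.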
GPU! This is the pinnacle of your build! Plus maybe the most expensive part. I have an NVIDIA GTX 1080, which has 8GB of GPU memory and supports CUDA-based processing. It comes in handy for most of my experiments. (When this is not enough, I use a remote cluster with two NVIDIA 2080 Ti GPUs for big computations.)
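To confirm that a GPU is actually visible for CUDA work, a small check like the one below can help. This is only a sketch: it assumes PyTorch is the framework in use, the function name `describe_gpu` and its return format are made up for this example, and it simply reports "not installed" if PyTorch is missing.

```python
def describe_gpu():
    """Return a small dict describing the CUDA devices PyTorch can see, if any."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    devices = []
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(index)
            devices.append({
                "name": props.name,
                # total_memory is reported in bytes; convert to GiB
                "memory_gib": round(props.total_memory / 1024 ** 3, 1),
            })
    return {
        "torch_installed": True,
        "cuda_available": bool(devices),
        "devices": devices,
    }


if __name__ == "__main__":
    print(describe_gpu())
```

An empty `devices` list usually means either there is no CUDA-capable GPU or the installed PyTorch build and driver don’t match, so it is worth running this right after setting up a new environment.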
Make sure you have proper cooling for the machine! Believe me, otherwise these computations are gonna generate a lot of heat and ruin your hardware.
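One way to keep an eye on the heat during long training runs is to poll `nvidia-smi`. The sketch below assumes the NVIDIA driver’s `nvidia-smi` tool is on your PATH. The helper names and the 80 °C warning limit are my own illustrative choices, not a vendor recommendation, so check the safe range for your own card.

```python
import shutil
import subprocess


def parse_temperatures(output):
    """Turn nvidia-smi's one-number-per-line output into a list of ints (°C)."""
    # Lines that are not plain numbers (e.g. "N/A") are skipped.
    return [int(line.strip()) for line in output.splitlines() if line.strip().isdigit()]


def read_gpu_temperatures():
    """Query the current GPU temperatures; returns [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_temperatures(result.stdout)


def too_hot(temperatures, limit_c=80):
    """Return the readings at or above limit_c (a hypothetical warning threshold)."""
    return [t for t in temperatures if t >= limit_c]


if __name__ == "__main__":
    readings = read_gpu_temperatures()
    print("GPU temperatures (°C):", readings)
    if too_hot(readings):
        print("Warning: at least one GPU is running hot, check your cooling!")
```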
Laptop or a desktop?
This is a hard choice to make. If you have to go from place to place to do your job, then you may have to choose a laptop. (Many gaming laptops come with GPUs which can be used for CUDA-based processing.) I’m more of a desktop person, and you can easily build a more powerful rig for the same amount of dollars you’d spend on a good enough laptop. (I don’t prefer MacBooks since they don’t come with CUDA-capable NVIDIA GPUs. I just use a light laptop to do my presentations and to carry around for meetings.)
Along with all these beasts, I have a two-monitor setup, a mechanical keyboard with a wireless mouse, a RODE USB mic for video conferences and recordings, and a Bose noise-cancelling headset for complete isolation, just to ease things up.
Software and Tools
This is the most interesting part. Even if you have the correct hardware, the wrong choice of tools will make your job hard. (Again, this is completely my choice of software. You may have different preferences.)
I use Ubuntu Linux as my operating system (18.04 LTS still 😀 ). Why Ubuntu? It’s so easy to set up and work with, so that’s my choice. Since Python is my primary programming language, it works like magic with all the Python environment dependencies and all. (Plus OpenCV 😀 )
So I said Python.. what else? Yeah.. along with most of the machine learning frameworks (yes, I have an Anaconda-based setup), PyTorch has become my go-to deep learning framework. Easy debugging, Pythonic syntax and wide support made me take this choice.
No programmer is complete without his/her own IDE choice and modifications. Microsoft VSCode (which is open source and free) is the IDE I use. It supports Python, Spark, Scala and almost all the languages I use. I added a few extensions which come in so handy with experiments. Here are some of the extensions I use. See if they suit you.
- Python – Makes sure I get all the indentation and tooltips right.
- Remote explorer – Since I use a remote cluster to train my models and sometimes a NAS for storage, this comes in handy to manage my SSH connections. Pretty convenient even for remote debugging.
- Docker – Makes it pretty easy to manage your Docker environments.
- Excel viewer – Just to view the CSV files in style.
- LaTeX Workshop and Code Spell Checker – This is all because I use LaTeX for scientific writing. Believe me, VSCode is a nice place to do word processing too 😀
With all these tweaks I use the dark theme 😀
That’s all about the hardware and software setup I use for my experiments. In a later post, I’ll talk about some MLOps tips and tricks I use in my experiments.