Docker has become the new norm of the software industry. Everyone is so obsessed with it since docker solves most of the issues software engineers and system administrators had with platform dependencies in application development and deployments.
“Docker is a tool that helps users to exploit operating-system-level virtualization to develop and deliver software in packages called containers.”~ Wikipedia
Though the technical explanation sounds bit complicated, simply docker can be identified as a ‘VM like’ environment where you can build and deploy your software applications.
Why docker for machine learning/ deep learning?
We have endless discussions on how hard it is to configure the development and deployment environments in machine learning. Since python is the most used language for ML and DL experiments, dealing with python packages and making them all work seamlessly on your hardware can be a nightmare. Using cloud-based machine learning platforms or virtual machines are some of the options we can utilize to deal with this problem.
Being more flexible than virtual machines and easy migration capabilities, docker is one of the best ways for managing machine learning environments. Since docker has become the key component of MLOps it’s time for the data scientists for adapting docker in their developments.
Where and how we can use docker?
For me docker helps me out in 4 main stages in the machine learning experiment pipelines.
- As a development environment.
I use to do lot of experiments in the domain of computer vision and deep learning. You may have experienced the pain of dealing libraries like opencv with python. So, I always use custom docker images with all the dependencies installed for running my experiments. This makes easy for me to collaborate with my peers easily without giving the hassle of replicating my development environment in their machines.
What about the huge amounts of data? Including those also inside the docker container? Nah. Always keeping the data in mounted volumes as well as the output files created from the experiments.
If you need GPU supported docker images, NVIDIA provides docker image variations that matches with your need on docker hub.
2. As a training environment.
You all know ML/ DL models normally take quite a big time for training. In my case, I use remote shared servers with GPUs for training my experiments. For that, the easiest way is containerizing the experiment and pushing to the server.
3. As a deployment environment.
Another popular use case of docker is in the deployment phase. Normally the deployment environment should fulfil required dependencies in order to inference the ML/DL model seamlessly. Since a docker container can be shipped across platforms easily without worrying about hardware level dependencies, it’s really easy to use docker for deploying ML models.
4. Docker for cloud-based machine learning
Most of the data scientists are using cloud-based machine learning platforms like Azure machine learning today with their flexibility and resources. Containerized experiments are the main component these services use in order to run them on cloud. When it comes to Azure ML you can use their default docker image for experiments or you can specify your custom base image for model development and training.
Take a look on this documentation for deploy Azure ML models using a custom docker base image.
So, docker has become a life saver for me since it reduces a lot of headache occurring with machine learning model life-cycle. Will come up with a sample experiment on using docker for training a machine learning model in the next post.