Training and evaluating deep learning models may take a lot of time. Sometimes it’s worth to monitor how good or bad the model is training in real-time. It’ll help to understand, debug and optimize your models without waiting till the model get trained to monitor the performance.The good old method of printing out training losses / accuracy for each epoch is a good idea, but it’s bit hard to evaluate the metrics comparatively with that.
A real-time graphical interface that can use to plot/ visualize metrics while a model is training through epochs or iterations would be the best option. Tensorboard is visualization tool came out with TensorFlow and I’m pretty sure almost all TF guys are using and getting the advantage from that cool tool.
So what about PyTorchians?? Don’t panic. Official PyTorch repository recently came up with Tensorboard utility on PyTorch 1.1.0 . Still the code is experimental and for me it was not working well for me.
Then, I found this awesome opensource project, tensorboardX. Pretty similar to what PyTorch official repo is having and easy to work with. TensorboardX supports scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve and video summaries.
5 simple steps…
- Install tensorboardX
- Import tensorboardX for your PyTorch code
- Create a SummaryWriter object
- Define SummaryWriter
- Use it!
I just did a simple demo on this by adding Tensorboard logs for the famous PyTorch transfer learning tutorial. Here’s the GiHub repo. Just clone and play around it.
Note that in the experiment I’ve used two SummaryWriter objects two create two scalar graphs for training phase and the other one for validation phase.
The log files will be created in the directory you specified when creating SummaryWriter object. (You can change this directory to wherever you want)
To view the tensorboard, open a terminal inside the experiment folder. Assume that your log files are inside ‘./logs/’ . Use the following command to spin up the tensorboard server on your local machine.
$ tensorboard –logdir ./logs/
Sometimes you may use a remote server or a VM (might be a Azure DLVM) for training your deep learning models. Then how to get this tensorboard out from there??
SSH Tunneling with post forwarding is a good option you can use for this. You just have to spin up the tensorboard service on your remote machine. Then tunnel the server back to your workstation with the ssh command stated below.
$ ssh -N -L 6007:127.0.0.1:6006 <username>@<remote_ip>
127.0.0.1:6006 : Tensorboard server running on the remote server / VM
6007 : local workstation port
You can then view the tensorboard running on the remote machine through your local machine’s browser.
That’s it! Simple and neat. No need to wait couple of days till the model get trained. Just monitor and stop early if it’s not learning well.
Enjoy Deep Learning!