Research and implementation of super-convergence methodologies at production level to achieve a reduction in training time while maintaining benchmark success metrics of deep learning models for computer vision.
A modified learning rate policy known as One Cycle was implemented as a Keras callback to change the learning rate as required by One Cycle, starting from a minimum learning rate which increases to the maximum learning rate until 50% of total training iterations, after which learning rate decreases to minimum learning rate. In last 10% of training iterations, learning rate is annihilated, reducing to 1/100 of its minimum learning rate value.
Minimum and maximum learning rate chosen for One Cycle method is selected using the LR Range Test as suggested by the author which runs 1 epoch on the architecture – dataset combination where the learning rate is increased each iteration and it’s loss recorded. Learning rates and their corresponding losses are plotted and the maximum learning rate is chosen by observing where the loss is as low and stable as possible. The minimum learning rate is taken at 1/10 of this value as suggested by the author.
Philosophy of One Cycle is based on curriculum learning and simulated annealing. Theoretically One Cycle allows SGD to traverse loss function topology at a slow but increasing rate in 1st part, in order to fully explore the topology until it reaches a valley with more optimal local minima, in the second part of One Cycle the learning rate begins to decrease again allowing SGD to slowly explore this valley to find the optimal local minimum.
- Working with Google Collab to establish baseline by reproducing paper results which used a different library (namely Caffe)
- Google Collab has a 12 hr window after which all data is deleted – this had to be taken into account when planning training.
- Parameters tweaked by author in Caffe had to be adapted to Keras/Tensorflow library.
- Implementing super convergence methodology within existing company code base/infrastructure.
- Had to quickly understand existing infrastructure and how newly created callback could be included within it.
- Had to reduce friction within existing pipeline to ensure super convergence methodologies could be used efficiently.
Achievements (according to KPIs)
- Original paper results reproduced.
- Implemented LR Range Test and One Cycle callbacks at production level.
- Applying One Cycle (super-convergence methodology) to custom architecture and small representative dataset resulted in a validation accuracy within 3% of baseline in a quarter of the number of epochs.
- One Cycle allowed for superconvergence of custom architecture and small dataset but will need to be tested on larger datasets to confirm it’s utility for the company’s’ pipeline.
- Automate selection of learning rate in LR Range Test. Would require signal processing.
- Automate process of selecting weight decay and batch normalization momentum values for the given architecture to optimize super convergence (fewer epochs with highest validation accuracy possible) for a given dataset.