
Super-Convergence pipeline for deep learning models

Project by Eitan

Abstract

Research and implementation of super-convergence methodologies at the production level to reduce training time while maintaining the benchmark success metrics of deep learning models for computer vision.


A modified learning rate policy known as One Cycle was implemented as a Keras callback that changes the learning rate as One Cycle requires: starting from a minimum learning rate, the rate increases to a maximum over the first 50% of total training iterations, after which it decreases back to the minimum. In the last 10% of training iterations, the learning rate is annihilated, dropping to 1/100 of its minimum value.
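
For illustration, a minimal sketch of such a callback under TensorFlow 2 / Keras is shown below. The class name, the parameter names, and the linear ramp shapes are assumptions made for the sketch, not the project's actual implementation; `total_iterations` would be `epochs * steps_per_epoch` for the planned run.

```python
import tensorflow as tf

class OneCycleLR(tf.keras.callbacks.Callback):
    """Sketch of the One Cycle schedule described above: ramp from
    lr_min to lr_max over the first 50% of iterations, back down to
    lr_min by the 90% mark, then annihilate to lr_min / 100."""

    def __init__(self, lr_min, lr_max, total_iterations):
        super().__init__()
        self.lr_min, self.lr_max = lr_min, lr_max
        self.total = total_iterations
        self.iteration = 0

    def _lr(self, i):
        half, anneal = 0.5 * self.total, 0.9 * self.total
        if i <= half:    # phase 1: lr_min -> lr_max
            return self.lr_min + (self.lr_max - self.lr_min) * (i / half)
        if i <= anneal:  # phase 2: lr_max -> lr_min
            return self.lr_max - (self.lr_max - self.lr_min) * ((i - half) / (anneal - half))
        # phase 3 (last 10%): annihilation, lr_min -> lr_min / 100
        return self.lr_min * (1.0 - 0.99 * ((i - anneal) / (self.total - anneal)))

    def on_train_batch_begin(self, batch, logs=None):
        # update the optimizer's learning rate before every batch
        tf.keras.backend.set_value(self.model.optimizer.learning_rate,
                                   self._lr(self.iteration))
        self.iteration += 1
```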


The minimum and maximum learning rates for the One Cycle method are selected using the LR Range Test suggested by the author: one epoch is run on the architecture–dataset combination, with the learning rate increased every iteration and its loss recorded. The learning rates and their corresponding losses are then plotted, and the maximum learning rate is chosen by observing where the loss is as low and stable as possible. The minimum learning rate is taken as 1/10 of this value, as suggested by the author.
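
A sketch of how such a test can be written as a Keras callback is shown below; the class name, the default bounds, and the choice of a geometric (multiplicative) ramp are illustrative assumptions, not the project's exact implementation.

```python
import tensorflow as tf

class LRRangeTest(tf.keras.callbacks.Callback):
    """Sketch of the LR Range Test described above: grow the learning
    rate geometrically each batch and record the loss for plotting."""

    def __init__(self, lr_start=1e-7, lr_end=1.0, num_iterations=1000):
        super().__init__()
        self.lr_start = lr_start
        # per-batch factor that takes lr_start to lr_end over the test
        self.factor = (lr_end / lr_start) ** (1.0 / num_iterations)
        self.lrs, self.losses = [], []
        self.iteration = 0

    def on_train_batch_begin(self, batch, logs=None):
        lr = self.lr_start * self.factor ** self.iteration
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, lr)
        self.lrs.append(lr)

    def on_train_batch_end(self, batch, logs=None):
        self.losses.append(logs.get("loss"))
        self.iteration += 1
```

After a one-epoch `model.fit(...)` with this callback, plotting `losses` against `lrs` on a logarithmic x-axis gives the curve from which the maximum learning rate is read off.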


The philosophy of One Cycle is based on curriculum learning and simulated annealing. Theoretically, in the first part of the cycle the slow but increasing learning rate lets SGD traverse the loss function's topology and fully explore it until it reaches a valley containing a more optimal local minimum. In the second part, the learning rate decreases again, allowing SGD to slowly explore this valley and settle into the optimal local minimum.

Challenges

  • Working with Google Colab to establish a baseline by reproducing the paper's results, which were obtained with a different library (namely Caffe).
    • Google Colab has a 12-hour window after which all data is deleted; this had to be taken into account when planning training runs.
    • Parameters tweaked by the author in Caffe had to be adapted to the Keras/TensorFlow library.
  • Implementing the super-convergence methodology within the existing company codebase/infrastructure.
    • Had to quickly understand the existing infrastructure and how the newly created callback could be included within it.
    • Had to reduce friction within the existing pipeline to ensure the super-convergence methodologies could be used efficiently.


Achievements (according to KPIs)

  • Reproduced the original paper's results.
  • Implemented the LR Range Test and One Cycle callbacks at production level.
  • Applying One Cycle (the super-convergence methodology) to a custom architecture and a small representative dataset achieved validation accuracy within 3% of the baseline in a quarter of the number of epochs.


Further development

  • One Cycle enabled super-convergence on the custom architecture and small dataset, but it still needs to be tested on larger datasets to confirm its utility for the company's pipeline.
  • Automate the selection of learning rates in the LR Range Test; this would require signal processing of the recorded loss curve (see the sketch after this list).
  • Automate the selection of weight decay and batch normalization momentum values for a given architecture to optimize super-convergence (fewest epochs with the highest possible validation accuracy) on a given dataset.
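
As one possible direction for that signal-processing step, the hypothetical helper below smooths the loss curve recorded by the range test and picks the learning rate where the loss falls most steeply as a candidate maximum. This heuristic is an illustration only, not the project's method, and the function and parameter names are invented for the sketch.

```python
import numpy as np

def suggest_lr_max(lrs, losses, window=5):
    """Smooth the range-test loss curve with a moving average, then
    return the learning rate where the loss drops most steeply."""
    smooth = np.convolve(losses, np.ones(window) / window, mode="valid")
    lrs = np.asarray(lrs)[window - 1:]          # align lrs with smoothed losses
    slopes = np.gradient(smooth, np.log(lrs))   # d(loss) / d(log lr)
    return lrs[int(np.argmin(slopes))]          # most negative slope
```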
