Implement a Factorization machine solution in TensorFlow for CTR prediction

Project by Ornella

November 10, 2019, 10:02 am, Fellows 2019

Abstract

Outbrain strives to serve the best possible content to its users, and it uses a variety of techniques to do so. Factorization machines are a leading industry standard for recommender systems. The goal of the project was to implement a factorization machine solution in TensorFlow for click-through rate (CTR) prediction, and to measure its accuracy and performance on Outbrain's large-scale data.
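For reference, the standard degree-2 factorization machine scores a feature vector x as ŷ(x) = w0 + Σ_i w_i·x_i + Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j, where each feature i has a weight w_i and a latent vector v_i of length k. The pairwise term can be rewritten so that it costs O(k · number of non-zero features) instead of being quadratic, which is what makes the model practical on very sparse CTR data. The pure-Python sketch below is not the project's TensorFlow code; it only illustrates the scoring rule on a sparse input stored as `{feature_index: value}`, and the toy sizes and parameter names are assumptions.

```python
import math
import random


def fm_score(x, w0, w, V):
    """Degree-2 factorization machine score for a sparse input.

    x  : dict {feature_index: value} holding only the non-zero features
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of per-feature latent vectors, all of length k
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #   = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score with the explicit quadratic pairwise sum, used as a check."""
    items = sorted(x.items())
    linear = sum(w[i] * xi for i, xi in items)
    pairwise = 0.0
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, xi = items[a]
            j, xj = items[b]
            pairwise += sum(vi * vj for vi, vj in zip(V[i], V[j])) * xi * xj
    return w0 + linear + pairwise


def ctr(x, w0, w, V):
    """Map the raw FM score to a click probability with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 1000, 8                      # toy sizes, far smaller than 4.5M features
    w = [rng.gauss(0, 0.1) for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    x = {3: 1.0, 42: 1.0, 977: 1.0}    # hashed one-hot features of one impression
    print(fm_score(x, 0.0, w, V), fm_score_naive(x, 0.0, w, V), ctr(x, 0.0, w, V))
```

Both functions return the same score; only the fast form scales to millions of hashed features.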

Challenges

Working with huge amounts of data: Outbrain's systems produce around 2M to 3M impressions per hour. The data I worked on contains 14 features; after hashing, these expand into 4.5M features. Working locally with data of this size (2M rows by 4.5M columns) caused Python to run for many hours, or even to crash at times due to memory issues.
Enabling the model to learn incrementally: retraining it every hour on new data and predicting the following hour.
Setting up and running a pipeline on a remote machine to train the model every hour (incremental learning) over a whole week.
Integrating the model into Outbrain’s evaluation system.
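The "after hashing" step in the first challenge is the hashing trick: every raw categorical value is mapped to a column in a fixed-size index space, so each impression with 14 fields becomes a handful of sparse indices into a space of about 4.5M columns. The sketch below assumes one categorical value per field and a field-prefixed MD5 hash; the project's actual hash function is not stated, and the field names in the example are hypothetical.

```python
import hashlib

NUM_BUCKETS = 4_500_000  # size of the hashed feature space quoted above


def hash_feature(field, value, num_buckets=NUM_BUCKETS):
    """Map a (field, value) pair to a column index in [0, num_buckets)."""
    # Prefixing with the field name keeps equal values in different fields apart.
    key = f"{field}={value}".encode("utf-8")
    # hashlib is stable across processes, unlike Python's salted built-in hash().
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def hash_impression(record, num_buckets=NUM_BUCKETS):
    """Turn one impression (dict of field -> raw value) into sparse {index: count}."""
    features = {}
    for field, value in record.items():
        idx = hash_feature(field, value, num_buckets)
        # Two fields may collide on the same bucket; summing keeps both signals.
        features[idx] = features.get(idx, 0.0) + 1.0
    return features


if __name__ == "__main__":
    # Hypothetical field names; the real data has 14 fields.
    impression = {"publisher_id": "p123", "ad_id": "a987", "country": "IL", "device": "mobile"}
    print(hash_impression(impression))
```

Because the index space is fixed in advance, new categorical values that appear in later hours need no vocabulary update, which suits hourly incremental training.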

Achievements (according to KPIs)

Delivered a working pipeline that can:

Read the data in LIBSVM format and convert it into sparse tensors
Train the model on each hour of data and save it
Resume training from the last saved model (incremental learning)
Predict on the hour following the training data
Evaluate the predictions with metrics such as RMSE, MRR, and AUC
Write the predictions and evaluation results to files after every training run
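The first pipeline step, converting LIBSVM text into a sparse tensor, essentially means collecting coordinate (COO) triples. The pure-Python sketch below assumes the usual `label index:value index:value ...` line layout and 0-based feature indices. It returns `indices` as `[row, column]` pairs in row-major order together with `values` and `dense_shape`, which are the three arguments `tf.sparse.SparseTensor` takes; the TensorFlow call itself is left out so the sketch runs without TensorFlow.

```python
def parse_libsvm(lines, num_features):
    """Parse LIBSVM lines ('label idx:val idx:val ...') into COO components.

    Returns (labels, indices, values, dense_shape), with indices as [row, col]
    pairs sorted row-major, the ordering tf.sparse.SparseTensor expects.
    """
    labels, indices, values = [], [], []
    row = 0
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop optional trailing comments
        if not line:
            continue                            # skip blank lines
        label, *pairs = line.split()
        labels.append(float(label))
        entries = []
        for pair in pairs:
            col, val = pair.split(":", 1)
            col = int(col)
            if not 0 <= col < num_features:
                raise ValueError(f"feature index {col} out of range on row {row}")
            entries.append((col, float(val)))
        for col, val in sorted(entries):        # columns sorted within each row
            indices.append([row, col])
            values.append(val)
        row += 1
    return labels, indices, values, [row, num_features]


if __name__ == "__main__":
    sample = ["1 3:1 17:1 42:0.5", "0 5:1 17:1"]
    print(parse_libsvm(sample, num_features=4_500_000))
```

Only the non-zero entries are ever stored, so memory grows with the number of active features per impression rather than with the 4.5M-column width.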

Given the volume of data (one hour contains on average 2.9M impressions), I overcame the memory issues by using sparse matrices, and I reduced training and prediction time by finding the best batch size.
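A back-of-the-envelope comparison shows why sparse storage was necessary for one average hour (2.9M impressions by 4.5M hashed features). The sketch assumes float32 values, int64 indices, and exactly 14 non-zero entries per row (one active hashed feature per original field, ignoring hash collisions); these are illustrative assumptions rather than figures measured in the project. The `minibatches` helper shows the kind of row slicing that a batch size controls.

```python
def dense_bytes(rows, cols, bytes_per_value=4):
    """Memory for a dense float32 matrix."""
    return rows * cols * bytes_per_value


def coo_bytes(rows, nnz_per_row, index_bytes=8, value_bytes=4):
    """Memory for COO storage: an int64 (row, col) pair plus a float32 value per non-zero."""
    return rows * nnz_per_row * (2 * index_bytes + value_bytes)


def minibatches(num_rows, batch_size):
    """Yield (start, stop) row ranges so only one batch is materialized at a time."""
    for start in range(0, num_rows, batch_size):
        yield start, min(start + batch_size, num_rows)


if __name__ == "__main__":
    rows, cols = 2_900_000, 4_500_000   # one average hour, hashed feature space
    # Assumes 14 non-zeros per row (one per original field).
    print(f"dense: {dense_bytes(rows, cols) / 1e12:.1f} TB")   # dense: 52.2 TB
    print(f"COO:   {coo_bytes(rows, 14) / 1e9:.2f} GB")        # COO:   0.81 GB
```

The gap of roughly five orders of magnitude is what makes the dense layout impossible on a single machine. Within the sparse layout, the batch size trades per-step overhead (small batches) against the memory and compute of each step (large batches), which is why it had to be tuned.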

Trained the model hourly on 8 days of data and predicted on the following 3 days:

Training time: 54 seconds per hour of data
Predicting time: 11 seconds per hour of data
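The hourly cycle (resume from the last saved model, train on hour t, save, then predict and evaluate on hour t+1) can be sketched as a walk-forward loop. This sketch is framework-agnostic: `train_fn`, `predict_fn` and `eval_fn` are hypothetical callables, and a pickle file stands in for the checkpoint. In a TensorFlow implementation, checkpointing would more typically go through `tf.train.Checkpoint`, which is not shown here.

```python
import os
import pickle
import tempfile


def walk_forward(hours, init_state, train_fn, predict_fn, eval_fn, ckpt_path):
    """Train on each hour, then predict and evaluate on the following hour.

    hours      : chronologically ordered list of per-hour datasets
    init_state : model state used when no checkpoint exists yet
    train_fn   : (state, data) -> updated state        (hypothetical interface)
    predict_fn : (state, data) -> predictions
    eval_fn    : (predictions, data) -> metrics
    """
    start, state = 0, init_state
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            start, state = pickle.load(f)       # resume where the last run stopped
    results = []
    for t in range(start, len(hours) - 1):
        state = train_fn(state, hours[t])       # incremental update on hour t
        with open(ckpt_path, "wb") as f:
            pickle.dump((t + 1, state), f)      # next hour to train on + model
        preds = predict_fn(state, hours[t + 1]) # score the next, unseen hour
        results.append((t + 1, eval_fn(preds, hours[t + 1])))
    return state, results


if __name__ == "__main__":
    # Toy stand-in model: the running mean click rate, predicted for every impression.
    hours = [[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]   # click labels per hour
    train = lambda s, d: (s[0] + sum(d), s[1] + len(d))
    predict = lambda s, d: [s[0] / s[1]] * len(d)
    mean_pred = lambda p, d: sum(p) / len(p)
    with tempfile.TemporaryDirectory() as tmp:
        _, res = walk_forward(hours, (0, 0), train, predict, mean_pred, os.path.join(tmp, "fm.ckpt"))
    print(res)
```

Storing the next hour index alongside the model means a restarted job continues the schedule instead of retraining hours it has already seen.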

Head-to-head comparison with the model in production:

RMSE: improved by 13%
MRR: improved by 2%
AUC: improved by 12%
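The three metrics in this comparison can be computed as follows. RMSE and AUC follow their standard definitions, with AUC computed from average ranks (the Mann-Whitney statistic), so tied scores count as half a win. For MRR, the sketch assumes that impressions are grouped by a request identifier and that the reciprocal rank of the first clicked item, ordered by predicted score, is averaged over requests with at least one click. Outbrain's exact MRR definition is not given in this post, so treat that part as an assumption.

```python
import math
from collections import defaultdict


def rmse(preds, labels):
    """Root mean squared error between predicted CTRs and 0/1 click labels."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels))


def auc(preds, labels):
    """ROC AUC via average ranks; ties between scores count as half a win."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    ranks = [0.0] * len(preds)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and preds[order[j + 1]] == preds[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                 # average 1-based rank of the tied group
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mrr(request_ids, preds, labels):
    """Mean reciprocal rank of the first click per request (assumed grouping)."""
    groups = defaultdict(list)
    for r, p, y in zip(request_ids, preds, labels):
        groups[r].append((p, y))
    reciprocal_ranks = []
    for items in groups.values():
        ranked = sorted(items, key=lambda t: t[0], reverse=True)
        for rank, (_, y) in enumerate(ranked, start=1):
            if y == 1:
                reciprocal_ranks.append(1.0 / rank)
                break                          # requests without clicks are skipped
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0
```

RMSE measures calibration of the predicted CTR values, while AUC and MRR only depend on how the scores order impressions, which is why the three can improve by different amounts.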

Further development

Train the model on more than one week of data (possibly a month).
Improve the model's performance in terms of training time and MRR/AUC.
Fine-tune the model's hyperparameters (number of epochs, batch size, learning rate, etc.).
Integrate the model into the production code.

Supervisor Feedback

Ornella worked as part of our CTR prediction team. She took an active part in exploring TensorFlow for that purpose by integrating TF into our existing evaluation mechanism. Ornella worked diligently and demonstrated strong professional and interpersonal skills. The results of her short internship were more than satisfactory: the TF package worked with Outbrain's huge dataset and plugged successfully into the evaluation mechanisms. As a result, we offered to extend Ornella's internship at Outbrain by 5 more months.

– Assaf Klein – Perso/NLP lead
