Abstract
The project’s goal was to deliver an end-to-end (E2E) CTR prediction model for personalized recommendations. The project fused different consumption features, taking into account their corresponding confidences and priors, to devise a click-prediction model that can be used in Outbrain’s low-latency, high-throughput serving layer. The model’s quality was measured on a predefined test set using standard supervised machine learning evaluation metrics, and its performance was then validated via A/B testing.
Challenges
- Understanding how Outbrain works – learning the architecture, the software used, and the features and labels.
- Data collection – understanding which features to use from the original dataset and collecting them accordingly; for example, collecting only Exploitation-stage data.
- Data exploration – examining how the observations behave. Exploration revealed dirty data, such as missing features and, at times, CTR > 1.
- Categorical metadata – most of the original observations were categorical, and transforming them via get_dummies created thousands of features, which made the data difficult to work with.
- Overfitting – the datasets behaved differently from one another, which caused certain aspects of the model to overfit.
- Working with data outside the dataset – joining additional data from different tables.
- Data quantity – the amount of data is huge, and working with it locally caused Jupyter to run for many hours or even crash at times.
- Scaling the model to a bigger dataset – because the model ran locally, it had to be trained on smaller datasets than needed (hours instead of days).
- Deployment difficulties – the transition from working locally to A/B testing required compromising on model type, available features, etc.
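The data-cleaning and categorical-encoding challenges above can be sketched in a few lines of pandas. This is a minimal illustration, not the project’s actual pipeline; the column names (`platform`, `clicks`, `impressions`) and toy values are hypothetical:

```python
import pandas as pd

# Hypothetical toy slice of a consumption log.
df = pd.DataFrame({
    "platform": ["mobile", "desktop", "mobile", "tablet"],
    "clicks": [3.0, 1.0, 7.0, None],      # one missing feature
    "impressions": [100, 50, 5, 80],
})

# Derive CTR; a missing clicks value propagates to NaN.
df["ctr"] = df["clicks"] / df["impressions"]

# Drop dirty rows: missing features or impossible CTR > 1.
clean = df[df["ctr"].notna() & (df["ctr"] <= 1)]

# One-hot encode categorical metadata. sparse=True keeps memory
# manageable when encoding produces thousands of columns.
encoded = pd.get_dummies(clean, columns=["platform"], sparse=True)
```

Here the row with 7 clicks on 5 impressions (CTR > 1) and the row with a missing clicks value are both dropped before encoding.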
Achievements (according to KPIs)
- CTR prediction model
- Reasonable metrics:
  – average R2 score: 0.5
  – average RMSE: 0.009
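Metrics like the above are presumably computed along the following lines with scikit-learn; the predicted and observed CTR values below are hypothetical and only illustrate the calculation:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical observed vs. predicted CTRs on a held-out test set.
y_true = np.array([0.010, 0.020, 0.015, 0.030, 0.025])
y_pred = np.array([0.012, 0.018, 0.016, 0.027, 0.024])

# R2 measures explained variance; RMSE is in CTR units.
r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```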
Further development
A/B testing – the A/B test started on the afternoon of 19/03/19 with a simple linear model (Ridge Regression) and a minimal set of features (signals only). The test showed stable behavior, a roughly 1% drop compared to the control groups, over the weekend. The next steps will be to add more features and a more complex model.
If a lift is observed, the next development step will be to automate the code so that the weights and the means are fed to the model automatically.
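The setup described above, a Ridge model over a small set of signals whose learned means and weights can later be exported to serving, can be sketched as follows. The signal matrix, target, and pipeline shape are assumptions for illustration, not the project’s actual code:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 200 impressions, 3 numeric signals.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
# Hypothetical CTR target loosely correlated with the signals.
y = 0.01 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(0, 0.001, 200)

# Standardization + Ridge regression, as in the A/B test.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)

# The automation step would export these learned means/scales and
# weights so the serving layer can score with a cheap dot product.
scaler = model.named_steps["standardscaler"]
ridge = model.named_steps["ridge"]
weights, intercept = ridge.coef_, ridge.intercept_
preds = model.predict(X[:5])
```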