Develop one of the cornerstones of our Meal Flow metric: transform every check into a meal and measure the flow of the consumer through that meal (e.g. start with beer, move to burger, move to soda, finish with ice cream).
The main indicators are the check open and close times, which appear in only 20% of the checks (not all POS solutions provide that information).
The goal is to develop a machine-learning-based solution that learns from the 20% of checks with open and close times and predicts the dwell time of the other 80%.
- Provide a detailed feature exploration report.
- Provide a methodology to pre-process the data and clean it of outliers/anomalies.
- Develop the pre-processing stage in Spark.
- Create features through a feature engineering process. If needed, demonstrate feature reduction / selection.
- Select and train models using scikit-learn or other common libraries.
- Provide a model evaluation report.
- Move to production using AWS SageMaker infrastructure.
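As a minimal sketch of how the labeled 20% could be separated from the remaining 80%, the snippet below derives a dwell time from the check open and close times and splits the checks accordingly. It assumes dwell time is the difference between close and open time; the record layout and the field names (`check_id`, `opened_at`, `closed_at`) are hypothetical and not taken from the project data.

```python
from datetime import datetime

# Hypothetical check records; field names and values are illustrative only.
checks = [
    {"check_id": 1, "opened_at": "2020-01-10 12:05:00", "closed_at": "2020-01-10 12:50:00"},
    {"check_id": 2, "opened_at": None, "closed_at": None},
    {"check_id": 3, "opened_at": "2020-01-10 19:30:00", "closed_at": "2020-01-10 21:00:00"},
]

FMT = "%Y-%m-%d %H:%M:%S"


def dwell_minutes(check):
    """Return the dwell time in minutes, or None if a timestamp is missing."""
    if not check["opened_at"] or not check["closed_at"]:
        return None
    opened = datetime.strptime(check["opened_at"], FMT)
    closed = datetime.strptime(check["closed_at"], FMT)
    return (closed - opened).total_seconds() / 60.0


# Checks with both timestamps become training data; the rest are to be predicted.
labeled, unlabeled = [], []
for check in checks:
    minutes = dwell_minutes(check)
    if minutes is None:
        unlabeled.append(check)
    else:
        labeled.append({**check, "dwell_minutes": minutes})
```

In this toy input, checks 1 and 3 end up in `labeled` and check 2 in `unlabeled`; a real pipeline would perform the same split in Spark over the full check table.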
Achievements (according to KPIs)
- Delivered a univariate and multivariate analysis of the database.
- Developed 4 methodologies to detect anomalies (extreme and abnormal values): business rules, iterative Gaussian outlier detection, Local Outlier Factor (LOF), and an SVM as a supervised model.
- Created features such as serving size, using unsupervised DBSCAN clustering to tell big servings from small servings.
- Applied different regressors, including GLM, SVR, polynomial regression, XGBoostRegressor, KNN regressor, and Gradient Boosting Regressor, among others. All of them were evaluated by R² and RMSE.
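One of the anomaly detection approaches listed above, iterative Gaussian outlier detection, can be sketched as follows. The exact procedure used in the project is not described here, so this is one common reading of the name: fit a mean and standard deviation, drop the points farther than `k` standard deviations from the mean, and repeat until nothing more is removed. The threshold `k=3.0`, the function name, and the sample dwell times are assumptions for illustration.

```python
import statistics


def iterative_gaussian_outliers(values, k=3.0, max_iter=20):
    """Repeatedly trim points outside mean +/- k*std; return (kept, removed)."""
    kept = list(values)
    removed = []
    for _ in range(max_iter):
        if len(kept) < 3:
            break
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        if std == 0:
            break
        outside = [v for v in kept if abs(v - mean) > k * std]
        if not outside:
            break  # converged: every remaining point is within k sigma
        removed.extend(outside)
        kept = [v for v in kept if abs(v - mean) <= k * std]
    return kept, removed


# Hypothetical dwell times in minutes, with two suspicious values.
dwell = [30, 35, 40, 45, 50, 55, 60, 42, 38, 47, 52, 600, 150]
kept, removed = iterative_gaussian_outliers(dwell)
```

The iteration matters because a single extreme value inflates the standard deviation and can hide milder anomalies: in this sample the 600-minute check is removed in the first pass, and only then does the 150-minute check fall outside the three-sigma band.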
As a target variable, the dwell time follows a Gamma distribution, and the data is highly sparse, meaning that many identical orders can have different dwell times. The project added value to the company mainly through the anomaly detection and feature exploration work, so the mentors decided to go deeper into those fields rather than move to production. In addition, the company offered the opportunity to give a training session on these topics.
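The effect of identical orders with different dwell times can be made concrete with a small calculation. If a model sees only the order contents, its best possible prediction for a given order is the mean dwell time of all checks with that same order, so the variance within each group of identical orders cannot be explained. The sketch below computes the resulting in-sample ceiling on R²; the function name and the toy orders are hypothetical and the numbers are not project results.

```python
from collections import defaultdict


def r2_ceiling(orders, dwell):
    """Best in-sample R^2 for a model that only sees the order contents."""
    groups = defaultdict(list)
    for key, y in zip(orders, dwell):
        groups[key].append(y)

    overall_mean = sum(dwell) / len(dwell)
    ss_tot = sum((y - overall_mean) ** 2 for y in dwell)

    # Residuals left when each order is predicted by its own group mean.
    ss_res = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        ss_res += sum((y - group_mean) ** 2 for y in ys)

    return 1.0 - ss_res / ss_tot


# Toy data: two distinct orders, each seen twice with different dwell times.
orders = [("beer", "burger"), ("beer", "burger"), ("soda",), ("soda",)]
dwell = [30, 90, 20, 40]
ceiling = r2_ceiling(orders, dwell)
```

When this ceiling is low, poor R² scores reflect noise in the target rather than a weak regressor, which suggests looking for additional features (such as item timing) instead of more model tuning.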
As for further work, there are 3 things to continue working on:
- Look into the sequence of order items and treat it as a time series in order to predict the dwell time.
- Try other approaches that deal with sparsity, such as Factorization Machine regression.
- After that, move to production with Spark and SageMaker.