Project by Ariela Strimling
Data Science Fellows June 2020 Cohort
The objective of the project was to automatically detect outliers across all of Weissbeerger's data. The investigation was carried out at the item level and on sales grouped by day. An autoregressive (AR) model and a Hampel filter were used for the seasonal data, that is, the data grouped by day. To detect outliers at the item level, it was crucial to understand that the data does not follow a normal, gamma, or other standard distribution, so detection relied on internal knowledge of the company and a general understanding of how a bar operates. It was decided to target only true outliers and to avoid false positives as much as possible. Value counts were the most helpful tool for this procedure, because prices in a restaurant are usually fixed, so a price that differs sharply from the rest stands out.
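As an illustration of the Hampel filter mentioned above, here is a minimal sketch in plain Python. It is not the project's actual implementation: the function name, the window size, the threshold, and the sample daily sales figures are all assumptions chosen for the example. A point is flagged when it deviates from the median of its neighbourhood by more than a chosen number of scaled median absolute deviations (MAD).

```python
from statistics import median

def hampel_outliers(values, window=3, n_sigmas=3.0):
    """Return the indices of points flagged by a Hampel filter.

    window and n_sigmas are illustrative defaults, not project settings.
    """
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    flagged = []
    for i in range(len(values)):
        # Neighbourhood of up to `window` points on each side of i.
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        neighbourhood = values[lo:hi]
        med = median(neighbourhood)
        mad = median(abs(v - med) for v in neighbourhood)
        if abs(values[i] - med) > n_sigmas * k * mad:
            flagged.append(i)
    return flagged

# Hypothetical daily sales totals with one obvious spike.
daily_sales = [100, 102, 98, 101, 950, 99, 103, 100, 97, 102]
print(hampel_outliers(daily_sales))
```

Because the filter is based on medians, a single extreme day does not distort the threshold for its neighbours, which fits the goal of avoiding false positives. Note that when every value in a neighbourhood is identical the MAD is zero, so any deviation at all would be flagged; a real implementation might add a minimum threshold.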
Challenges (at least two)
- Creating one solution that fits all items. Item labels ranged from as broad as “general” or “kitchen general” to as specific as “Corona 300ml”, and it was difficult to build an algorithm that could detect outliers for both kinds. The same problem arose with seasonality: some bars showed it and others did not.
- Communication through Zoom was difficult. It was hard to understand the company’s needs in so little time and to coordinate schedules to get as much done as possible.
Achievements (according to KPIs)
- A second algorithm for seasonal data (AR model) was implemented. This algorithm avoided false positives on seasonal data and captured the trend in bars.
- An item-level algorithm was implemented. It allows for discounted items (which was crucial for the company) and finds items that look like mistakes, such as items charged under another item's name or single items that were overpriced.
- A third algorithm addresses item quantities in orders by penalizing unusual quantities. It catches quantities that were entered as if they were prices, as well as very large orders that are unusual for a bar.
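The item-level and quantity checks described above could look roughly like the following sketch. It is only an illustration under stated assumptions, not the algorithms delivered to the company: the function names, the 5% rarity threshold, the rule that prices below the most common price count as discounts, and the maximum quantity of 50 are all hypothetical.

```python
from collections import Counter

def flag_price_outliers(prices, min_share=0.05):
    """Flag prices that are both rare and above the most common price.

    Assumption: prices below the most common price are discounts
    and are never flagged.
    """
    counts = Counter(prices)  # value counts of the prices charged for one item
    usual_price, _ = counts.most_common(1)[0]
    total = len(prices)
    return sorted(
        price
        for price, n in counts.items()
        if price > usual_price and n / total < min_share
    )

def flag_suspicious_qty(qty, unit_price, max_qty=50):
    """Return the reasons a quantity looks wrong (empty list if none)."""
    reasons = []
    # A quantity equal to the unit price suggests the price was typed
    # into the quantity field.
    if qty > 1 and qty == unit_price:
        reasons.append("quantity equals unit price")
    # max_qty is an assumed ceiling for a normal bar order.
    if qty > max_qty:
        reasons.append("unusually large order")
    return reasons

# Hypothetical prices charged for one item: mostly 25, some discounts, one 250.
prices = [25] * 40 + [20] * 6 + [12.5] * 3 + [250]
print(flag_price_outliers(prices))
print(flag_suspicious_qty(25, 25.0))
```

Working from value counts rather than a fitted distribution matches the observation that item prices are usually fixed, and treating lower prices as discounts keeps the check from raising false positives on promotions.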
Future project development
Outlier detection should be run on all of the data and validated on all of it. The findings will help build better models of bar behavior and make it possible to alert clients to what these algorithms discover. The algorithms can be used separately, but they need to be combined to gain further insights.