HP Indigo’s main products are press machines of various types. This project aims to provide a “submodule” that, given a press ID, can be queried for the predicted print volume at the time of the query. The goal is to achieve better results than the Naïve model currently in use.
- Lack of resources – for the majority of the project I did not have good access to the data because of prior organizational issues; this was the main reason for the slow progress.
- No one to ask – unfortunately, I was the only person in the unit with any data science knowledge, and I had no access to domain experts regarding the machines themselves, so I mostly operated on my own intuition. Towards the end of the project I found out that another unit in HP deals with data science, but since they were extremely busy I could only consult with them briefly. That consultation showed me that I should have focused my efforts on certain machines only, and that I should have used different data from what I was given.
- Creating a general solution – the problem was difficult because it was not a “one solution model”: I had to create a model that works for thousands of different machines and produces individual results for each, so the classical approach of training, evaluating and tuning a model proved difficult to apply at that scale.
- Data sparsity – the data that I did manage to get was quite sparse, e.g. for a given machine most of the entries were zeros. This rendered most of the classical models useless.
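The sparsity described above, and the error of a naive baseline on such data, can be quantified with a few lines of Python. This is only a minimal sketch: the report does not define the Naïve model, so the "repeat the last observed value" rule used here is an assumption, and the daily volumes and function names are hypothetical and purely illustrative.

```python
from statistics import mean


def zero_fraction(series):
    """Return the share of entries that are exactly zero."""
    return sum(1 for value in series if value == 0) / len(series)


def naive_forecast(history, horizon):
    """Assumed naive rule: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical daily print volumes for one press (illustrative values only).
volumes = [0, 0, 120, 0, 0, 0, 95, 0, 0, 0, 0, 110]

# Hold out the last three days to score the baseline.
train, test = volumes[:9], volumes[9:]

sparsity = zero_fraction(volumes)
baseline = naive_forecast(train, len(test))
error = mean_absolute_error(test, baseline)

print(f"zero fraction: {sparsity:.2f}, naive MAE: {error:.2f}")
```

On a mostly-zero series the last observed value is usually zero, so a baseline of this kind tends to predict zero everywhere and misses the occasional print run entirely.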
Achievements (according to KPIs)
- Discovery of the Amazon SageMaker platform and its newly released DeepAR algorithm, which seems to be exactly what was needed. It provides a sort of “black box” into which all the data can be fed, and it creates a general model that seems to fare better than the Naïve model. It can also create an endpoint that can be queried for the results.
At the moment the interaction with the SageMaker platform is cumbersome because of data transfer issues between HP and Amazon’s storage system (S3), which is required for proper use of the model.
The platform should be explored further to understand it better, but it seems that, once fully understood, it can be fully integrated into the production environment.
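To make the data transfer step more concrete, the sketch below prepares per-press series in the JSON Lines layout that DeepAR reads from S3, and builds the JSON body for an endpoint query. The field names (`start`, `target`, `cat`, `instances`, `configuration`) follow the DeepAR input and inference formats as I understand them from the AWS documentation and should be checked against the current version. The press indices, dates, volumes and function names are hypothetical, and no AWS call is made here.

```python
import json


def to_deepar_record(start, volumes, press_index):
    """Build one DeepAR record: start timestamp, target series, and a
    categorical feature identifying the press (hypothetical integer index)."""
    return {"start": start, "target": list(volumes), "cat": [press_index]}


def to_json_lines(records):
    """Serialize records as JSON Lines, one time series per line."""
    return "\n".join(json.dumps(record) for record in records)


def build_inference_request(record, quantiles=("0.1", "0.5", "0.9"), num_samples=50):
    """Build the JSON body for querying a deployed DeepAR endpoint."""
    return {
        "instances": [record],
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    }


# Hypothetical daily volumes for two presses (illustrative values only).
records = [
    to_deepar_record("2018-01-01 00:00:00", [0, 0, 120, 0, 95], press_index=0),
    to_deepar_record("2018-01-01 00:00:00", [30, 0, 0, 0, 42], press_index=1),
]

training_file = to_json_lines(records)  # content to upload to S3 for training
request_body = json.dumps(build_inference_request(records[0]))

print(training_file)
print(request_body)
```

Keeping every press in one file with its own `cat` value is what lets a single model be trained for all machines while still producing individual forecasts per press.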