Data Science Fellows June 2020 Cohort
The project involved mining time-series data and analyzing it to find small motifs (recurring patterns) and anomalous signatures. To this end, we used common anomaly detection tools: Isolation Forest, Extended Isolation Forest, Eamonn Keogh’s Matrix Profile, and an autoencoder neural network. We also built a graphical UI so that a broad range of company employees can explore our findings.
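The following is a minimal brute-force matrix profile in Python with NumPy, included only to make "motif" and "anomalous signature" concrete. It is an illustrative sketch, not the code used in this project. For every length-m subsequence it records the z-normalized Euclidean distance to its nearest non-overlapping neighbor. The lowest values mark motifs (pairs that repeat) and the highest mark discords (candidate anomalies). The exclusion zone of m/2 and the synthetic series are assumptions chosen for illustration. The brute-force approach also shows why the sectoral run was heavy: the exact matrix profile needs on the order of n² distance computations, roughly 2×10¹² pairs for n ≈ 2M points, even before any per-pair cost.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to zeros."""
    std = x.std()
    if std == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / std

def naive_matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial match."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n_sub)])
    excl = m // 2  # assumed exclusion zone that skips trivial self-matches
    mp = np.full(n_sub, np.inf)
    mp_idx = np.full(n_sub, -1)
    for i in range(n_sub):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        d[lo:hi] = np.inf  # ignore overlapping neighbors of subsequence i
        j = int(np.argmin(d))
        mp[i], mp_idx[i] = d[j], j
    return mp, mp_idx

# Hypothetical synthetic series: Gaussian noise with one pattern planted twice.
rng = np.random.default_rng(0)
ts = rng.normal(0.0, 0.1, 400)
pattern = 3 * np.sin(np.linspace(0, 2 * np.pi, 20))
ts[50:70] += pattern
ts[250:270] += pattern

mp, mp_idx = naive_matrix_profile(ts, m=20)
motif = int(np.argmin(mp))    # start of the best-matching (motif) pair
discord = int(np.argmax(mp))  # subsequence least like any other (discord)
print(motif, int(mp_idx[motif]), discord)
```

For inputs the size of the sector-level data, a library implementation such as STUMPY (`stumpy.stump`, or `stumpy.gpu_stump` on a GPU) replaces this quadratic-memory-friendly but slow loop with an optimized exact algorithm.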
Challenges
- When we ran the Matrix Profile calculation on sectoral data (~2M data points), it was too computationally heavy for our setup, so we had to find a way to run it on a GPU (which still took many hours).
- Data preprocessing took longer than we estimated because the company doesn’t have a repository of basic information that can be extracted from its system, such as lists of points by sector or by water authority.
Achievements (according to KPIs)
- We found motifs at the point and sectoral levels and created the files that locate these motifs for further use by the R&D team.
- We detected anomalies at the point level with each of the four planned models and compared their results.
- We created an interactive map with the relevant data for non-developers to use.
Future project development
The company aims to correlate our findings with pollution or maintenance-related behavior patterns in further research.