The goal of the project was to create a tool that helps the sales engineering team increase sales by explaining the product’s value more clearly. The sales engineering team presents potential customers with simulations showing the anomalies that the system would have detected had they been using it. To provide better insights and higher value, and to select the important features, we developed a tool that learns an explanation of the anomalous state using tree learning, specifically C-tree, a conditional inference decision tree algorithm.
- Use tree learning algorithms from R within a Python environment.
- Post-process the tree with recursive algorithms.
- Create a working REST API.
- Adapt to the constraints of real-time performance.
Achievements (according to KPIs)
1. Using tree learning algorithms from R within a Python environment
The entire pipeline was designed to run in Python. After a thorough examination of different decision tree models, we decided that the most suitable algorithm was C-tree in R, which performs recursive partitioning for continuous, ordered, nominal and multivariate response variables in a conditional inference framework.
Since no equivalent implementation was available in Python, we used rpy2, a library that provides an interface between Python and R.
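As an illustration of how a C-tree can be fitted from Python through rpy2, here is a minimal sketch. It is not the project's actual code: the helper names `build_formula` and `fit_ctree`, the column names, and the choice of the R package `partykit` (the source does not say which R package provided `ctree`) are all assumptions. It requires a local R installation with that package.

```python
def build_formula(target, features):
    """Return an R model formula string, e.g. 'is_anomaly ~ latency + region'.

    An empty feature list falls back to 'target ~ .', i.e. all other columns.
    """
    if not features:
        return f"{target} ~ ."
    return f"{target} ~ " + " + ".join(features)


def fit_ctree(columns, target, features=None, r_package="partykit"):
    """Fit a conditional inference tree in R and return the R model object.

    columns   : dict mapping column name -> list of values (hypothetical layout)
    target    : name of the response column
    r_package : R package exposing ctree(); 'partykit' is an assumption
    """
    # Imported lazily so this module still loads on machines without R.
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    pkg = importr(r_package)

    # Numeric columns become R numeric vectors; everything else becomes a factor.
    r_cols = {}
    for name, values in columns.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            r_cols[name] = ro.FloatVector(values)
        else:
            r_cols[name] = ro.FactorVector(ro.StrVector([str(v) for v in values]))

    r_df = ro.DataFrame(r_cols)
    formula = ro.Formula(build_formula(target, features or []))
    return pkg.ctree(formula, data=r_df)
```

A call such as `fit_ctree(data, target="is_anomaly")` would return the fitted R object, where `data` is a hypothetical dict of columns. Walking that object to extract nodes is a separate step that this sketch does not cover.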
2. Post-processing the tree with recursive algorithms
The next challenge was to post-process the tree to make it more informative by adding attributes to each node, such as its anomaly rate and the combination of features and their possible values along the entire route from the root to that node. The post-processing consists of several recursive binary tree algorithms.
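The kind of recursive post-processing described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Node` structure, its field names, and the numeric threshold splits are hypothetical and do not reflect the project's real data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary tree node; inner nodes split on `feature <= threshold`."""
    n_samples: int
    n_anomalies: int
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Attributes filled in by post-processing:
    anomaly_rate: float = 0.0
    path: List[str] = field(default_factory=list)


def annotate(node: Node, path: Tuple[str, ...] = ()) -> Node:
    """Recursively attach the anomaly rate and the route of conditions to each node."""
    node.anomaly_rate = node.n_anomalies / node.n_samples if node.n_samples else 0.0
    node.path = list(path)
    if node.feature is not None:
        if node.left is not None:
            annotate(node.left, path + (f"{node.feature} <= {node.threshold}",))
        if node.right is not None:
            annotate(node.right, path + (f"{node.feature} > {node.threshold}",))
    return node


# Small example tree: one split on a made-up "latency" feature.
root = Node(
    n_samples=100, n_anomalies=10, feature="latency", threshold=200.0,
    left=Node(n_samples=80, n_anomalies=2),
    right=Node(n_samples=20, n_anomalies=8),
)
annotate(root)
```

After `annotate(root)`, each node carries its own anomaly rate and the list of split conditions leading to it, which is the information a reader needs to interpret a leaf without tracing the tree by hand.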
3. Creating a working REST API
At this step we added filters that let the user search for and retrieve a tree matching the given parameters. To build the REST API we used Flask, a micro web framework written in Python.
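A filtered tree endpoint in Flask might look like the sketch below. The route `/tree`, the filter names in `ALLOWED_FILTERS`, and the placeholder `build_tree` function are assumptions made for illustration. The real endpoint and its parameters are not described in this report.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical filter names; the real API's parameters are not documented here.
ALLOWED_FILTERS = {"metric", "min_anomaly_rate"}


def build_tree(filters):
    """Placeholder for the real pipeline (query, model, post-processing)."""
    return {"filters": filters, "root": {"anomaly_rate": 0.1, "children": []}}


@app.route("/tree", methods=["GET"])
def get_tree():
    # Reject unknown query parameters so typos do not silently return an unfiltered tree.
    unknown = set(request.args) - ALLOWED_FILTERS
    if unknown:
        return jsonify(error=f"unknown filters: {sorted(unknown)}"), 400
    return jsonify(build_tree(request.args.to_dict()))
```

With the app running, a request such as `GET /tree?metric=revenue` would return the tree as JSON, while an unrecognized filter yields a 400 response.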
4. Adapting to the constraints of real-time performance
At this point, the challenge was to reduce execution time, since the tree should be displayed in real time in response to the various filters. Because the script queries the database, runs the model and performs several other steps, improving execution time was challenging. We eventually reduced it from 20 s to 2.5 s.
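The report does not say which optimizations produced this speed-up. One generic way to see where the time goes is to measure each stage separately, as in the sketch below. The stage names and the stand-in workloads are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, timings):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def handle_request(timings):
    """Run stand-in stages, timing each one separately."""
    with timed("query", timings):
        rows = list(range(1000))          # stand-in for the database query
    with timed("model", timings):
        total = sum(rows)                 # stand-in for fitting the tree
    with timed("post_process", timings):
        result = {"total": total}         # stand-in for tree post-processing
    return result


timings = {}
result = handle_request(timings)
```

Comparing the entries in `timings` across requests shows which stage dominates and is therefore the one worth optimizing first.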
- There is still room for improvement in execution time.
- The next step is to connect the working API to a UI to visualize the tree.
The Anodot team is very satisfied with Dana’s work. We reached our main objective: a working product for our internal team. This is thanks to Dana’s continuous efforts. Additionally, Dana showed great curiosity about different fields of data science and a wide range of technologies. She has also integrated well with the team. I am convinced she will be a very good data scientist in the future.