The goal of the project was to reduce the number of incidents labeled "Unknown" in the ground truth by training a model on features extracted from forensic reports of incidents with known verdicts. The trained model is then applied to the unknown incidents to produce a more confident verdict: malicious or benign. This was an end-to-end data science project, from engineering features out of the forensic reports (which describe each incident's behavior) through testing the model on real-time data with a prediction object that re-trains itself automatically in production.
- Understanding the forensic reports – becoming acquainted with terminology from the cyber-security world and identifying which features could help the model.
- Data collection – each incident has several JSON files that store the important data in a tree structure. A scraping class was developed to traverse every child node of each incident and save the relevant information.
- Feature extraction – the data required substantial manipulation, including transforming categorical features into numerical ones, dropping sparse categories, and normalizing.
- Contradicted incidents – some incidents in the data set had identical feature values but different verdicts. These had to be investigated to decide, case by case, whether to keep, relabel, or remove them.
- Tuning parameters – precision is extremely important in this use case. Choosing the best model type, the best hyperparameters, and the optimal decision thresholds (one for benign and one for malicious) was a major part of the project.
- Adjusting the model for production – creating a class that scrapes the information from a given file and returns a prediction, and that re-trains automatically, including parameter and threshold tuning.
- Presenting the model – creating a Bokeh dashboard that shows the model's results in business-intelligence terms: key metrics, feature importance, and how many real unknown-verdict incidents the model was able to classify. The dashboard also shows what happens when the thresholds are adjusted.
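The scraping step described above — walking each incident's JSON tree and collecting the relevant fields — can be sketched roughly as follows. The field names and report layout here are illustrative, not the project's actual schema:

```python
def scrape(node, wanted, found=None):
    """Recursively walk a JSON-like tree and collect values for the wanted keys."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted:
                found.setdefault(key, []).append(value)
            scrape(value, wanted, found)  # descend into each child
    elif isinstance(node, list):
        for child in node:
            scrape(child, wanted, found)
    return found

# Illustrative report fragment -- not the real schema
report = {"process": {"name": "a.exe",
                      "children": [{"name": "b.exe",
                                    "network": {"domain": "x.com"}}]}}
print(scrape(report, {"name", "domain"}))
# {'name': ['a.exe', 'b.exe'], 'domain': ['x.com']}
```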
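The feature-extraction manipulations (dropping sparse categories, one-hot encoding, min-max normalization) might look like this in pandas; the column names and the sparsity cutoff of 2 are made up for illustration:

```python
import pandas as pd

# Illustrative incident features -- column names are invented
df = pd.DataFrame({
    "file_type": ["exe", "exe", "dll", "script", "exe", "lnk"],
    "size_kb":   [10, 200, 50, 5, 120, 30],
})

# Drop sparse categories: anything seen fewer than 2 times becomes "other"
counts = df["file_type"].map(df["file_type"].value_counts())
df["file_type"] = df["file_type"].where(counts >= 2, "other")

# Categorical -> numerical via one-hot encoding
df = pd.get_dummies(df, columns=["file_type"])

# Min-max normalization of the numeric column to [0, 1]
df["size_kb"] = (df["size_kb"] - df["size_kb"].min()) / (
    df["size_kb"].max() - df["size_kb"].min())
```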
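One way to surface the contradicted incidents — identical feature vectors that received different verdicts — is a pandas group-by; the tiny frame below is a toy example:

```python
import pandas as pd

# Toy labelled incidents: rows 0 and 1 share features but disagree on the verdict
df = pd.DataFrame({
    "f1":      [1, 1, 0, 0, 1],
    "f2":      [0, 0, 1, 1, 1],
    "verdict": ["malicious", "benign", "benign", "benign", "malicious"],
})

# For each feature vector, count how many distinct verdicts it received
n_verdicts = df.groupby(["f1", "f2"])["verdict"].transform("nunique")
contradicted = df[n_verdicts > 1]
print(contradicted.index.tolist())  # rows that need manual review: [0, 1]
```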
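The two-sided threshold scheme — a benign threshold and a malicious threshold, with everything in between left as unknown — could look like this. The values 0.10 and 0.95 are placeholders, not the tuned thresholds:

```python
def verdict(p_malicious, t_benign=0.10, t_malicious=0.95):
    """Three-way decision: commit to a verdict only when confident.

    A high malicious threshold protects precision; probabilities that
    clear neither bar stay 'unknown' rather than risking a wrong call.
    """
    if p_malicious >= t_malicious:
        return "malicious"
    if p_malicious <= t_benign:
        return "benign"
    return "unknown"

print([verdict(p) for p in (0.99, 0.05, 0.50)])
# ['malicious', 'benign', 'unknown']
```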
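A rough, hypothetical sketch of the production object — fit on labelled data via a re-training hook, then predict a verdict with the dual thresholds — assuming a scikit-learn model underneath. The class name, interface, and toy data are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class IncidentPredictor:
    """Hypothetical production wrapper: predicts a verdict for one incident's
    feature vector and can re-fit itself on fresh labelled data."""

    def __init__(self, t_benign=0.10, t_malicious=0.95):
        self.model = LogisticRegression()
        self.t_benign = t_benign
        self.t_malicious = t_malicious

    def retrain(self, X, y):
        # In the real pipeline this step would also re-tune
        # hyperparameters and the two thresholds.
        self.model.fit(X, y)
        return self

    def predict(self, features):
        p = self.model.predict_proba([features])[0, 1]
        if p >= self.t_malicious:
            return "malicious"
        if p <= self.t_benign:
            return "benign"
        return "unknown"

# Toy training data: the first feature alone separates the classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 1, 1])  # 0 = benign, 1 = malicious
predictor = IncidentPredictor(t_benign=0.5, t_malicious=0.5).retrain(X, y)
print(predictor.predict([1, 1]), predictor.predict([0, 0]))
```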
Achievements (according to KPIs)
- Recall: 87%
- Precision: 99%
- Classified Unknowns: 60%
- Monitoring the model in production over a month – checking how it behaves in real time and deciding, based on the degree of degradation, when to re-train it and whether additional features should be added.
- Increasing precision further, possibly by adding new features.
- Reducing contradictions between incidents with similar features but different labels, possibly by adding more features, including non-binary ones.
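The monitoring step above needs a concrete rule for when degradation warrants re-training. A minimal illustration, with the baseline and tolerance values invented for the example:

```python
def should_retrain(precision_history, baseline=0.99, tolerance=0.03):
    """Hypothetical degradation rule: flag the model for re-training when
    production precision drops more than `tolerance` below the baseline
    measured at deployment time."""
    return any(p < baseline - tolerance for p in precision_history)

print(should_retrain([0.99, 0.98, 0.97]))  # False -- still within tolerance
print(should_retrain([0.99, 0.94]))        # True  -- degraded, re-train
```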