
Developing a system for sample verification

Project by Remy

Abstract

The purpose of this project is to build a visualization tool for the data science team. The dashboard gives users access to several kinds of information: the databases used for the different models, the results, the details of each sample, and everything else related to the data used in implementing the algorithms. The project also includes a service for in-depth analysis of false positives (FP): a server that displays sample information (id, features used in the training phase, sample source …) so that the false positives can be clustered, with the aim of improving predictions and sample analysis.
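As a rough illustration of the kind of record such a server might display, the sketch below models one sample and selects the false positives, taking a false positive to be a sample classified as malicious when it is not. The `Sample` class, its field names, and the example values are assumptions made for this sketch; they are not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """Hypothetical record for one sample shown in the dashboard."""
    sample_id: str
    source: str                 # where the sample came from
    features: Dict[str, float]  # features used in the training phase
    predicted_malicious: bool   # what the model said
    actually_malicious: bool    # the verified label


def false_positives(samples: List[Sample]) -> List[Sample]:
    """Return samples flagged as malicious that are in fact benign."""
    return [s for s in samples
            if s.predicted_malicious and not s.actually_malicious]


# Made-up samples, for illustration only.
samples = [
    Sample("s1", "feed_a", {"entropy": 7.1}, True, False),   # false positive
    Sample("s2", "feed_b", {"entropy": 3.2}, False, False),  # true negative
    Sample("s3", "feed_a", {"entropy": 7.8}, True, True),    # true positive
]
fp_ids = [s.sample_id for s in false_positives(samples)]
print(fp_ids)  # → ['s1']
```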

 

Challenges

  • Understanding the basics. The first challenge was to understand the data we use, in order to extract something useful from it
  • Imagining what could be useful for the users: graphs, boards …
  • Developing front-end skills. Because I was building a dashboard, I had to create a server that everyone in the data science team could use; back-end skills alone were not enough for that
  • Adding some intelligence to the tool to make it more useful: clustering false positives
  • Improving coding skills. I learned to write ‘beautiful code’ by adding unit tests and documentation so that the code is ready for production

 

Achievements

  • Built a simple dashboard that is already used by the team
  • Made it possible to improve model predictions by analyzing samples one by one through the new server
  • Detected some anomalies thanks to this server

 

Further development

The next step is to develop the clustering part, which could be very useful for sample verification. CHECKPOINT tries to reduce the number of false positives (samples classified as malicious when they are not). Clustering will produce distinct groups of false positives, which will help us determine why these samples were misclassified and then improve our models.
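To make the clustering idea concrete, here is a minimal sketch that groups false positives by their feature vectors. The project does not say which algorithm will be used; plain k-means is chosen here purely as an assumption, and the feature values, the `kmeans` helper, and the choice of two clusters are all made up for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns one cluster index per point."""
    # Deterministic start: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Made-up feature vectors for six false positives (illustrative values only).
fp_features = [
    [0.9, 0.1], [5.0, 5.2], [1.0, 0.2],
    [5.1, 4.9], [0.8, 0.0], [4.9, 5.0],
]
labels = kmeans(fp_features, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

Each resulting group can then be inspected on its own, for example by looking at which sources or feature ranges its members share, to look for a common reason behind the misclassification.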
