Enterprise networks are scanned constantly for vulnerabilities. The scanning may be benign or malicious: the scanner could be looking for vulnerabilities to fix or to exploit. In this project, we want to take advantage of these external scanning efforts and use them to score the security status of the organization, based on features such as traffic direction, the quantity of data sent and received, the location of the scanned machine in the company, and the relations between the scanned machines and other machines inside the company.
Supervised model to detect malicious network activity, Checkpoint.
The input is network security data from all around the world, with each record tagged as Benign, Malicious or Unknown at a certain level of confidence. The “Unknown” labels are mostly on encrypted data. The goal of the project is to detect malicious activity, using clustering on the unknown data to reach a verdict for it. The model will use features such as traffic (number of bytes), reputation (if available), port, and sequences over time.
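One way to turn clusters into a verdict for the unknown records is a confidence-weighted vote among the labelled members of each cluster. The sketch below is only an illustration under assumptions: the source does not state how verdicts were derived, and the function names, the `min_share` threshold and the sample values are all hypothetical.

```python
from collections import Counter, defaultdict

def cluster_verdicts(cluster_ids, labels, confidences, min_share=0.6):
    """Map each cluster id to a verdict from its labelled members.

    Hypothetical rule: the winning label must hold at least `min_share`
    of the confidence-weighted votes, otherwise the cluster stays Unknown.
    """
    votes = defaultdict(Counter)
    for cid, label, conf in zip(cluster_ids, labels, confidences):
        if label != "Unknown":
            votes[cid][label] += conf

    verdicts = {}
    for cid in set(cluster_ids):
        tally = votes.get(cid)
        if not tally:
            # No labelled record in this cluster: nothing to vote with.
            verdicts[cid] = "Unknown"
            continue
        label, weight = tally.most_common(1)[0]
        share = weight / sum(tally.values())
        verdicts[cid] = label if share >= min_share else "Unknown"
    return verdicts

def resolve_unknowns(cluster_ids, labels, confidences, min_share=0.6):
    """Keep existing labels; give Unknown records their cluster's verdict."""
    verdicts = cluster_verdicts(cluster_ids, labels, confidences, min_share)
    return [
        verdicts[cid] if label == "Unknown" else label
        for cid, label in zip(cluster_ids, labels)
    ]

# Made-up example: cluster ids would come from any clustering model.
cluster_ids = [0, 0, 0, 1, 1, 1, 2]
labels = ["Malicious", "Malicious", "Unknown",
          "Benign", "Unknown", "Malicious", "Unknown"]
confidences = [0.9, 0.8, 0.0, 0.9, 0.0, 0.7, 0.0]

resolved = resolve_unknowns(cluster_ids, labels, confidences)
```

In this example the unknown record in cluster 0 becomes Malicious, while the unknown records in clusters 1 and 2 stay Unknown because the vote is too close or there are no labelled neighbours. Those are the cases that would be worth passing to a cybersecurity expert.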
- Understanding how Checkpoint works – Understand the architecture, the software used, and the features and labels used in cybersecurity.
- Data collection – Understand what features to use from the original dataset and collect them accordingly.
- Data exploration – See how the observations behave. The exploration revealed dirty data, such as observations with missing features.
- Categorical metadata – Most of the original features were categorical, and one-hot encoding them via pd.get_dummies created so many columns that the data became difficult to work with.
- Work with an unsupervised learning model, and collaborate with a cybersecurity expert to evaluate the resulting classification.
- Push the algorithm to production.
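The column explosion mentioned in the categorical metadata step can be reproduced on a small scale. The sketch below shows pd.get_dummies on a toy frame and one common mitigation, grouping rare categories into a single bucket before encoding. The column names, the `group_rare` helper and the `min_count` threshold are hypothetical, and the source does not say which remedy the project actually used.

```python
import pandas as pd

def group_rare(series: pd.Series, min_count: int, other: str = "OTHER") -> pd.Series:
    """Replace categories seen fewer than `min_count` times with `other`."""
    counts = series.value_counts()
    keep = counts[counts >= min_count].index
    return series.where(series.isin(keep), other)

# Toy data; ports are kept as strings so they are treated as categories.
df = pd.DataFrame({
    "port": ["443", "443", "80", "80", "443", "8080", "22", "3389"],
    "protocol": ["tcp", "tcp", "tcp", "tcp", "udp", "tcp", "tcp", "tcp"],
})

# Plain one-hot encoding: one column per distinct value of each feature.
wide = pd.get_dummies(df)

# Same encoding after folding ports seen fewer than twice into "OTHER".
grouped = df.assign(port=group_rare(df["port"], min_count=2))
narrow = pd.get_dummies(grouped)

print(wide.shape[1], "columns before grouping,", narrow.shape[1], "after")
```

On real traffic data, where a port or reputation field can take thousands of values, the same idea keeps the encoded matrix at a manageable width at the cost of losing detail on rare values.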
Achievements (according to KPIs)
- Creation of a Clustering Model (unsupervised).
- Definition of the optimal number of clusters for the different clients.
- Push the algorithm to production with automation.
- Try new clustering models and compare the results.
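As an illustration of how an optimal number of clusters could be chosen per client, the sketch below scores several candidate values with the silhouette coefficient and keeps the best one. This is an assumption made for illustration: the source does not name the clustering algorithm or the selection criterion, so KMeans, the candidate range and the synthetic data are stand-ins, and `pick_n_clusters` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_n_clusters(X, candidates=range(2, 7), seed=0):
    """Return (best_k, scores): the candidate k with the highest silhouette."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in for one client's feature matrix: three tight,
# well-separated groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = pick_n_clusters(X_demo)
print("selected number of clusters:", best_k)
```

Running the helper once per client's feature matrix gives each client its own value, and swapping KMeans for another estimator with a `fit_predict` method is one way to compare clustering models on the same score.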