Data Science Fellows Projects 2019

Anomaly detection using NLP methods, and dynamic features

Project by Arie

Abstract

In fileless attacks, where no executable or file is involved, it is very hard to obtain ground truth for deciding whether a detection was a false positive (FP) or a true positive (TP). Most fileless attacks come from PowerShell. This project deals with anomaly detection using NLP methods and dynamic features. The predictions are shown to the customers and are also used to evaluate future maintenance sessions. The goal was to analyze a PowerShell script, extract features from it, and determine from the way the script is written whether it is malicious or benign. This was an end-to-end data science project: collecting the data, relabeling it, writing a class for each step of the project (data preprocessing, feature extraction, and the model itself), and finally writing a flow that automates the whole process.

 

Challenges

  • Understanding PowerShell: how it works, and why obfuscation is so easy in a PowerShell script.
  • There are many ways to write PowerShell parameters, which ended up being most of my features, so extracting all of them properly was complicated. Each parameter has a different structure: some take a value and some don't.
  • Extracting the scripts from the database was a very long and complicated process.
  • Most of my entries were mislabeled, so I had to write code to relabel every entry, which was also long and challenging.
  • Writing the whole flow: receiving the database of scripts, preprocessing each script, extracting the features, and classifying the script with the model.
  • Presenting the model: presenting the results to the supervisor and showing a dashboard of all the results.
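
To illustrate the parameter-extraction challenge above, here is a minimal sketch in Python. It is not the project's actual code. The table KNOWN_PARAMS and the functions canonical_name and extract_parameters are hypothetical names introduced for this example, and the table is deliberately incomplete. The sketch assumes parameter names are matched case-insensitively and may be shortened to a prefix (for example -NoP or -enc). Prefixes that match more than one entry in the table are simply skipped here; real PowerShell resolution rules may differ.

```python
import re

# Hypothetical, incomplete table: canonical parameter name -> does a value follow?
KNOWN_PARAMS = {
    "noprofile": False,
    "noninteractive": False,
    "windowstyle": True,
    "executionpolicy": True,
    "encodedcommand": True,
    "command": True,
}

# Split on whitespace, keeping quoted strings together as one token.
TOKEN_RE = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')


def canonical_name(token):
    """Map a token such as '-NoP' to a canonical name via case-insensitive prefix match.

    Returns None when the prefix is empty, unknown, or ambiguous within KNOWN_PARAMS.
    """
    name = token.lstrip("-").lower()
    if not name:
        return None
    matches = [known for known in KNOWN_PARAMS if known.startswith(name)]
    return matches[0] if len(matches) == 1 else None


def extract_parameters(command_line):
    """Return {canonical parameter name: value, or None for switches}."""
    tokens = TOKEN_RE.findall(command_line)
    found = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        name = canonical_name(token) if token.startswith("-") else None
        if name is None:
            i += 1
            continue
        if KNOWN_PARAMS[name] and i + 1 < len(tokens):
            # Simplification: the value is assumed to be exactly the next token.
            found[name] = tokens[i + 1].strip("\"'")
            i += 2
        else:
            found[name] = None
            i += 1
    return found


params = extract_parameters("powershell.exe -NoP -W Hidden -Exec Bypass -enc AAAA")
```

The resulting dictionary can be turned into features directly, for instance one binary feature per parameter plus features derived from the values. Handling every real spelling of every parameter would need a fuller table and more careful tokenizing than this sketch provides.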

 

Achievements (according to KPIs)

  • Recall 25%
  • False Alarm Rate 1.5%
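
For reference, the two KPIs can be computed from confusion-matrix counts as sketched below. This assumes that "False Alarm Rate" means the false positive rate, FP / (FP + TN); the write-up does not define it. The function names are hypothetical, and the labels in the example are made up for illustration, not the project's data.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = malicious and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(tp, fn):
    """Share of truly malicious scripts that the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(fp, tn):
    """Share of benign scripts that the model wrongly flagged (assumed definition)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
example_recall = recall(tp, fn)
example_far = false_alarm_rate(fp, tn)
```

A low false alarm rate matters here because benign scripts typically far outnumber malicious ones, so even a small rate can produce many alerts.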

 

Further development

  • Monitoring the model over a month in production: checking how the model behaves in real time, and deciding, based on how much the model degrades, when to retrain it and whether other features should be added.
  • Increasing recall, possibly by adding new features. My POC focused only on PowerShell parameters; there are many more potential features to extract from the script of the command itself.
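
One possible way to turn the monitoring idea above into a retraining rule is sketched below. The function should_retrain, the max_drop threshold, and the min_windows setting are all hypothetical choices for illustration; the project does not specify how degradation would be measured.

```python
def should_retrain(baseline_recall, recent_recalls, max_drop=0.05, min_windows=2):
    """Suggest retraining when recall stays well below the baseline.

    Returns True only if each of the last `min_windows` monitoring windows
    shows recall more than `max_drop` below `baseline_recall`.
    """
    if len(recent_recalls) < min_windows:
        return False  # not enough monitoring data yet
    latest = recent_recalls[-min_windows:]
    return all(baseline_recall - r > max_drop for r in latest)


# Made-up weekly recall values, for illustration only.
sustained_drop = should_retrain(0.25, [0.24, 0.18, 0.17])
temporary_dip = should_retrain(0.25, [0.24, 0.18, 0.23])
```

Requiring several consecutive bad windows avoids retraining on a single noisy week. The same check could be applied to the false alarm rate with the comparison reversed.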
