Anomaly detection using NLP methods, and dynamic features

Project by Arie

April 11, 2019, 1:21 pm, Fellows 2018

Abstract

With fileless attacks, where no executable or file is involved, it is very hard to obtain ground truth to determine whether a detection was actually a false positive (FP) or a true positive (TP). Most fileless attacks come from PowerShell. This project deals with anomaly detection using NLP methods and dynamic features. The predictions are shown to customers and are also used to evaluate future maintenance sessions. The goal was to analyze a PowerShell script, extract features from it, and determine from the way the script is written whether it is malicious or benign. This was an end-to-end data science project: collecting the data, relabeling it, writing a class for each step of the project (data preprocessing, feature extraction, and the model itself), and finally writing a flow to automate the whole process.

Challenges

Understanding PowerShell, how it works, and why obfuscation is easy in PowerShell scripts.
There are many ways to write PowerShell parameters, which ended up being most of my features, so extracting all of them properly was complicated. Each parameter has a different structure: some need values and some don't, which made parameter extraction a big challenge.
Extracting scripts from the database was a very long and complicated process.
Most of my entries were mislabeled, so I had to write code to relabel every entry, which was also long and challenging.
Writing the whole flow: receiving the database of scripts, preprocessing each script, extracting the features, and classifying the script with the model.
Presenting the model: presenting the results to the supervisor and showing a dashboard of all the results.
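
The parameter extraction challenge can be illustrated with a rough sketch in Python. This is not the project's actual code: the regular expression, the function names `extract_parameters` and `parameter_counts`, and the example command line are all assumptions made for illustration. The sketch treats a token that starts with a dash as a parameter name and takes the next token as its value if that token does not start with a dash. It cannot tell a switch from a value-taking parameter without a per-parameter table, and it does not handle abbreviated names, quoted values containing spaces, or the `-Name:value` form.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical pattern: a dash-prefixed name at the start of a token,
# optionally followed by one value token that does not start with "-".
PARAM_RE = re.compile(r"(?<!\S)-(?P<name>[A-Za-z]+)(?:\s+(?P<value>(?!-)\S+))?")


def extract_parameters(command: str) -> List[Tuple[str, Optional[str]]]:
    """Return (lowercased parameter name, value or None) pairs."""
    # PowerShell parameter names are case-insensitive, so names are lowercased.
    return [
        (m.group("name").lower(), m.group("value"))
        for m in PARAM_RE.finditer(command)
    ]


def parameter_counts(command: str) -> Counter:
    """Count how often each parameter name appears (a simple feature vector)."""
    return Counter(name for name, _ in extract_parameters(command))


# Illustrative command line, not taken from the project's data.
example = "powershell.exe -NoProfile -ExecutionPolicy Bypass -enc SQBFAFgA"
print(extract_parameters(example))
# [('noprofile', None), ('executionpolicy', 'Bypass'), ('enc', 'SQBFAFgA')]
```

Counts like these could then be fed to a classifier as features; which features the project actually used beyond the parameters is not stated here.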

Achievements (according to KPIs)

Recall 25%
False Alarm Rate 1.5%
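
For reference, both KPIs can be computed from confusion-matrix counts, as in the sketch below. The post does not define "false alarm rate"; this sketch assumes the common definition FP / (FP + TN), the share of benign scripts flagged as malicious. The labels in the example are made up for illustration and are not the project's data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), with 1 = malicious and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(tp: int, fn: int) -> float:
    """Share of malicious scripts that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(fp: int, tn: int) -> float:
    """Assumed definition: share of benign scripts flagged, FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Made-up labels: 4 malicious and 6 benign scripts.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(recall(tp, fn), false_alarm_rate(fp, tn))
```

A low recall paired with a low false alarm rate, as reported above, means the model flags few benign scripts but also misses most malicious ones.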

Further development

Monitoring the model over a month in production: checking how the model behaves in real time and deciding, based on the model's degradation, when to retrain it and whether other features should be added.
Increasing recall, possibly by adding new features. My POC focused only on PowerShell parameters; there are many potential new features to extract from the script of the command.
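
One simple way to turn "retrain when the model degrades" into a rule is sketched below. The function name `needs_retraining`, the 20% relative-drop threshold, and the weekly recall values are all hypothetical. The sketch also assumes that labeled production samples are available, so that recall can be measured each week.

```python
from typing import Sequence


def needs_retraining(
    baseline_recall: float,
    recent_recalls: Sequence[float],
    max_relative_drop: float = 0.2,  # hypothetical threshold
) -> bool:
    """Flag retraining when mean recent recall falls too far below baseline."""
    if not recent_recalls:
        return False  # nothing measured yet, so no decision
    mean_recent = sum(recent_recalls) / len(recent_recalls)
    return mean_recent < baseline_recall * (1.0 - max_relative_drop)


# Made-up weekly recall values over one month of monitoring.
print(needs_retraining(0.25, [0.24, 0.23, 0.25, 0.22]))  # stable weeks
print(needs_retraining(0.25, [0.20, 0.18, 0.17, 0.15]))  # degrading weeks
```

The same check could be applied to the false alarm rate with the comparison reversed, since there a rise rather than a drop signals degradation.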
