Deep Reinforcement Learning Research for Real-Time Malware Prevention

Project by Jeremy

April 14, 2019, 7:41 pm, Fellows 2018

Abstract

Deep reinforcement learning is a family of algorithms for training agents to make optimal decisions in dynamic, high-dimensional environments. The underlying theory relies on Markov decision processes (MDPs). A policy function (Pi) or a future-reward function (Q) is estimated by training one or more neural networks over multiple episodes, with a degree of randomness in the agent's decisions.
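To make the role of Q and of randomness in decisions concrete, here is a minimal sketch in Python. It is a tabular simplification, not the project's code: the project estimates Q with neural networks, whereas this sketch stores Q in a plain dictionary. The names `epsilon_greedy` and `q_update`, and the default values of `alpha` and `gamma`, are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    When the episode is over (done=True), the target is just r.
    """
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action table used only for illustration.
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
rng = random.Random(0)
action = epsilon_greedy(q_table[0], epsilon=0.1, rng=rng)
q_update(q_table, state=0, action=action, reward=1.0, next_state=1, done=False)
```

In a DQN, the table lookup is replaced by a forward pass through a network and the update by a gradient step on the squared difference between Q(s, a) and the same target.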

In this project, we adapted three algorithms (DQN, REINFORCE, and Actor-Critic) to perform real-time malware prevention in both a deterministic and a stochastic environment. The data was provided as vectors of integers, each integer corresponding to an API call performed by a program.

Challenges

The data did not satisfy the Markov assumption, so we instead relied on implementation tricks used for partially observable Markov decision processes (POMDPs), such as stacking a history of states.
We had to define the agent's action space and the reward function according to the needs of the project.
The agent had to take a sequence of actions without any natural feedback signal being provided.
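The history-stacking trick mentioned in the first challenge can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the class name `HistoryStack`, the window size `k`, and the padding value `pad` are all hypothetical. The idea is that the agent sees the last `k` observations together as one state, which partly compensates for a single observation not carrying enough information.

```python
from collections import deque


class HistoryStack:
    """Keep the last `k` observations and expose them as one flat state."""

    def __init__(self, k, pad=0):
        self.k = k
        self.pad = pad  # filler used until k real observations have arrived
        self.frames = deque([pad] * k, maxlen=k)

    def reset(self):
        """Start a new episode with a fully padded history."""
        self.frames = deque([self.pad] * self.k, maxlen=self.k)
        return tuple(self.frames)

    def push(self, obs):
        """Append the newest observation, dropping the oldest one."""
        self.frames.append(obs)
        return tuple(self.frames)


# Hypothetical stream of integer API-call ids for one program.
stack = HistoryStack(k=3)
state = stack.reset()
for api_call_id in [5, 7, 9, 2]:
    state = stack.push(api_call_id)
```

A larger `k` gives the agent more context at the cost of a larger input, so the window size is a tuning choice.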

Achievements (according to KPIs)

Implemented all three algorithms.
Defined and implemented visualization metrics that allow the models to be compared and improved.

Further development

In the DQN algorithm, instead of a fully connected neural network, we suggest using a memory-based recurrent network (LSTM), as was done by M. Hausknecht and P. Stone (https://arxiv.org/pdf/1507.06527.pdf).

Implement a variation of the Observation wrapper in which one vector contains the aggregated history of all API calls for a given file and another vector contains the last x API calls.
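One possible reading of this proposal is sketched below. It assumes that "aggregated history" means a count of how often each API call id has occurred so far, which the text does not specify. The function name `build_observation` and the parameters `vocab_size`, `last_x`, and `pad` are hypothetical.

```python
from collections import Counter


def build_observation(api_calls, vocab_size, last_x, pad=-1):
    """Return (aggregated, recent) for the API calls seen so far.

    aggregated: one count per API id in range(vocab_size), over all calls.
    recent: the last `last_x` API ids, left-padded with `pad` if fewer exist.
    """
    counts = Counter(api_calls)
    aggregated = [counts.get(i, 0) for i in range(vocab_size)]
    # Guard against last_x == 0, since api_calls[-0:] would return everything.
    recent = list(api_calls[-last_x:]) if last_x > 0 else []
    recent = [pad] * (last_x - len(recent)) + recent
    return aggregated, recent


# Hypothetical sequence of API-call ids for one file.
aggregated, recent = build_observation([2, 0, 2, 3], vocab_size=4, last_x=2)
```

Both vectors have a fixed length regardless of how many calls the file has made, so they can be fed to a network with a fixed input size.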
