Fraud detection using behavioral biometrics data

Project by Shai

April 11, 2019, 12:10 pm, Fellows 2018

Abstract

The goal of the project is to detect fraud incidents using two main data types: device identification information and behavioral biometrics data. The project covers an end-to-end fraud detection workflow (exploration, extensive feature extraction, modeling, evaluation and monitoring). It also includes time-series modeling with LSTM and GRU networks, used to create new features for the main model from the behavioral data.

Challenges

Imbalanced data – The original data has a benign-to-fraud ratio of 100:1. Even after the initial downsampling the data is still imbalanced, so it requires dedicated techniques to handle the imbalance, as well as evaluation on an upsampled representation of the predictions
The data sets are very different and therefore require different approaches (reflected in two different models: a time-series model and a tree-based model). The output of the time-series model has to serve as an input to the final model
Data exploration – The data contained dozens of raw features, which are the building blocks of the final data consumed by the models. A good understanding of the problem is vital for the feature generation step
Feature generation:
1. Behavioral data – The raw features could not be used as-is, so dozens of new features had to be generated. These features have to make sense and represent behaviors that may be useful for fraud detection
2. Device identification data – More than a hundred new features were generated. The newly created features were shown to be related to fraud or benign activity
3. Examining the distribution of every feature across fraud and benign activities
Large scale on a private laptop – Some of the files are larger than a few GB, so not all of them could be loaded into memory in a single operation
Model hyperparameter tuning – Part of the hyperparameter tuning had to be done on an external server because of the resources it required
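The behavioral feature generation step can be sketched as collapsing each session's raw event stream into summary statistics. This is a minimal illustration only: the `Event` schema (a millisecond timestamp plus an event kind), the helper `session_features`, and every feature name below are hypothetical assumptions, since the raw behavioral fields used in the project are not described.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical raw behavioral event (schema assumed for illustration)."""
    timestamp_ms: float
    kind: str  # assumed categories, e.g. "key", "mouse", "touch"


def session_features(events: list[Event]) -> dict[str, float]:
    """Aggregate one session's raw events into hypothetical behavioral features."""
    if not events:
        return {"n_events": 0.0, "duration_ms": 0.0, "mean_gap_ms": 0.0,
                "std_gap_ms": 0.0, "key_ratio": 0.0}
    times = sorted(e.timestamp_ms for e in events)
    # Gaps between consecutive events describe the interaction rhythm
    gaps = [b - a for a, b in zip(times, times[1:])]
    n_key = sum(1 for e in events if e.kind == "key")
    return {
        "n_events": float(len(events)),
        "duration_ms": float(times[-1] - times[0]),
        "mean_gap_ms": statistics.fmean(gaps) if gaps else 0.0,
        "std_gap_ms": statistics.pstdev(gaps) if gaps else 0.0,
        "key_ratio": n_key / len(events),
    }
```

Each session then becomes one row of numeric features, and comparing these columns between fraud and benign sessions is the distribution check described in item 3.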
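For files too large to load at once, one common pattern is to stream the rows and keep only running aggregates in memory. The sketch below assumes CSV input; the column names (`device_id`, `amount`) and the helper `aggregate_by_key` are hypothetical, chosen for illustration.

```python
import csv
from collections import defaultdict
from typing import Iterable


def aggregate_by_key(lines: Iterable[str], key: str, value: str) -> dict[str, tuple[int, float]]:
    """Stream CSV rows one at a time and keep only (count, sum) per key.

    `lines` can be an open file object or any iterable of CSV text lines,
    so memory use grows with the number of distinct keys, not the file size.
    """
    counts: dict[str, int] = defaultdict(int)
    sums: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines):
        k = row[key]
        counts[k] += 1
        sums[k] += float(row[value])
    return {k: (counts[k], sums[k]) for k in counts}
```

With a file on disk this would be called inside `with open(path, newline="") as f:` as `aggregate_by_key(f, "device_id", "amount")`. If pandas is available, `pandas.read_csv(path, chunksize=...)` offers a similar chunk-by-chunk workflow over DataFrames.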

Achievements (according to KPIs)

Extensive data analysis notebooks (end to end):
1. Identified fraud-related features (numeric and categorical) during data exploration
2. Developed automatic metric calculations and plots for model evaluation (matplotlib and bokeh plots)
Achieving the highest recall at 0.001 FPR – Until now, recall had been around 0.4. In this project we achieved a recall of 0.5 at the required FPR
Comparing multiple models and architectures – We used an LSTM/GRU RNN model on the behavioral data (represented as time series) and XGBoost as the final model on all data. The output of the time-series model was plugged into the final model as an additional feature. Many hyperparameters and features were evaluated during model testing in order to find the best architecture and generated features
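The recall-at-fixed-FPR KPI can be computed by choosing the threshold from the benign scores alone. A minimal numpy sketch, assuming labels where 1 means fraud and higher scores mean more suspicious; `recall_at_fpr` is a hypothetical helper, not the project's evaluation code.

```python
import numpy as np


def recall_at_fpr(y_true, scores, max_fpr=0.001):
    """Return (recall, threshold) for the best threshold with FPR <= max_fpr."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    benign_desc = np.sort(s[~y])[::-1]
    # Largest number of benign rows that may be flagged; the tiny epsilon
    # guards against floating-point error in max_fpr * n (e.g. 2.9999999...)
    k = int(np.floor(max_fpr * benign_desc.size + 1e-9))
    # Flag only scores strictly above the (k+1)-th highest benign score,
    # so at most k benign rows are flagged and FPR <= max_fpr
    threshold = benign_desc[k] if k < benign_desc.size else -np.inf
    recall = np.mean(s[y] > threshold)
    return float(recall), float(threshold)
```

For example, with 1,000 benign rows and `max_fpr=0.001`, at most one benign row may score above the threshold, and recall is the share of fraud rows that clear it.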
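The stacking step, where the time-series model's output becomes one more feature for the tree model, can be sketched with a hand-written GRU forward pass. This is illustrative only: the weights are random rather than trained, the sizes are arbitrary, and `TinyGRU` and `add_sequence_score` are hypothetical names. In practice the sequence model would be trained in a deep-learning framework, and the augmented matrix passed to XGBoost (for example via `xgboost.XGBClassifier().fit(X, y)`).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyGRU:
    """Forward pass of a single-layer GRU with a logistic output head.

    Weights are randomly initialised for illustration only; a real model
    would be trained before its scores are used as a feature.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)

        def w(*shape):
            return rng.normal(scale=0.1, size=shape)

        # Input weights, recurrent weights and biases for the update (z),
        # reset (r) and candidate (n) parts of the cell
        self.W = {g: w(n_in, n_hidden) for g in "zrn"}
        self.U = {g: w(n_hidden, n_hidden) for g in "zrn"}
        self.b = {g: np.zeros(n_hidden) for g in "zrn"}
        self.w_out = w(n_hidden)
        self.n_hidden = n_hidden

    def score(self, seq):
        """seq has shape (timesteps, n_in); returns a score in (0, 1)."""
        h = np.zeros(self.n_hidden)
        for x in seq:
            z = sigmoid(x @ self.W["z"] + h @ self.U["z"] + self.b["z"])
            r = sigmoid(x @ self.W["r"] + h @ self.U["r"] + self.b["r"])
            n = np.tanh(x @ self.W["n"] + (r * h) @ self.U["n"] + self.b["n"])
            h = (1.0 - z) * n + z * h  # blend new candidate with previous state
        return float(sigmoid(h @ self.w_out))


def add_sequence_score(tabular, sequences, model):
    """Append the sequence-model score as one extra column of the tabular features."""
    seq_scores = np.array([model.score(s) for s in sequences])
    return np.column_stack([tabular, seq_scores])
```

Sequences may have different lengths, since the hidden state is carried step by step. When stacking like this, the sequence scores used to train the final model are generally taken from held-out or out-of-fold predictions, so the tree model does not learn from scores fitted on the same labels.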

Further development

Evaluating the model’s performance at a larger scale without downsampling. The benign data was downsampled before modeling and upsampled afterwards (predictions and labels only) to evaluate the model’s performance under “real world” conditions
Monitoring the model’s performance over several months
Adding more features to the behavioral time-series data and testing more architectures
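Evaluating on an upsampled representation can be done without physically replicating rows: each benign row kept by the downsampling is weighted by the inverse of the sampling rate. A minimal sketch, assuming the benign class was downsampled uniformly at random with a known keep rate; `upsampled_metrics` and `benign_rate` are hypothetical names.

```python
import numpy as np


def upsampled_metrics(y_true, scores, threshold, benign_rate):
    """Metrics at a fixed threshold as if the downsampled benign rows were restored.

    benign_rate is the fraction of benign rows kept by (assumed uniform random)
    downsampling, so each kept benign row stands for 1 / benign_rate rows.
    """
    y = np.asarray(y_true).astype(bool)
    flagged = np.asarray(scores, dtype=float) > threshold
    w = 1.0 / benign_rate
    tp = float(np.sum(flagged & y))
    fn = float(np.sum(~flagged & y))
    fp = float(np.sum(flagged & ~y)) * w
    tn = float(np.sum(~flagged & ~y)) * w
    return {
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp) if tp + fp > 0 else 0.0,
        "alert_rate": (tp + fp) / (tp + fn + fp + tn),
    }
```

At a fixed threshold, uniform reweighting leaves recall and FPR unchanged, since both are per-class rates; precision and the overall alert rate are what shift toward their “real world” values.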
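For monitoring over time, one widely used drift check (not mentioned in the project itself, so treat it as a swapped-in example) is the population stability index (PSI) between a reference score distribution and the scores of a later period. A minimal numpy sketch with quantile bins taken from the reference sample; the function name, bin count and `eps` value are assumptions.

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample of scores and a later sample.

    Bin boundaries are the inner quantiles of the reference sample, so each
    bin holds roughly 1/n_bins of it; eps guards against log(0) in empty bins.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # n_bins - 1 inner edges; values beyond them fall into the two end bins
    inner_edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])

    def bin_fractions(x):
        idx = np.searchsorted(inner_edges, x, side="right")
        counts = np.bincount(idx, minlength=n_bins)
        return np.clip(counts / x.size, eps, None)

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    # Sum over bins of (actual - expected) * ln(actual / expected)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

Computed month by month against the validation-period scores, a rising PSI signals that the score distribution is drifting and that recall at the target FPR should be re-checked.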
