Classify inappropriate presentations made by users

Project by Matthias

Abstract
Powtoon is a world-leading video and presentation creation platform with an intuitive drag-and-drop interface. Launched in 2012, it is used by over 25 million people, including 96% of Fortune 500 companies, Ivy League universities, leading SMBs, agencies, and entrepreneurs, to create professional-looking videos and presentations for their internal and external communications. A new Powtoon video is created every second of the day.

Challenges

• Getting all environments set up and running.
• Feature extraction and exploration on images and texts.
• Building the classification model.
• Data and model testing.
• Wrapping the model in an API.
• Deploying and testing the API.

Achievements (according to KPIs)
• Feature extraction.
• Feature engineering.
• Model to predict abusive powtoons.

Further development

Powtoon wants to classify inappropriate presentations (called “powtoons”) made by its users. The approach is to search the texts and images inserted in each powtoon for relevant features that may indicate inappropriate use.

From texts I extract:

• Number of words.
• Keywords via the TextRank algorithm.
• Entities via the spaCy package.
• Topics via the Latent Dirichlet Allocation (LDA) algorithm.
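
To illustrate the keyword step, here is a minimal sketch of a simplified TextRank in pure Python. It is not the project's actual code: the function name, the tiny stop-word list, the window size, and the damping factor are all assumptions chosen for illustration. Words that appear close together are linked in a graph, and a PageRank-style iteration scores each word.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank words with a simplified TextRank over a co-occurrence graph."""
    # Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
    stop_words = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "with", "on"}
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]

    # Undirected graph: link each word to the following `window - 1` words.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style power iteration over the word graph.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Calling `textrank_keywords(slide_text)` on the text of a powtoon returns its highest-ranked words, which can then feed the feature engineering described further down.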

From the images, I extract features via pre-trained neural networks (VGG16, YOLOv3 and Xception):

• All kinds of objects recognized by VGG16 and YOLOv3.
• Presence or absence of guns in the image.
• Percentage of exposed skin.
• Emotions on faces.

YOLOv3 can draw bounding boxes around each person, which makes it possible to crop the image and apply the exposed-skin computation and the emotion recognition to the crop.
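
The crop-then-measure idea can be sketched as follows. This is an illustration under stated assumptions, not the method used in the project: the box format `(x1, y1, x2, y2)` in pixels is hypothetical, and the skin test is a simple RGB threshold heuristic standing in for whatever skin detector was actually used.

```python
import numpy as np


def crop_box(image, box):
    """Crop an H x W x 3 image to a box given as (x1, y1, x2, y2) pixels (assumed format)."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]


def skin_percentage(rgb):
    """Percentage of pixels passing a simple RGB skin heuristic (an assumption)."""
    rgb = rgb.astype(np.int32)  # avoid uint8 overflow when subtracting channels
    if rgb.shape[0] == 0 or rgb.shape[1] == 0:
        return 0.0  # empty crop: nothing to measure
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    spread = rgb.max(axis=-1) - rgb.min(axis=-1)
    mask = (
        (r > 95) & (g > 40) & (b > 20)
        & (spread > 15)
        & (np.abs(r - g) > 15)
        & (r > g) & (r > b)
    )
    return 100.0 * float(mask.mean())
```

Given a person box returned by a detector, `skin_percentage(crop_box(image, box))` yields one numeric feature per detected person. A threshold rule like this is sensitive to lighting, so it should be treated as a rough baseline.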
From these features, others can be engineered. For example, I transform the list of keywords into vectors and compute their weighted norm and the angle between them to capture their similarity. Moreover, because these NLP vectors are high-dimensional, I apply PCA to reduce their dimension.
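
A minimal NumPy sketch of these engineered features is shown below. The function names are hypothetical, and the source does not say how the norm is weighted, so the `weights` argument (for example, keyword scores) is an assumption. PCA is implemented here through an SVD of the centered data.

```python
import numpy as np


def weighted_norm(v, weights):
    """Square root of the weighted sum of squares; the weighting scheme is assumed."""
    v = np.asarray(v, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(weights * v ** 2)))


def angle_between(u, v):
    """Angle in degrees between two non-zero vectors (0 means same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def pca_reduce(X, n_components):
    """Project the rows of X onto their first principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

A small angle between two keyword vectors suggests that the keywords are similar, and `pca_reduce(vectors, 2)` compresses each high-dimensional vector to two values that are easier for a downstream model to use.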
After gathering the features, I train a RandomForest model to classify inappropriate presentations. I use the provided train and test sets and experiment with the features to improve the results.
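
The classification step might look like the following scikit-learn sketch. Everything about the data is a placeholder: the feature names are hypothetical, the values are random, and the label is synthetically tied to one column so the example has something to learn. It shows only the shape of the workflow, not the project's real data or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical feature names, for illustration only.
feature_names = ["word_count", "keyword_angle", "skin_pct", "gun_present"]
rng = np.random.default_rng(0)


def make_split(n_rows):
    """Synthetic placeholder data: the label depends only on the 'skin_pct' column."""
    X = rng.random((n_rows, len(feature_names)))
    y = (X[:, 2] > 0.5).astype(int)
    return X, y


X_train, y_train = make_split(200)
X_test, y_test = make_split(50)

# Fit the forest on the training split, then score it on the held-out split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Feature importances hint at which features are worth keeping or refining.
importances = dict(zip(feature_names, model.feature_importances_))
```

Inspecting `importances` after each run is one way to experiment with the features: low-importance features can be dropped or re-engineered before retraining.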
The most important part of this project is extracting metadata from texts and images to build features, and thereby classify the powtoons.

Supervisor Feedback

This project was handled in a professional way by Matthias who understood the concept in the very first day and delivered complicated solution that projected exactly what the business request demanded. For an ensemble model that required several technologies and complicated function to clean/map the data – I can say that I’m very pleased about the outcome, the knowledge and self-learning that Matthias did.
– Maor Nativ – Head of Data & Analytics @Powtoon
