Build the groundwork for BI Analytics capabilities out of Big Data for both internal and client use, using clustering and NLP techniques

Project by Yuri


Proonto is a SaaS-powered customer interaction management solution connecting companies with a curated marketplace of customer interaction experts. Companies can plan and execute their live support and pre-sale interaction strategies, design customer touch points across their buying journey and flexibly manage sales & service teams of Proonto’s associated chat agents.

The project aimed to build the groundwork for BI Analytics capabilities out of Big Data for both internal and client use. Its ultimate goal was to answer (graphically) most of the questions on an 82-question list, with topics encompassing User Profiling, Chat Service Analytics, Conversion Rate Optimization, Chat Topic Analysis, and Business Insights for Clients.

Development was roughly divided into: 

  • Data Exploration and Documentation, understanding database architecture, listing needed variables for addressing the questions posed and documenting existing code; 
  • Feature Engineering & Enrichment, building over 15 new features and enriching existing ones across more than 70 commits;
  • Building Visualizations and Dashboard, with over 10 new different dynamic graphs;
  • Presenting developed capabilities to Proonto’s CEOs. 

A large variety of Data Science subjects were revisited for the project, including clustering algorithms and NLP (chat keyword extraction via TF-IDF scores, WordCloud generation, and sentiment analysis of customer reviews of chats).
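
To make the keyword-extraction step concrete, here is a minimal sketch of TF-IDF scoring in plain Python. It is not the project's code: the project used scikit-learn, whose `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, whereas this version uses the textbook formula (term frequency times the log of inverse document frequency) so that the arithmetic is visible. The tokenizer, the function names, and the sample chats are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters; a deliberately naive tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(chats, k=3):
    """Return the k highest-scoring TF-IDF terms for each chat transcript."""
    docs = [tokenize(chat) for chat in chats]
    n_docs = len(docs)
    # Document frequency: in how many chats does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        # Textbook TF-IDF: (count / doc length) * log(N / document frequency).
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Hypothetical chat snippets, invented for illustration only.
chats = [
    "where is my order the order is late",
    "is the discount code valid",
    "my refund is pending",
]
keywords = top_keywords(chats, k=2)
```

A word such as "is" that occurs in every chat gets an IDF of zero and never ranks as a keyword, which is the property that makes TF-IDF useful for surfacing chat topics.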

Goals were prioritized weekly according to Proonto’s needs.
Technologies used: Python (mainly pandas, scikit-learn), Docker, Amazon (AWS, QuickSight, RDS, Redshift), Tableau, MySQL.


  • Unsupervised classification of e-commerce customer stages (Discovery, Exploration, Loyalty…) throughout the Customer Journey Funnel. Unsuccessful attempts at clustering shifted our efforts towards a basic deterministic approach (setting thresholds for customer evolution), which is to be replaced later by conversion probabilities.
  • Cleaning and interpreting results from Chat Keyword Extraction via TF-IDF scores. 
  • Learning and implementing visualizations in Tableau from scratch.
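
The deterministic, threshold-based approach to customer stages mentioned in the first item above can be sketched roughly as follows. Only the stage names come from the project; the features (`visits`, `purchases`) and the cut-off values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    visits: int     # hypothetical feature: number of site visits
    purchases: int  # hypothetical feature: number of completed purchases


# Hypothetical cut-offs; real values would be tuned against the data.
EXPLORATION_MIN_VISITS = 3
LOYALTY_MIN_PURCHASES = 2


def journey_stage(customer):
    """Assign a funnel stage by checking thresholds from the latest stage down."""
    if customer.purchases >= LOYALTY_MIN_PURCHASES:
        return "Loyalty"
    if customer.visits >= EXPLORATION_MIN_VISITS:
        return "Exploration"
    return "Discovery"


stages = [
    journey_stage(Customer(visits=1, purchases=0)),
    journey_stage(Customer(visits=5, purchases=0)),
    journey_stage(Customer(visits=8, purchases=3)),
]
```

Rules like these are easy to explain to a client and cheap to compute, but they are hard boundaries; replacing them with conversion probabilities would give each customer a graded score instead.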

Achievements (according to KPIs)

  • Established the first version of a big data BI analytics dashboard.
  • Engineered over 15 new features; over 10 new dynamic data visualizations.
  • Potentially increased purchase-prediction algorithm performance by cleaning and enriching the existing database.

Further development 

  • Beautify Dashboard.
  • Generate extra visualizations from the newly engineered features.
  • Refine Customer Journey classification.

Supervisor Feedback

Yuri’s performance was exemplary. For someone with no significant coding expertise prior to ITC, Yuri showed a strong ability to put complex ideas into code, work with new systems and platforms, and integrate with previously existing codebases. While Yuri is in the early stages of his career, he already possesses a strong intuition for technology and data mindfulness. He is able to understand business concepts and design and implement solutions with relatively little guidance. He is not afraid to ask questions, and that curiosity will serve him well in the future. It is also evident that ITC provided him with a practical set of skills that are relevant to today’s industry and valuable to a data science team. 

Yuri is an organized and highly motivated employee with a strong ‘get things done’ attitude. During his internship he demonstrated good personal and communication skills. He is a polite individual who takes good care of his work environment. Any company would benefit greatly from bringing Yuri on board.
