Data Science Fellows June 2020 Cohort
The Chatbot POC began with a data collection and analysis phase using MTurk. We gathered survey data from users, which I analyzed for user trends and for “template sentences” (syntax shared across many responses). For example, users often answered the question “where do you eat dinner?” with “i usually eat dinner at (some location)”. With these template sentences, I generated more data by replacing the entities (in this example, the location) with similar words, using W2V (word2vec) in combination with the spaCy POS tagger. Once enough data had been generated, I developed a chatbot with the Rasa framework to create a complex conversation flow. The chatbot can handle both user chit-chat and information extraction.
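The template-based generation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SIMILAR_WORDS` table is a hypothetical, hand-written stand-in for the candidates that word2vec similarity lookups (filtered by spaCy POS tags) would supply, and the `{slot}` template syntax and the `expand_template` helper are assumptions made for the example.

```python
import itertools
import re

# Hypothetical stand-in for word2vec neighbours of each entity.
# In the real pipeline these candidates would come from a trained
# embedding model, filtered by part-of-speech tag.
SIMILAR_WORDS = {
    "frequency": ["usually", "often", "sometimes"],
    "location": ["home", "a restaurant", "my desk", "the kitchen table"],
}

# Slots are written as {name} inside a template sentence (assumed syntax).
SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def expand_template(template, candidates):
    """Return every sentence obtained by filling each slot with each candidate."""
    slots = SLOT_PATTERN.findall(template)
    options = [candidates[slot] for slot in slots]
    sentences = []
    # The Cartesian product gives one tuple of fillers per generated sentence.
    for combo in itertools.product(*options):
        sentences.append(template.format(**dict(zip(slots, combo))))
    return sentences


variations = expand_template("i {frequency} eat dinner at {location}", SIMILAR_WORDS)
for sentence in variations:
    print(sentence)
```

Because the number of generated sentences is the product of the candidate counts per slot, even a small number of templates with a few slots each grows quickly, which is how roughly a hundred answers can turn into thousands of variations.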
Challenges (at least two)
- The main challenge came from the real-world data: many responses contained spelling or grammar mistakes, and sometimes it was unclear even to a human reader (me) what the user meant. Most of the data had to be labeled by hand because of these issues.
- Because the startup is at an early phase, there was no design for how a conversation should go or which entities should be extracted, so I was effectively the architect of the conversation. Without business knowledge, it is difficult to know whether you’re on the right track. In the end, the POC only aimed to demonstrate a complex conversation flow, and it will most likely need to be redone with the correct business logic.
Achievements (according to KPIs)
- Extracted and analyzed over 400 surveys from MTurk, which were filtered down to ~100 quality surveys.
- Created template sentences that allowed 100 answers to be expanded into thousands of variations.
- Successfully implemented a POC with a conversational flow similar to one that can be used in production.
Future project development
From here, there is a lot of potential to expand the POC. The first step would be to design a data schema and data warehousing solution, which should be integrated with Well-Beat’s existing infrastructure. The bot should then be designed with the schema in mind and synced with the database so that it continuously writes user responses as they come in. In parallel, Well-Beat can collect much more data to better understand what users will say. The bot should be reconfigured to run on a cloud server for better scalability. The bot can also gain more complex features, such as sentiment analysis and custom actions according to user preferences, which would set it apart from other chatbots.
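As a rough illustration of writing responses to a store as they arrive, the sketch below uses an in-memory SQLite database. Everything in it is an assumption made for the example: the `responses` table, its columns, and the `open_store` and `record_response` helpers are hypothetical, and the actual schema would need to be designed around Well-Beat’s infrastructure and business logic.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) responses table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS responses (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            question TEXT NOT NULL,
            raw_text TEXT NOT NULL,
            entity_type TEXT,
            entity_value TEXT,
            received_at TEXT NOT NULL
        )
        """
    )
    return conn


def record_response(conn, user_id, question, raw_text,
                    entity_type=None, entity_value=None):
    """Insert one user response, with any extracted entity, as it comes in."""
    conn.execute(
        "INSERT INTO responses "
        "(user_id, question, raw_text, entity_type, entity_value, received_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (user_id, question, raw_text, entity_type, entity_value,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


conn = open_store()
record_response(conn, "user-1", "where do you eat dinner?",
                "i usually eat dinner at home", "location", "home")
rows = conn.execute(
    "SELECT user_id, entity_type, entity_value FROM responses"
).fetchall()
print(rows)
```

In a Rasa bot, a call like `record_response` would typically be made from a custom action after entity extraction, though where exactly it belongs depends on the final conversation design.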