Data Science Fellows June 2020 Cohort
The project aimed to build a mechanism for estimating the quality of Verbit transcriptions. The mechanism should not rely on Verbit's QC team, since manual review does not scale. Instead, it can use information from all stages of transcription preparation, starting with ASR (Automatic Speech Recognition). After ASR, the transcription job is gradually improved by humans: editors and reviewers. The structure of a transcription job is complex and includes layers and splits.
Challenges (at least two)
- The way transcription-job information is stored in the Verbit database is complex; building a clean and efficient interface to the relevant data took me about a week.
- Unlike most training examples in machine learning courses, here I was responsible for both the dataset and the model. There were (too) many variables to decide whether to take into account or ignore.
- Even before assembling the dataset, preliminary research was needed into how the metrics (Perplexity and Word Error Rate, WER) behave as transcription jobs are edited.
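For reference, WER is the word-level edit distance between a hypothesis (e.g. the ASR output or an intermediate revision) and a reference transcript, normalized by the reference length. A minimal sketch of the standard dynamic-programming computation (not the project's actual implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word sequences via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution ("sit") and one deletion ("the") against a
# six-word reference give WER = 2/6.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

In this project the "reference" for a revision would be the final, fully reviewed transcript, so WER should tend to drop as a job moves through the editing stages.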
Achievements (according to KPIs)
- A preliminary analysis of WER and Perplexity on a small dataset made clear in which directions the work could be developed further.
- Some elements of the code (especially the object-oriented Job and Revision classes) can be reused in further research.
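The Job and Revision classes mentioned above are not shown in this report; the following is a hypothetical sketch of the kind of structure they describe (field names and roles are assumptions, not the project's actual code):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    """One editing pass over a transcription job (illustrative shape)."""
    author_role: str                      # e.g. "asr", "editor", "reviewer"
    text: str                             # transcript text at this revision
    wer_vs_final: Optional[float] = None  # filled in once the final version exists


@dataclass
class Job:
    """A transcription job accumulating revisions from ASR output to final review."""
    job_id: str
    revisions: List[Revision] = field(default_factory=list)

    def add_revision(self, rev: Revision) -> None:
        self.revisions.append(rev)

    def final_text(self) -> str:
        """The latest revision's text, treated as the reference transcript."""
        if not self.revisions:
            raise ValueError("job has no revisions yet")
        return self.revisions[-1].text
```

Wrapping the database's layered job structure in classes like these is what makes it practical to compute per-revision metrics across many jobs.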
Future project development
- Use a more accurate Perplexity: try another standard language model, or a language model that takes the topic of the transcription job into account.
- WER: assemble a new dataset with WER values from all revisions, and use the new "decomposed WER" metric.
- Exploit the correlation between ASR confidence values and user edits; this requires implementing a text alignment algorithm.
- Consider approaches from other teams working on similar problems, e.g. TranscRater, a tool developed at the University of Trento, Italy.
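To illustrate the alignment idea in the future-work list: each ASR word (with its confidence) needs to be matched against the human-edited text so that confidence can be correlated with whether the word was changed. A minimal sketch using the standard library's `difflib.SequenceMatcher` (one possible alignment method, not necessarily the one the project would adopt):

```python
import difflib
from typing import List, Tuple


def align_edits(asr_words: List[str],
                confidences: List[float],
                edited_words: List[str]) -> List[Tuple[str, float, bool]]:
    """Mark each ASR word as kept or edited by aligning it with the edited text.

    Returns (word, confidence, kept) triples, which is the raw material for
    correlating ASR confidence with the probability of a word being edited.
    """
    sm = difflib.SequenceMatcher(a=asr_words, b=edited_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        kept = (tag == "equal")  # words inside "equal" blocks survived editing
        for i in range(i1, i2):
            result.append((asr_words[i], confidences[i], kept))
    return result


# Hypothetical example: the low-confidence word "sta" is the one the editor fixed.
asr = ["the", "cat", "sta", "down"]
conf = [0.99, 0.95, 0.40, 0.90]
edited = ["the", "cat", "sat", "down"]
print(align_edits(asr, conf, edited))
```

If the expected negative correlation holds (low-confidence words get edited more often), ASR confidence becomes a useful feature for quality estimation without waiting for the QC team.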