Last Layer Quality, Verbit

February 28, 2021, 7:41 pm, Fellows 2020

Andrey Bakhmat

Data Science Fellows June 2020 Cohort

Abstract

The goal of the project was to create a mechanism for estimating the quality of Verbit transcriptions. The mechanism should not rely on Verbit’s QC team, because that approach does not scale. Instead, it can use information from every stage of transcription preparation, starting with ASR (Automatic Speech Recognition). After ASR, the transcription job is gradually improved by humans: editors and reviewers. The structure of a transcription job is complex and includes layers and splits.

Challenges

1. The way transcription job information is stored in the Verbit database is complex; it took me about a week to build a proper and efficient interface to the relevant information.

2. Unlike most training examples in machine learning courses, this time I was responsible for both the dataset and the model. There were (too) many variables that I could either take into account or ignore.

3. Even before assembling the dataset, preliminary research was required on how the metrics (Perplexity and Word Error Rate) behave as transcription jobs are edited.
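For readers unfamiliar with the two metrics, here is a minimal sketch of their textbook definitions. It is not the project's code: the function names, whitespace tokenization, and the use of natural-log token probabilities are assumptions made for illustration. WER is computed as the word-level edit distance between a reference and a hypothesis, divided by the reference length; perplexity is the exponential of the average negative log-probability that a language model assigns to each token.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

In a setting like the one described, `word_error_rate` could be applied between successive revisions of a job (for example, the ASR output against the final reviewed text), while `perplexity` would take per-token log-probabilities from whichever language model is chosen.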

Achievements

A preliminary analysis of WER and Perplexity on a small dataset made it clear in which direction the work could be developed further.

Some elements of the code (especially the object-oriented approach describing the Job and Revision classes) can be used for further research.

Future project development

Use a more accurate Perplexity: try another standard language model, or a language model that takes the topic of the transcription job into account.
WER: assemble a new dataset with the WER values from all revisions. Use the new metric “decomposed WER”.
Use the correlation between ASR confidence values and user edits. This requires implementing a text alignment algorithm.
Consider the approaches of other teams working on similar problems, e.g. TranscRater, a tool developed at the University of Trento, Italy.
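The alignment step mentioned above could be prototyped along the following lines. This is only a sketch under stated assumptions: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in alignment algorithm, and the function names and the per-word confidence list are hypothetical, not Verbit's data format. Each ASR word is flagged as edited if it does not survive unchanged into the final text, and the mean ASR confidence is then compared between edited and kept words.

```python
from difflib import SequenceMatcher


def edited_flags(asr_words: list[str], final_words: list[str]) -> list[bool]:
    """One flag per ASR word: True if the word was replaced or deleted."""
    flags = [True] * len(asr_words)
    matcher = SequenceMatcher(a=asr_words, b=final_words, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            # Words in an "equal" block were left untouched by the editors.
            for i in range(i1, i2):
                flags[i] = False
    return flags


def mean_confidence_by_edit(confidences: list[float], flags: list[bool]):
    """Return (mean confidence of edited words, mean confidence of kept words)."""
    edited = [c for c, f in zip(confidences, flags) if f]
    kept = [c for c, f in zip(confidences, flags) if not f]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(edited), mean(kept)
```

Words that editors insert have no ASR counterpart, so this sketch does not attribute them to any confidence value; a fuller treatment would need to decide how to handle them.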
