A short description of the solution process:
1. First, I fine-tuned a wav2vec2-XLSR model on a random train/validation split. This gave a WER of 0.07 on the validation set and 0.15 on the (public) test set. Validation tracked train very closely, which led me to realise there were only about 700 unique train transcriptions.
2. I fine-tuned wav2vec2-XLSR on a train/validation split with no overlapping transcriptions. This gave a WER of 0.24 on validation. The best model was actually "jonatasgrosman/wav2vec2-large-xlsr-53-french" from the Huggingface hub - XLSR fine-tuned on French Common Voice - which outperformed both French-only and multilingual models. This model scored 0.14 on the test set. For postprocessing, I applied a French spellchecker, which reduced the WER to 0.12.
3. Since the test score was lower than the validation score, I suspected the test set also (partly) consisted of the same 700 train labels. I matched test predictions to the closest preprocessed train transcripts using Levenshtein distance, and about 85% were within 2 edits of the closest preprocessed train transcript. Submitting the closest preprocessed train transcript for all observations within 7 edits scored 0.04 on the test set. Submitting the original train transcripts (i.e. including punctuation and noise) scored 0.021.
4. I retrained the model after adding the well-matched test samples and the validation set to train, repeated step (3), and decoded the test examples that didn't have a close match using a language model trained on the train text. This model scored 0.0202 on the test set and was my final submission.
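The no-overlap split in step 2 can be sketched as follows. This is a minimal illustration (not the author's code; the function name and data layout are my own): assign each unique transcription to exactly one side of the split, so the model never sees a validation sentence during training.

```python
# Sketch: train/validation split with no overlapping transcriptions.
# `samples` is assumed to be a list of (audio_path, transcription) pairs.
import random

def split_by_transcription(samples, val_frac=0.2, seed=0):
    """Split so that each unique transcription lands entirely in
    either train or validation, never both."""
    texts = sorted({text for _, text in samples})
    random.Random(seed).shuffle(texts)
    n_val = int(len(texts) * val_frac)
    val_texts = set(texts[:n_val])
    train = [s for s in samples if s[1] not in val_texts]
    val = [s for s in samples if s[1] in val_texts]
    return train, val
```

With a random per-sample split, the ~700 repeated transcriptions leak across the boundary, which explains why validation tracked train so closely in step 1.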
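The matching trick in steps 3-4 can be sketched like this (my own illustration, not the author's implementation; the 7-edit cutoff is the one mentioned above): compute the Levenshtein distance from each test prediction to every train transcript, and substitute the closest transcript whenever it is within the cutoff.

```python
# Sketch: snap test predictions to the nearest train transcript
# by Levenshtein (edit) distance, as described in step 3.

def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def snap_to_train(prediction: str, train_transcripts, max_edits: int = 7):
    """Return the closest train transcript if it is within max_edits,
    otherwise keep the model's prediction unchanged."""
    best = min(train_transcripts, key=lambda t: levenshtein(prediction, t))
    return best if levenshtein(prediction, best) <= max_edits else prediction
```

Predictions without a close match fall through unchanged; in the final submission those were the ones re-decoded with the language model.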
Your solution made me cry. I think you did so much more than me. If I were an organizer I would have given you the MVP award.
I just feel the same.
Congrats.
Please, can (I/we) have the code for this solution?
Well done Sir
Well done! Thank you for sharing.
Now with code: https://github.com/adilism/zindi-ai4d-wolof