AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF

Helping Senegal
$2 000 USD
Challenge completed over 4 years ago
Classification
Automatic Speech Recognition
Natural Language Processing
365 joined
47 active
Start: Feb 12, 21
Close: May 23, 21
Reveal: May 23, 21
4th place solution
Help · 24 May 2021, 11:54 · 6

A short description of the solution process:

1. First, I fine-tuned a wav2vec2-xlsr model on a random train/validation split. This gave a WER of 0.07 on the validation set and 0.15 on the (public) test set. Validation tracked train very closely, which led me to realise there are only about 700 unique train transcriptions.

2. I fine-tuned wav2vec2-xlsr on a train/validation split without overlapping transcriptions, which gave a WER of 0.24 on validation. The best model was actually "jonatasgrosman/wav2vec2-large-xlsr-53-french" from the Hugging Face Hub (XLSR fine-tuned on French Common Voice), which outperformed both French-only and multilingual models. This model scored 0.14 on the test set. For postprocessing, I applied a French spellchecker, which reduced the WER to 0.12.

3. Since the test WER was lower (better) than the validation WER, I suspected the test set also (partly) consisted of the roughly 700 unique train transcriptions. I matched each test prediction to its closest preprocessed train transcript by Levenshtein distance; about 85% were within 2 edits of that transcript. Submitting the closest preprocessed train transcript for all observations below 7 edits scored 0.04 on the test set. Submitting the original train transcripts instead (i.e. including punctuation and noise) scored 0.021.

4. I retrained the model after adding the well-matched test samples and the validation set to train, repeated step 3, and decoded the test examples that didn't have a close match using a language model trained on the train text. This model scored 0.0202 on the test set and was my final submission.
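
The splitting and matching ideas in steps 2 and 3 can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the actual solution: the function names (`normalize`, `split_by_transcript`, `levenshtein`, `wer`, `snap_to_train`) are hypothetical, the preprocessing is assumed to be lowercasing plus punctuation removal, and the edit distance is assumed to be counted over characters. Only the "below 7 edits" threshold comes from the description above.

```python
import random
from collections import defaultdict


def normalize(text):
    """Lowercase and drop punctuation (assumed stand-in for the post's preprocessing)."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def split_by_transcript(samples, valid_fraction=0.2, seed=0):
    """Split (audio_id, transcript) pairs so no transcript lands on both sides."""
    groups = defaultdict(list)
    for audio_id, transcript in samples:
        groups[normalize(transcript)].append((audio_id, transcript))
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_valid = max(1, int(len(keys) * valid_fraction))
    valid = [s for k in keys[:n_valid] for s in groups[k]]
    train = [s for k in keys[n_valid:] for s in groups[k]]
    return train, valid


def levenshtein(a, b):
    """Edit distance between two sequences (strings or word lists), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                        # deletion
                            curr[j - 1] + 1,                    # insertion
                            prev[j - 1] + (item_a != item_b)))  # substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one utterance: word-level edits / reference word count."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


def snap_to_train(prediction, train_transcripts, max_edits=7):
    """Return the closest original train transcript if it is below max_edits
    character edits from the prediction; otherwise keep the prediction."""
    pred = normalize(prediction)
    best = min(train_transcripts, key=lambda t: levenshtein(pred, normalize(t)))
    if levenshtein(pred, normalize(best)) < max_edits:
        return best
    return prediction
```

Grouping by normalized transcript before splitting is what keeps the validation WER honest when many recordings share the same text. Returning the original (unnormalized) train transcript in `snap_to_train` mirrors the observation that submitting transcripts with punctuation and noise scored better than the preprocessed ones.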

Discussion 6 answers

Your solution made me cry. I think you did so much more than me. If I were an organizer, I would have given you the MVP award.

24 May 2021, 12:18
Muhamed_Tuo
Inveniam

I just feel the same.

Congrats.

msamwelmollel
University of Glasgow

Please, can (I/we) have the code for this solution!

24 May 2021, 12:26
Lone_Wolf
University of Ghana

Well done Sir

24 May 2021, 12:32
SaltigAI

Well done! Thank you for sharing.

24 May 2021, 16:50

Now with code: https://github.com/adilism/zindi-ai4d-wolof