
AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF

Helping Senegal
$2 000 USD
Challenge completed over 4 years ago
Classification
Automatic Speech Recognition
Natural Language Processing
365 joined
47 active
Start: Feb 12, 2021
Close: May 23, 2021
Reveal: May 23, 2021
Learn from the solutions of others
Notebooks · 24 May 2021, 06:12 · 6

Now that the competition is over, it would be very interesting to learn from the solutions of others.

On our side (NLP Zurich, rank 6) we followed this procedure:

- pretrained XLSR model, fine-tuned on the competition dataset

- ensemble of 3 models

- beam search with 3-gram word language model (with 20 beams for each of the 3 models)

- nearest neighbour search of the prediction in the vocabulary extracted from the training set
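The last step above, snapping each predicted word to its nearest neighbour in a vocabulary built from the training transcripts, can be sketched roughly as follows. This is my own minimal illustration, not the NLP Zurich code: it uses Python's stdlib `difflib` similarity ratio as the distance, and the function name, cutoff value, and sample words are all made up for the example.

```python
import difflib

def snap_to_vocab(words, vocab, cutoff=0.8):
    """Map each predicted word to its closest match in a reference
    vocabulary (e.g. words extracted from the training transcripts).
    Words with no sufficiently close neighbour are kept unchanged."""
    corrected = []
    for w in words:
        matches = difflib.get_close_matches(w, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else w)
    return corrected

# Hypothetical vocabulary and a noisy ASR prediction:
vocab = ["jamm", "rekk", "nanga", "def"]
print(snap_to_vocab(["jam", "rek", "nanga", "dev"], vocab))
```

With the cutoff at 0.8, small spelling slips like "jam" are pulled back to in-vocabulary forms, while words below the similarity cutoff pass through unchanged; in practice you would tune the cutoff (or swap in an edit-distance search) against local CV.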

Discussion 6 answers

Thanks a lot. Can you share the code once the review is over? On my side, I fine-tuned an XLSR model and got an LB score of 1.60.

24 May 2021, 06:22
Upvotes 0

Single model: a pretrained XLSR model, but with preprocessed data. I converted the audio from mp3 to wav, then removed the noise and trimmed the silence. I trained for 40 epochs and divided the dataset into two parts to process it faster. I used almost all of the data for training. I finished 10th, but my true rank is 8th, as my other selected submission scored 0.078, but somehow the algorithm didn't count it. My future strategy would be to enhance the speech using a pretrained deep-learning model.
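The silence trimming mentioned above can be done in many ways; the poster doesn't say which tooling they used. A minimal sketch, assuming the audio is already loaded as a float array and using a simple frame-level RMS energy threshold (the function name, frame length, and threshold are illustrative):

```python
import numpy as np

def trim_silence(audio, frame_len=1024, threshold=0.01):
    """Drop leading and trailing frames whose RMS energy falls below
    `threshold`. `audio` is a 1-D float array scaled to [-1, 1]."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    loud = np.flatnonzero(rms >= threshold)
    if loud.size == 0:
        return audio[:0]                     # clip is all silence
    start = loud[0] * frame_len
    end = (loud[-1] + 1) * frame_len
    return audio[start:end]

# 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence (16 kHz):
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
clip = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])
trimmed = trim_silence(clip)
print(len(clip), len(trimmed))   # trimmed is roughly the tone's length
```

A real pipeline would more likely use a library trimmer (e.g. a dB-relative threshold) and handle internal pauses separately, but the idea is the same.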

24 May 2021, 07:15
Upvotes 0
msamwelmollel
University of Glasgow

Can you share the code, please!

24 May 2021, 08:06
Upvotes 0

Sure. It will take some time to clean up the GitHub repo, but then I'm happy to share it.

Lone_Wolf
University of Ghana

Same here...

Lone_Wolf
University of Ghana

I guess, moving on to future competitions, we're all going to have to trust local CV more than the public LB, given the massive shakeup that just happened.

My solution was based on an NVIDIA pretrained model. The dataset was processed into 4 variants:

synthetic data with noise and synthetic data without noise,

non-synthetic data with noise and non-synthetic data without noise.

The best model was achieved with synthetic data with noise.

The preprocessor for the encoder was a mel spectrogram, with a greedy decoder to decode the output.
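For anyone unfamiliar with what "greedy decoder" means for a CTC-trained acoustic model: take the argmax label at each frame, collapse consecutive repeats, then drop blanks. A minimal sketch (my own illustration, operating on per-frame label indices rather than raw logits):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Greedy CTC decoding: `frame_ids` holds the argmax label index
    for each output frame. Collapse consecutive repeats, then remove
    blank tokens to obtain the final label sequence."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# Labels: 0 = blank, 1 = 'a', 2 = 'b'
# Frames a a <blank> a b b  ->  a a b (the blank keeps the two a's distinct)
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2]))  # [1, 1, 2]
```

A beam-search decoder with a language model (as in the NLP Zurich solution above) typically beats this, but greedy decoding is fast and has no extra dependencies.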

Cutout was used for augmentation, as SpecAugment didn't improve local CV.
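The difference between the two augmentations: SpecAugment masks whole frequency bands or time spans, while cutout zeroes a localised rectangle in both axes at once. A rough sketch of cutout on a (freq, time) spectrogram; the mask sizes are illustrative, not the poster's settings:

```python
import numpy as np

def cutout(spec, max_h=8, max_w=16, rng=None):
    """Zero out one random rectangle of a (freq, time) spectrogram.
    `max_h` / `max_w` bound the patch size in frequency bins and
    time steps; the input array is left untouched."""
    rng = rng or np.random.default_rng()
    f_bins, t_steps = spec.shape
    h = rng.integers(1, max_h + 1)
    w = rng.integers(1, max_w + 1)
    f0 = rng.integers(0, f_bins - h + 1)
    t0 = rng.integers(0, t_steps - w + 1)
    out = spec.copy()
    out[f0:f0 + h, t0:t0 + w] = 0.0
    return out

spec = np.ones((80, 200))                     # dummy log-mel spectrogram
aug = cutout(spec, rng=np.random.default_rng(0))
print(int((aug == 0).sum()))                  # area of the masked patch
```

Which of the two helps is data-dependent, so checking both against local CV, as done here, is the right call.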

Training was 24 epochs across 4 notebooks, as I was constrained by Kaggle's GPU usage limit of under 9 hours.

Final position: 11th with a loss of 0.09723334; my best submission scored 0.093545188003.

Congrats to the winners and everyone who participated. I hope to see you all in another challenge soon.