Now that the competition is over, it would be very interesting to learn from the solutions of others.
On our side (NLP Zurich, rank 6) we followed this procedure:
- fine-tuned a pretrained XLSR model on the competition dataset
- ensemble of 3 models
- beam search with a 3-gram word language model (20 beams for each of the 3 models)
- nearest-neighbour search of the prediction in the vocabulary extracted from the training set (a rough sketch of the decoding step follows below)
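Roughly, the decoding step could look like the sketch below (not our exact code): it assumes pyctcdecode with a KenLM 3-gram word model for the beam search, averages the ensemble logits before decoding as a simplification, and uses difflib as a stand-in for the nearest-neighbour vocabulary search. Paths and labels are illustrative.

```python
# Rough sketch of the decoding step, not the exact competition code.
# Assumptions: pyctcdecode + a KenLM 3-gram word LM, ensemble logits are
# simply averaged before decoding, and difflib stands in for the
# nearest-neighbour vocabulary search. Paths and labels are illustrative.
import difflib
import numpy as np
from pyctcdecode import build_ctcdecoder

labels = list(" abcdefghijklmnopqrstuvwxyz'")  # character set of the XLSR head

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="lm_3gram.arpa",  # hypothetical path to the word LM
)

def decode(logits_per_model, train_vocab, beam_width=20):
    """Beam-search decode averaged CTC logits with the LM, then snap each
    predicted word to its nearest neighbour in the training vocabulary."""
    # logits_per_model: list of arrays of shape (time, len(labels)), one per model
    mean_logits = np.mean(np.stack(logits_per_model), axis=0)
    text = decoder.decode(mean_logits, beam_width=beam_width)

    corrected = []
    for word in text.split():
        # get_close_matches is a simple stand-in for an edit-distance index
        match = difflib.get_close_matches(word, train_vocab, n=1, cutoff=0.0)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)
```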
Thanks a lot. Can you share the code once the review is over? On my side, I fine-tuned an XLSR model and got an LB score of 1.60.
Single pretrained XLSR model, but with preprocessed data: conversion from mp3 to wav, then noise removal and silence trimming. I trained for 40 epochs and divided the data into two separate datasets to process it faster, using almost all of the data for training. I finished 10th, but my original rank would have been 8th, since my other selection scored 0.078 but somehow wasn't counted. My future strategy would be to enhance the speech with a pretrained deep-learning model.
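The preprocessing described above could look roughly like this; librosa, noisereduce, and soundfile are illustrative library choices, and the parameters are guesses, not necessarily what was actually used.

```python
# Minimal sketch of the preprocessing described above (mp3 -> wav,
# noise removal, silence trimming). Library choices and parameters are
# illustrative assumptions, not the poster's actual code.
import librosa
import noisereduce as nr
import soundfile as sf

def preprocess(mp3_path, wav_path, target_sr=16000):
    # Load the mp3 and resample to 16 kHz, the rate XLSR expects.
    audio, sr = librosa.load(mp3_path, sr=target_sr)
    # Spectral-gating noise reduction.
    audio = nr.reduce_noise(y=audio, sr=sr)
    # Trim leading/trailing silence more than 30 dB below the peak.
    audio, _ = librosa.effects.trim(audio, top_db=30)
    # Save the cleaned clip as wav for fine-tuning.
    sf.write(wav_path, audio, sr)

# preprocess("clip_0001.mp3", "clip_0001.wav")  # hypothetical file names
```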
Can you share the code, please!
Sure. It will take some time to clean up the GitHub repo, but then I'm happy to share it.
Same here...
I guess moving on to future competitions we're all going to have to trust local CV more than the public LB, given the massive shake-up that just happened.
My solution was based on an NVIDIA pretrained model. The dataset was processed into 4 variants:
synthetic data with noise and synthetic data without noise,
non-synthetic data with noise and non-synthetic data without noise;
the best model was achieved with synthetic data with noise.
The preprocessor for the encoder was a mel spectrogram, and a greedy decoder was used to decode the output (a rough sketch of this is below).
Cutout was used for augmentation, as SpecAugment didn't improve local CV.
Training ran for 24 epochs across 4 notebooks, since I was constrained by Kaggle's 9-hour GPU usage limit.
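For reference, the mel-spectrogram front end plus greedy CTC decoding could be sketched like this; torchaudio and the specific parameter values are my assumptions, not the actual notebook code.

```python
# Sketch of a mel-spectrogram front end and a greedy CTC decoder, as
# described above. torchaudio and the parameter values are assumptions.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80
)

def greedy_ctc_decode(log_probs, labels, blank_id=0):
    """Collapse repeated frames and drop blanks from the per-frame argmax."""
    # log_probs: (time, num_classes) output of the acoustic model
    ids = log_probs.argmax(dim=-1).tolist()
    out, prev = [], blank_id
    for i in ids:
        if i != prev and i != blank_id:
            out.append(labels[i])
        prev = i
    return "".join(out)

# features = mel(waveform)                        # waveform: (1, num_samples) tensor
# text = greedy_ctc_decode(model_output, labels)  # hypothetical model output
```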
Final position was 11th with a loss of 0.09723334; my best submission scored 0.093545188003.
Congrats to the winners and everyone who participated. I hope to see you all in another challenge soon.