
AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF

Helping Senegal
$2 000 USD
Challenge completed over 4 years ago
Classification
Automatic Speech Recognition
Natural Language Processing
365 joined
47 active
Start: Feb 12, 21
Close: May 23, 21
Reveal: May 23, 21
Lone_Wolf
University of Ghana
Transcriptions
Help · 11 May 2021, 23:22 · edited ~9 hours later · 2

Hi everyone, I'm still unclear about the language we're transcribing. The description states that the model will be used to help illiterate people (who can't speak or write French) get access to transport services, and from researching online I understand that the official orthography for Wolof is Latin-based, like that of French. My main question is: does the transcription column contain every character that can or could appear in the decoded text? If not, are we allowed to use external data for training from scratch?

PS: I am new to ASR, so my thinking may be flawed. Also, at the time of this post my train.csv didn't include some rows.

Thank you in advance.

Discussion 2 answers
African Institute for Mathematical Sciences

The task is speech-to-text. Check out this tutorial: https://www.kaggle.com/kingabzpro/wave2vec-asr-wolof

12 May 2021, 17:34
Upvotes 0

Yes, as already noted, the challenge is about speech-to-text. Until recently, people would most likely have used a seq2seq model (for example, a bidirectional LSTM). However, around 2 years ago Facebook did quite a lot of work in this domain, resulting in their wav2vec 2.0 model, which is state of the art today.

As for your question of whether all characters are represented in the training set: not necessarily. Usually you assume that the training and test sets follow the same distribution, but this does not strictly imply that you would observe the whole vocabulary in the training set.
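To make this concrete, here is a minimal sketch of how one might collect the character vocabulary from the transcription column and check a new string against it. It assumes the column is named `transcription`; the `ID` column, the helper names `build_char_vocab` and `unseen_chars`, and the sample Wolof strings are made up for illustration and are not taken from the competition data.

```python
import csv
import io


def build_char_vocab(rows, column="transcription"):
    """Collect every distinct (lowercased) character seen in one column."""
    vocab = set()
    for row in rows:
        vocab.update(row[column].lower())
    return vocab


def unseen_chars(text, vocab):
    """Return the characters of `text` that never occurred in the vocabulary."""
    return sorted(set(text.lower()) - vocab)


# Illustrative stand-in for train.csv (assumed layout, not the real file).
sample_csv = "ID,transcription\na1,nanga def\na2,jërëjëf\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

vocab = build_char_vocab(rows)
print(sorted(vocab))

# A hypothetical test-time transcription containing characters not seen above.
print(unseen_chars("ñaata la", vocab))
```

With the real file you would replace the in-memory sample with `open("train.csv", encoding="utf-8")`. Any character reported by `unseen_chars` is one a model whose output alphabet was built only from the training transcriptions could not emit.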

As for using external data: you would need to check the competition rules. However, the models I mentioned above are usually not trained from scratch, as the resource requirements are too high.

12 May 2021, 20:12
Upvotes 0