Google NLP Hack Series: West Africa ASR Challenge
Calling on the Zindi community to train ASR models
$1 000 USD
Ended 10 months ago
11 active · 36 enrolled
West Africa
Automatic Speech Recognition

For this model training challenge, the ASR dataset will be available as soon as the hackathon starts.

For this model training challenge, you will use the Hausa dataset provided by Mozilla Common Voice. To access the data, visit Mozilla Common Voice Datasets and select “Common Voice Corpus 7.0” and “Language: Hausa”. Important note: the data is pre-split into train and test sets (as seen in the downloads). Please train your model using only the “train.tsv” data; do not use the “test.tsv” data for training.

Please use the Test.csv and SampleSubmission.csv files on Zindi to test your model. Test.csv is the same test set as on Mozilla Common Voice, but the IDs have been edited to work with the Zindi system. You will need to download train.tsv from Mozilla Common Voice.

About Mozilla Common Voice:

“Voice recognition technology is revolutionizing the way we interact with machines, but the currently available systems are expensive and proprietary. Mozilla Common Voice is an initiative to make voice recognition technologies better and more accessible for everyone. Common Voice is a massive global database of donated voices that lets anyone quickly and easily train voice-enabled apps in potentially every language. We're not only collecting voice samples in widely spoken languages but also in those with a smaller population of speakers. Publishing a diverse dataset of voices will empower developers, entrepreneurs, and communities to address this gap themselves.”

Files in Mozilla’s Common Voice Hausa you can download:

  • test.tsv: the portion of the data used for testing. No materials listed here can be used for training!
  • train.tsv: the training portion of the data. Any recordings listed in this file can be used for training ASR systems.
  • validated.tsv: all recordings in this file are approved by the speakers. OK to use for training UNLESS the same utterance is listed in test.tsv.
  • invalidated.tsv: the recordings in this file received some downvotes and some upvotes by the speakers. Beware that the quality of these recordings is likely to be lower than in validated.tsv. OK to use for training UNLESS the same utterance is listed in test.tsv.
  • other.tsv: the recordings in this file were discarded by the speakers. While this data can also be used for training (UNLESS the same utterance is listed in test.tsv), use it carefully since it might decrease the overall quality of the resulting system.
  • clips: a folder with all recordings in MP3 format.

Every .tsv file contains the transcripts, the corresponding audio file names, and (if available), the metadata about the speakers.
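
The "UNLESS the same utterance is listed in test.tsv" rule can be checked in code. Below is a minimal sketch, not an official script, that reads the .tsv files and drops any row whose clip or transcript also appears in test.tsv. It assumes the Common Voice column names `path` (audio file name) and `sentence` (transcript); the function names and the `extra_files` parameter are made up for illustration. By default only train.tsv is loaded, in line with the important note above; check the challenge rules before passing any extra files.

```python
import csv
from pathlib import Path


def read_tsv(tsv_path):
    """Return the rows of a Common Voice .tsv file as a list of dicts."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: transcripts may contain quote characters that are plain text.
        return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def normalize(sentence):
    """Lowercase and collapse whitespace so near-identical transcripts match."""
    return " ".join(sentence.lower().split())


def drop_test_overlap(candidate_rows, test_rows):
    """Keep candidate rows whose clip and transcript are both absent from the test rows."""
    test_sentences = {normalize(row["sentence"]) for row in test_rows}
    test_clips = {row["path"] for row in test_rows}
    return [
        row for row in candidate_rows
        if row["path"] not in test_clips
        and normalize(row["sentence"]) not in test_sentences
    ]


def load_training_rows(corpus_dir, extra_files=()):
    """Load train.tsv plus optional extra files, filtered against test.tsv."""
    corpus_dir = Path(corpus_dir)
    test_rows = read_tsv(corpus_dir / "test.tsv")
    rows = read_tsv(corpus_dir / "train.tsv")
    seen = {row["path"] for row in rows}
    for name in extra_files:
        for row in drop_test_overlap(read_tsv(corpus_dir / name), test_rows):
            # Skip clips already loaded from train.tsv or an earlier extra file.
            if row["path"] not in seen:
                seen.add(row["path"])
                rows.append(row)
    return rows
```

For example, `load_training_rows("cv-corpus/ha")` returns only the train.tsv rows, where the directory name is a placeholder for wherever you unpacked the download. Matching on normalized text is a deliberately strict choice: it may discard a few usable clips, but it avoids leaking test sentences into training.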

Links referenced in workshop presentation:

Low Resource ASR

Mozilla Common Voice


Elpis quick guide

Preparing the corpus

  • Python script for converting txt transcription files to Elan format (.eaf)
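
As a rough illustration of what such a conversion involves (this is a sketch under stated assumptions, not the script linked above), the code below wraps a plain-text transcript in a minimal ELAN document with one tier and one annotation spanning the whole recording. The function names, the `default` tier name, and the `duration_ms` and `media_file` parameters are hypothetical. ELAN itself may expect additional header attributes that are left out here, and the MIME type assumes the clip has already been converted to WAV.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def transcript_to_eaf(text, duration_ms, media_file, tier_id="default"):
    """Build a one-annotation ELAN document covering the whole recording."""
    root = ET.Element("ANNOTATION_DOCUMENT", {
        "AUTHOR": "",
        "DATE": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "FORMAT": "3.0",
        "VERSION": "3.0",
    })
    header = ET.SubElement(root, "HEADER", {"MEDIA_FILE": "", "TIME_UNITS": "milliseconds"})
    ET.SubElement(header, "MEDIA_DESCRIPTOR", {"MEDIA_URL": media_file, "MIME_TYPE": "audio/x-wav"})

    # Two time slots: the start and the end of the recording, in milliseconds.
    time_order = ET.SubElement(root, "TIME_ORDER")
    ET.SubElement(time_order, "TIME_SLOT", {"TIME_SLOT_ID": "ts1", "TIME_VALUE": "0"})
    ET.SubElement(time_order, "TIME_SLOT", {"TIME_SLOT_ID": "ts2", "TIME_VALUE": str(int(duration_ms))})

    # One tier holding a single annotation aligned to those two slots.
    tier = ET.SubElement(root, "TIER", {"LINGUISTIC_TYPE_REF": "default-lt", "TIER_ID": tier_id})
    annotation = ET.SubElement(tier, "ANNOTATION")
    aligned = ET.SubElement(annotation, "ALIGNABLE_ANNOTATION", {
        "ANNOTATION_ID": "a1",
        "TIME_SLOT_REF1": "ts1",
        "TIME_SLOT_REF2": "ts2",
    })
    ET.SubElement(aligned, "ANNOTATION_VALUE").text = text.strip()

    ET.SubElement(root, "LINGUISTIC_TYPE", {
        "GRAPHIC_REFERENCES": "false",
        "LINGUISTIC_TYPE_ID": "default-lt",
        "TIME_ALIGNABLE": "true",
    })
    return ET.ElementTree(root)


def convert_txt_file(txt_path, eaf_path, duration_ms, media_file):
    """Read a .txt transcript and write the matching .eaf file."""
    with open(txt_path, encoding="utf-8") as handle:
        text = handle.read()
    tree = transcript_to_eaf(text, duration_ms, media_file)
    tree.write(eaf_path, encoding="utf-8", xml_declaration=True)
```

The clip duration has to come from the audio file itself; it is passed in as a parameter so that the sketch stays free of audio dependencies.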

Using Elpis