
Your Voice, Your Device, Your Language Challenge

Helping Africa
1 000 CHF
Challenge completed ~1 month ago
Automatic Speech Recognition
Natural Language Processing
278 joined
73 active
Start: Jul 22, 25
Close: Sep 22, 25
Reveal: Sep 22, 25
About

A curated set of 7 hours of audio files has been collected for the test set. You will train your model on open-source data and apply it to the test set.

Find the test set here.

To help you get started, we’ve pulled together a rich set of open-source tools and datasets that you’re free to use in this challenge.

The most important resource for your model is the Mozilla Common Voice Swahili dataset. It features over 100 hours of labelled speech recordings contributed by native speakers. This will be your primary dataset for training and evaluation.
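
For local evaluation on a held-out split, a word-level error measure is a common choice for speech recognition. This page does not state the official scoring metric, so the sketch below assumes word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Text normalization (casing, punctuation) changes this score noticeably, so apply the same normalization to references and hypotheses before comparing.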

To strengthen your language understanding or enrich your pipeline, consider using pawa-min-alpha, a 2-billion-parameter Swahili language model. It’s ready to plug in for language modeling, rescoring, or as a downstream component.
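
One way to use a language model for rescoring is to re-rank the recognizer's n-best hypotheses by a weighted sum of the recognizer score and a language-model score. The sketch below is a minimal illustration of that idea. `lm_score` is a hypothetical stand-in for whatever log-probability you obtain from a language model; it is not the pawa-min-alpha API, and the toy scorer and weight are made up for the example.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    candidates: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> str:
    """Pick the text maximizing asr_log_prob + lm_weight * lm_score(text).

    candidates: (text, asr_log_prob) pairs from the recognizer's n-best list.
    lm_score:   hypothetical callable returning a log-probability-like score.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    best_text, _ = max(
        candidates, key=lambda c: c[1] + lm_weight * lm_score(c[0])
    )
    return best_text


def toy_lm(text: str) -> float:
    # Toy scorer for illustration only: favors hypotheses containing "za".
    return 0.0 if "za" in text.split() else -2.0


nbest = [("habari ya asubuhi", -4.0), ("habari za asubuhi", -4.2)]
chosen = rescore_nbest(nbest, toy_lm, lm_weight=0.5)
```

In practice `lm_weight` would be tuned on a validation split rather than fixed by hand.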

When it comes to tools and frameworks, you have a lot of flexibility. You can start with Whisper, a powerful open-source STT model by OpenAI. Another solid option is Vosk, which is lightweight and great for low-resource devices.
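
As a starting point, a Whisper baseline can be sketched as follows, assuming the `openai-whisper` package is installed. The model size `"small"`, the language code `"sw"` for Swahili, and the use of the file stem as the sample ID are assumptions for illustration, not requirements stated by the challenge.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def make_whisper_transcriber(
    model_name: str = "small", language: str = "sw"
) -> Callable[[str], str]:
    """Load a Whisper model once and return a path -> text function."""
    import whisper  # imported lazily so the helper below works without it

    model = whisper.load_model(model_name)

    def transcribe(path: str) -> str:
        result = model.transcribe(path, language=language)
        return result["text"].strip()

    return transcribe


def transcribe_all(
    paths: Iterable[str], transcribe: Callable[[str], str]
) -> Dict[str, str]:
    """Map each audio file's stem (assumed here to be its ID) to its transcript."""
    return {Path(p).stem: transcribe(str(p)) for p in paths}
```

Passing the transcriber in as a function keeps the loop independent of the model, so the same code could drive Vosk or a fine-tuned checkpoint by swapping the callable.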

Looking to improve your model’s efficiency so it can run on edge devices? Take a look at this practical model pruning guide. It’s especially useful if you're aiming for fast inference on limited hardware like the NVIDIA T4 GPU.
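
To make the idea of pruning concrete, here is a minimal sketch of unstructured magnitude pruning on a flat list of weights: the fraction `sparsity` of weights with the smallest absolute value is set to zero. This is one common technique and is an assumption here; the linked guide may use a different method, and real models would be pruned with a deep learning framework rather than plain lists.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


sparse = magnitude_prune([0.5, -0.1, 0.03, -0.9], sparsity=0.5)
```

Zeroed weights only translate into faster inference when the runtime or hardware can exploit the sparsity, so measure latency on the target device after pruning.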

All data and tools used must be open-source and publicly licensed (CC-BY, MIT, Apache 2.0 or less restrictive). Proprietary or closed data is not allowed.

Files
Description
Files
This file is an example of what your submission file should look like. The order of the rows does not matter, but the values in the "ID" column must be correct.
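
A submission file of this kind can be written with Python's standard `csv` module, as sketched below. Only the "ID" column is named on this page; the CSV format and the second column name `transcription` are assumptions, so check them against the sample file before submitting.

```python
import csv
import io
from typing import Dict, TextIO


def write_submission(
    predictions: Dict[str, str],
    out: TextIO,
    text_column: str = "transcription",  # assumed name; verify in the sample file
) -> None:
    """Write one row per sample ID; row order does not matter."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", text_column])
    for sample_id, text in predictions.items():
        writer.writerow([sample_id, text])


# Example with an in-memory buffer; use open("submission.csv", "w", newline="",
# encoding="utf-8") to write a real file.
buffer = io.StringIO()
write_submission({"clip1": "habari za asubuhi"}, buffer)
```

The `csv` writer quotes fields that contain commas or quotes, which keeps transcripts with punctuation from breaking the row layout.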