The test set is a curated collection of seven hours of audio. You will train your model on open-source data and apply it to this test set.
Find the test set here.
To help you get started, we’ve pulled together a rich set of open-source tools and datasets that you’re free to use in this challenge.
The most important resource for your model is the Mozilla Common Voice Swahili dataset. It features over 100 hours of labelled speech recordings contributed by native speakers. This will be your primary dataset for training and evaluation.
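Since Common Voice provides sentence-level transcripts, the standard way to evaluate a model trained on it is word error rate (WER): the edit distance between the reference and hypothesis transcripts, divided by the reference length. A minimal, dependency-free sketch of the metric (libraries such as jiwer offer a production-ready version):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("ya" -> "za") over three reference words:
print(wer("habari ya asubuhi", "habari za asubuhi"))  # → 0.3333...
```

Normalising text (lowercasing, stripping punctuation) before scoring usually gives a fairer comparison across models.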
To strengthen your language understanding or enrich your pipeline, consider using pawa-min-alpha, a 2-billion-parameter Swahili language model. It’s ready to plug in for language modeling, rescoring, or as a downstream component.
When it comes to tools and frameworks, you have a lot of flexibility. You can start with Whisper, a powerful open-source STT model by OpenAI. Another solid option is Vosk, which is lightweight and great for low-resource devices.
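Whisper expects 16 kHz mono audio and processes it in fixed 30-second windows, so long test clips are usually split before transcription. A sketch, assuming the openai-whisper package (the `transcribe_file` helper is illustrative and requires a model download to actually run):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono input
CHUNK_SECONDS = 30     # Whisper's fixed attention window

def chunk_audio(samples: np.ndarray, sr: int = SAMPLE_RATE,
                chunk_s: int = CHUNK_SECONDS):
    """Split a long mono waveform into 30-second pieces."""
    step = sr * chunk_s
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def transcribe_file(path: str) -> str:
    """Illustrative only: transcribe with openai-whisper (needs a model download)."""
    import whisper
    model = whisper.load_model("small")  # larger checkpoints trade speed for accuracy
    return model.transcribe(path, language="sw")["text"]

# A 70-second clip yields two full chunks plus a 10-second remainder.
chunks = chunk_audio(np.zeros(SAMPLE_RATE * 70))
print(len(chunks))  # → 3
```

Vosk follows a different, streaming-oriented API, but the same 16 kHz mono preprocessing applies.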
Looking to improve your model’s efficiency to run on edge devices? Take a look at this practical model pruning guide. It’s especially useful if you're aiming for fast inference on limited hardware like the NVIDIA T4 GPU.
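The simplest pruning technique is unstructured magnitude pruning: zero out the fraction of weights with the smallest absolute values, then (optionally) fine-tune to recover accuracy. A minimal NumPy sketch of the core step (frameworks like PyTorch provide this via their own pruning utilities):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude `sparsity` fraction of a weight tensor."""
    k = int(weights.size * sparsity)   # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0])
print(np.count_nonzero(magnitude_prune(w, 0.3)))  # → 7
```

Note that pruning alone yields sparse tensors, not smaller ones; realising a speed-up on hardware like the T4 also requires a sparse-aware runtime or structured pruning.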
All data and tools used must be open-source and publicly licensed (CC-BY, MIT, Apache 2.0, or less restrictive). Proprietary or closed data is not allowed.