
Your Voice, Your Device, Your Language Challenge

Helping Africa
1 000 CHF
Challenge completed ~1 month ago
Automatic Speech Recognition
Natural Language Processing
278 joined
73 active
Start: Jul 22, 25
Close: Sep 22, 25
Reveal: Sep 22, 25
ML_Wizzard
Nasarawa State University
Starter Notebook: Achieving 0.48 Score with SeamlessM4T for Audio Transcription
4 Aug 2025, 11:23 · 6

This starter notebook walks through audio transcription with the SeamlessM4T model, tailored to the Sartify ITU Zindi Test Dataset. It shows how to load, play, and process audio files, run ASR in Swahili ('swh'), and generate a submission file that scores around 0.48 on the Zindi leaderboard.

The notebook includes:

  • Installation of required libraries (fairseq2, pydub, sentencepiece, and seamless_communication).
  • Loading and preprocessing audio data from the dataset.
  • Using the SeamlessM4T medium model with the vocoder_36langs vocoder for transcription.
  • Batch processing of audio files to optimize performance.
  • Post-processing of predictions to create a submission-ready CSV file.
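
The batching and post-processing steps listed above could be sketched roughly as follows. This is not the notebook's code: `batched`, `clean_text`, `build_submission`, the `transcribe_batch` callable, and the `filename`/`text` column names are all hypothetical names chosen for illustration, and the whitespace cleanup is only an assumed example of post-processing. The actual SeamlessM4T call is deliberately left behind the `transcribe_batch` callable, so any function that maps a list of audio paths to a list of transcripts can be plugged in.

```python
import csv
import re
from pathlib import Path
from typing import Callable, Iterator, List, Sequence, Tuple


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` entries each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim the ends of a predicted transcript."""
    return re.sub(r"\s+", " ", text).strip()


def build_submission(
    audio_paths: Sequence[str],
    transcribe_batch: Callable[[Sequence[str]], List[str]],
    out_path: str,
    batch_size: int = 8,
) -> List[Tuple[str, str]]:
    """Transcribe files batch by batch and write one CSV row per audio file.

    `transcribe_batch` is a placeholder for the real model call (assumption):
    it must return one transcript per input path, in the same order.
    """
    rows: List[Tuple[str, str]] = []
    for batch in batched(audio_paths, batch_size):
        texts = transcribe_batch(batch)
        if len(texts) != len(batch):
            raise ValueError("transcriber returned a different number of texts than inputs")
        for path, text in zip(batch, texts):
            # Use the file name without its extension as the row identifier (assumption).
            rows.append((Path(path).stem, clean_text(text)))

    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "text"])  # hypothetical header; check the sample submission
        writer.writerows(rows)
    return rows
```

To use it, pass the list of test audio paths and a function wrapping the model's inference, for example `build_submission(paths, my_transcriber, "submission.csv", batch_size=8)`, where `my_transcriber` is whatever wrapper you write around the model. The right batch size depends on available GPU memory and clip length, so it is worth tuning rather than taking 8 as given.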

https://github.com/mubrij/Your-Voice-Your-Device-Your-Language-Challenge

Discussion 6 answers
nymfree

Thanks for sharing. Amazing model.

4 Aug 2025, 11:25
ML_Wizzard
Nasarawa State University

Thank you @nymfree.

Koleshjr
Multimedia university of kenya

Thanks for sharing 🤝

4 Aug 2025, 11:57
ML_Wizzard
Nasarawa State University

Thank you @Koleshjr.

We are still waiting for your part 3 😎.

Koleshjr
Multimedia university of kenya

Coming soon!

ML_Wizzard
Nasarawa State University

All right @Koleshjr.

Can't wait to see.