
Your Voice, Your Device, Your Language Challenge

Helping Africa
1 000 CHF
Completed (5 months ago)
Automatic Speech Recognition
Natural Language Processing
331 joined
73 active
Start: Jul 22, 2025
Close: Sep 22, 2025
Reveal: Sep 22, 2025
SeamlessM4T Application for Speech-to-Text Conversion
5 Mar 2026, 07:41

This document is a short guide to getting started with speech-to-text conversion using the SeamlessM4T model. It is built around the Sartify ITU test dataset on the Zindi platform and targets a score of approximately 0.48. The guide walks step by step through preparing the environment, processing the audio data, running SeamlessM4T's Automatic Speech Recognition (ASR) task for Swahili (swh), and creating the final submission file.
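
A minimal sketch of the core ASR step follows. It assumes the `Translator` class exposed by `seamless_communication.inference` in the current facebookresearch/seamless_communication repository; older releases used a different import path and `predict` signature, so check it against your installed version. It also assumes each clip has already been converted to 16 kHz mono WAV. The helper names `is_16k_mono_wav`, `load_translator`, and `transcribe_swahili` are hypothetical and are not taken from the original guide.

```python
import wave
from pathlib import Path
from typing import Union

# Assumed input format: 16 kHz mono WAV, produced during preprocessing.
TARGET_SAMPLE_RATE = 16_000


def is_16k_mono_wav(path: Union[str, Path]) -> bool:
    """Return True if `path` is a 16 kHz, single-channel WAV file."""
    # wave.open expects a str filename (or a file object), not a Path.
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate() == TARGET_SAMPLE_RATE and wav.getnchannels() == 1


def load_translator(device_name: str = "cuda:0"):
    """Load SeamlessM4T medium with vocoder_36langs (assumed current API)."""
    # Imported lazily so the WAV check works without torch or seamless_communication.
    import torch
    from seamless_communication.inference import Translator

    device = torch.device(device_name)
    # float16 saves GPU memory; float32 is the safer choice on CPU.
    dtype = torch.float16 if device.type == "cuda" else torch.float32
    return Translator("seamlessM4T_medium", "vocoder_36langs", device, dtype=dtype)


def transcribe_swahili(translator, wav_path: Union[str, Path]) -> str:
    """Transcribe one preprocessed clip using task_str="ASR" and tgt_lang="swh"."""
    if not is_16k_mono_wav(wav_path):
        raise ValueError(f"{wav_path} is not a 16 kHz mono WAV file")
    # predict returns (text_output, speech_output); ASR only needs the text.
    text_output, _ = translator.predict(
        input=str(wav_path),
        task_str="ASR",
        tgt_lang="swh",
    )
    return str(text_output[0])
```

Typical use would be `translator = load_translator()` once, then `transcribe_swahili(translator, path)` for each clip. Loading the model once and reusing it avoids paying the model-loading cost per file.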

Readers will learn how to install the required libraries, such as fairseq2, pydub, sentencepiece, and seamless_communication. The guide also covers loading the data, preprocessing the audio files, and running the medium version of the SeamlessM4T model together with vocoder_36langs to perform the transcription. Finally, it shows how to process audio files in batches to improve processing efficiency and how to post-process the results into a complete CSV file ready for submission on Zindi.
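
The preprocessing, batching, and CSV steps can be sketched as follows. This is a hypothetical outline, not the author's code. `to_16k_mono_wav`, `batched`, `transcribe_in_batches`, and `write_submission` are illustrative names. The `id` and `transcription` headers are placeholders that should be replaced with the columns the competition's submission format requires. `transcribe_batch` stands for any callable that maps a list of clip IDs to a list of transcripts, for example a wrapper around SeamlessM4T's `Translator.predict`.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Sequence, Union

TARGET_SAMPLE_RATE = 16_000  # assumed SeamlessM4T input rate


def to_16k_mono_wav(src: Union[str, Path], dst_dir: Union[str, Path]) -> Path:
    """Resample an audio file to 16 kHz mono and save it as WAV in dst_dir."""
    from pydub import AudioSegment  # needs ffmpeg installed for non-WAV input

    dst = Path(dst_dir) / (Path(src).stem + ".wav")
    audio = AudioSegment.from_file(str(src))
    audio = audio.set_frame_rate(TARGET_SAMPLE_RATE).set_channels(1)
    audio.export(str(dst), format="wav")
    return dst


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def transcribe_in_batches(
    clip_ids: Sequence[str],
    transcribe_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 8,
) -> Dict[str, str]:
    """Map each clip ID to its transcript, calling transcribe_batch once per chunk."""
    results: Dict[str, str] = {}
    for batch in batched(clip_ids, batch_size):
        texts = transcribe_batch(batch)
        if len(texts) != len(batch):
            raise RuntimeError("transcriber returned a different number of outputs")
        results.update(zip(batch, texts))
    return results


def write_submission(
    results: Dict[str, str],
    ordered_ids: Sequence[str],
    out_path: Union[str, Path],
    id_col: str = "id",  # placeholder header: use the competition's column name
    text_col: str = "transcription",  # placeholder header as well
) -> None:
    """Write one row per ID in the given order, collapsing whitespace in transcripts."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([id_col, text_col])
        for clip_id in ordered_ids:
            # Missing IDs get an empty transcript so every test clip still has a row.
            text = " ".join(results.get(clip_id, "").split())
            writer.writerow([clip_id, text])
```

Batching here only groups the work. Whether a group runs as a single forward pass depends on the callable passed as `transcribe_batch`. Writing rows in the order of the test IDs, with an empty transcript for any clip that was not transcribed, keeps the CSV complete.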
