
TechCabal Ewè Audio Translation Challenge

$1,000 USD
Classification
Automatic Speech Recognition
267 joined
80 active
Start: Aug 26, 2024
Close: Sep 29, 2024
Reveal: Oct 10, 2024
About

There are 5334 audio files in train and 2286 audio files in test.

The dataset was built to ensure equal representation of men and women. Data augmentation was applied to the initial contributions to generate additional audio. First released in April 2024 with 800+ recordings, the dataset now contains 7,000+ entries of Ewe speakers, recorded in the following contexts:

  • Crossing road background noise
  • Rain and thunder background noise
  • Rural and forest background noise
  • Firefighters alarm background noise
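The page does not say how the noisy variants were produced. A common way to create background-noise variants like these is to mix a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below assumes mono float waveforms at the same sample rate; `mix_at_snr`, the 10 dB target and the synthetic tone and noise signals are hypothetical illustrations, not the Umbaji augmentation pipeline.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add background noise to a speech clip so the mixture has the requested SNR (dB).

    Assumes both inputs are mono float arrays at the same sample rate.
    """
    if len(noise) < len(speech):
        # Loop the noise clip until it covers the whole utterance.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        # Silent noise clip: nothing to add.
        return speech.copy()

    # SNR_dB = 10 * log10(P_speech / (scale**2 * P_noise)); solve for scale.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


# Hypothetical example: a 1 s, 220 Hz tone stands in for speech at 16 kHz,
# and 0.5 s of Gaussian noise stands in for a rain recording.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
rain = rng.normal(0.0, 0.1, 8000)
noisy = mix_at_snr(speech, rain, snr_db=10)
```

Looping a short noise clip keeps the mixture the same length as the utterance; drawing `snr_db` from a range (for example, 0 to 20 dB) per example is one way to vary how hard the augmented clips are.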

The dataset was built entirely by Umbaji community members in 2023, with the support of Google Cloud For Startups and Microsoft Entrepreneurship For Positive Impact.

The audio files are available in this Google Folder.

Additional test audio files are available in this Google Folder.

Files

  • Train: contains the target. This is the dataset you will use to train your model.
  • Test (updated): includes the additional test files. It resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
  • Sample submission (updated): contains the IDs for the additional test files provided in Test_1 and shows what your submission file should look like. The order of the rows does not matter, but the "ID" values must be correct.
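Because row order is ignored but the IDs must match, a quick local check before uploading can catch missing or unexpected IDs. The sketch below uses only the standard library; the helper names, the `Class` and `audio_path` columns and the `ID_001`-style IDs are hypothetical, so substitute the real headers from Train.csv, the test file and the sample submission. It also shows one way to find the target-related columns as those present in Train but absent from Test.

```python
import csv
import io


def read_csv_text(text):
    """Parse CSV text into (header, rows), where each row is a dict keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def target_columns(train_header, test_header):
    """Columns present in Train but absent from Test, i.e. the target-related columns."""
    test_cols = set(test_header)
    return [col for col in train_header if col not in test_cols]


def validate_submission(sample_text, submission_text):
    """Compare a submission with the sample submission; return a list of problems (empty if OK).

    Row order is ignored; only the columns and the set of IDs must match.
    """
    sample_header, sample_rows = read_csv_text(sample_text)
    sub_header, sub_rows = read_csv_text(submission_text)
    problems = []

    if set(sub_header) != set(sample_header):
        problems.append(f"columns differ: expected {sample_header}, got {sub_header}")
    if "ID" not in sub_header:
        problems.append("missing 'ID' column")
        return problems

    sample_ids = {row["ID"] for row in sample_rows}
    sub_ids = [row["ID"] for row in sub_rows]
    if len(sub_ids) != len(set(sub_ids)):
        problems.append("duplicate IDs in submission")
    missing = sorted(sample_ids - set(sub_ids))
    extra = sorted(set(sub_ids) - sample_ids)
    if missing:
        problems.append(f"missing IDs: {missing}")
    if extra:
        problems.append(f"unexpected IDs: {extra}")
    return problems


# Hypothetical data: the real column names and ID format may differ.
train_header = ["ID", "audio_path", "Class"]
test_header = ["ID", "audio_path"]
sample = "ID,Class\nID_001,0\nID_002,0\n"
good = "ID,Class\nID_002,1\nID_001,3\n"  # different row order: allowed
bad = "ID,Class\nID_001,1\nID_003,2\n"   # ID_002 missing, ID_003 unknown

print(target_columns(train_header, test_header))  # ['Class']
print(validate_submission(sample, good))          # []
print(validate_submission(sample, bad))
```

Comparing IDs as sets rather than row by row reflects the rule that row order does not matter, while the separate duplicate check still catches an ID predicted twice.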