The train split contains 5,334 audio files and the test split contains 2,286 audio files, a 70/30 split of 7,620 files in total.
The dataset was built to ensure equal representation of men and women. Data augmentation was applied to the initial contributions to generate additional audio. First released in April 2024 with 800+ recordings, the dataset now contains 7,000+ entries of Ewe speakers, speaking in the following contexts:
- Road-crossing background noise
- Rain and thunder background noise
- Rural and forest background noise
- Firefighters' alarm background noise
The dataset was built entirely by Umbaji community members, with support from Google Cloud For Startups in 2023 and from Microsoft Entrepreneurship For Positive Impact.
The audio files are available in this Google Drive folder.
Additional test audio files are available in this Google Drive folder.