The dataset consists of approximately 28,000 audio clips, each lasting around 30 seconds. Each clip is paired with a manually verified transcription. The dataset was generated from an archive that the BBC Caribbean Service donated to The UWI after the service ceased broadcasting. If you would like access to the data used in this challenge, please contact info@aiicentre.com.
Columns in train_transcripts.csv:
- clip_id: Unique identifier for each audio clip.
- file_name: Path to the .wav file.
- transcript: Text transcription of the clip.
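
As a minimal sketch of how the columns above might be read, the Python snippet below parses a CSV with the standard library and checks that the three documented columns are present. The helper name `read_transcripts`, the sample row, and the example path `audio/clip_0001.wav` are hypothetical and used only for illustration; the real file's encoding, delimiter, and path layout are assumptions and may differ.

```python
import csv
import io

# Column names as documented for train_transcripts.csv.
EXPECTED_COLUMNS = ["clip_id", "file_name", "transcript"]


def read_transcripts(source):
    """Read rows from an open CSV file object and check the expected columns.

    Hypothetical helper: raises ValueError if a documented column is missing.
    """
    reader = csv.DictReader(source)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Made-up sample row for illustration; not taken from the dataset.
sample = io.StringIO(
    "clip_id,file_name,transcript\n"
    "clip_0001,audio/clip_0001.wav,hello and welcome to the programme\n"
)
rows = read_transcripts(sample)
print(rows[0]["file_name"], "->", rows[0]["transcript"])
```

With the real file, the same helper could be called as `read_transcripts(open("train_transcripts.csv", newline="", encoding="utf-8"))`, assuming the file is UTF-8 encoded and comma-delimited.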