The dataset consists of approximately 28,000 audio clips, each around 30 seconds long and paired with a manually verified transcription. The clips were drawn from an archive that the BBC Caribbean Service donated to The UWI after the service ceased broadcasting.
Columns in train_transcripts.csv:
- clip_id: Unique identifier for each audio clip.
- file_name: Path to the corresponding .wav audio file.
- transcript: Text transcription of the clip.
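
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `load_transcripts`, the `audio_root` parameter, and the assumption that `file_name` is relative to the folder containing the CSV are all illustrative and not stated in the dataset description.

```python
import csv
from pathlib import Path

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ("clip_id", "file_name", "transcript")


def load_transcripts(csv_path, audio_root=None):
    """Read the transcript CSV and return one dict per clip.

    Hypothetical helper. Assumption: file_name is relative to the CSV's
    folder unless audio_root is given explicitly.
    """
    csv_path = Path(csv_path)
    root = Path(audio_root) if audio_root is not None else csv_path.parent
    rows = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the header does not match the documented columns.
        missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        for row in reader:
            rows.append(
                {
                    "clip_id": row["clip_id"],
                    "audio_path": root / row["file_name"],
                    "transcript": row["transcript"],
                }
            )
    return rows
```

Calling `load_transcripts("train_transcripts.csv")` would return a list of dicts, each holding the clip identifier, a resolved audio path, and the transcript text. Values are kept as strings, since the description does not say whether `clip_id` is numeric.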