This dataset consists of 193 agricultural keywords in English and Luganda, compiled from agricultural talk shows in radio recordings from Uganda. These were complemented with data scraped from Bukedde, an online Luganda newspaper in Uganda. The keywords can be categorized into crops, diseases, fertilisers, herbicides and general keywords. There are 1109 utterances in the train set and 1017 utterances in the test set. Additional .wav files will be added by 31 October.
The data consists of .wav files with unique IDs as file names. The labels for the training set are contained in Train.csv; each label is one of the 193 agriculture-related keywords. Your task is to predict the labels for the test set, following the format in SampleSubmission.csv.
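As a rough illustration of how the training labels might be read, the sketch below builds a mapping from .wav ID to keyword and counts utterances per keyword. The column names `fn` and `label` and the sample rows are assumptions made for this example; check the header of the real Train.csv before using it.

```python
import csv
import io
from collections import Counter


def load_labels(csv_file, id_col="fn", label_col="label"):
    """Return {wav_id: keyword} from a Train.csv-like file object.

    id_col and label_col are hypothetical column names.
    """
    reader = csv.DictReader(csv_file)
    return {row[id_col]: row[label_col] for row in reader}


# Tiny in-memory stand-in for Train.csv (made-up rows, not real data).
sample = io.StringIO(
    "fn,label\n"
    "id_001.wav,maize\n"
    "id_002.wav,banana\n"
    "id_003.wav,maize\n"
)
labels = load_labels(sample)

# Number of training utterances per keyword.
counts = Counter(labels.values())
```

With the real file, `open("Train.csv", newline="")` would take the place of the `io.StringIO` object.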
We are grateful to the researchers and winners of the GIZ AI4D Africa Language Challenge, whose shared recordings made this competition possible.
Files available for download
- Train.csv - contains the agriculture-related keyword and the corresponding unique .wav ID for each training file.
- SampleSubmission.csv - an example of what your submission file should look like. The order of the rows does not matter, but the .wav file names must be correct. Your submission should contain the probabilities that each .wav file belongs to each agriculture-related keyword category (with values between 0 and 1 inclusive).
- Audio.zip - contains all the .wav files with unique IDs. You can use Train.csv to split the audio data into train and test sets.
- AdditionalUtterances.zip - contains an additional 1740 .wav files with unique IDs. These utterances can be used to supplement your training data.
- StarterNotebook.ipynb - a starter notebook that will help you make your first submission and land on the leaderboard.
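The split described for Audio.zip and the submission format can be sketched roughly as follows. This is a minimal example under stated assumptions, not the starter notebook's method: the ID column name `fn`, the one-column-per-keyword layout, and the sample file names and keywords are all hypothetical, and SampleSubmission.csv remains the authority on the real layout.

```python
import csv
import io


def split_ids(all_wavs, train_ids):
    """Split .wav names into (train, test) using the IDs listed in Train.csv."""
    train_ids = set(train_ids)
    train = sorted(w for w in all_wavs if w in train_ids)
    test = sorted(w for w in all_wavs if w not in train_ids)
    return train, test


def uniform_submission(test_ids, keywords, id_col="fn"):
    """Build baseline submission CSV text with equal probability per keyword.

    Assumes one probability column per keyword (hypothetical layout).
    """
    p = 1.0 / len(keywords)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([id_col] + list(keywords))
    for wav in test_ids:
        writer.writerow([wav] + [p] * len(keywords))
    return buf.getvalue()


# Made-up file names and keywords, for illustration only.
all_wavs = ["a.wav", "b.wav", "c.wav"]
train, test = split_ids(all_wavs, train_ids=["a.wav"])
csv_text = uniform_submission(test, keywords=["maize", "banana"])
```

A uniform baseline like this only checks that the file format is accepted; a real submission would replace the constant with a model's predicted probabilities.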