The objective of this competition was to build a machine learning model to identify an agricultural keyword (which may be in English or Luganda) spoken in an audio clip. The keywords relate to crops, diseases, fertilisers, herbicides and other general agricultural topics. The winning models presented here are being put to use by researchers from Makerere University, who are developing a tool to monitor radio programs for agriculture-related information. This monitoring may help identify disease outbreaks or other agricultural challenges more quickly.
The winners were Team Kuro, Team LMrab3in, and Team zindi-giz. You can see the competition here. 🚀
Please introduce yourself.
I am Béranger GUEDOU (Logteta), part of Team KURO along with Shiro.
I come from the Republic of Benin, but I live in France.
I come from an Applied Maths and Computer Science background, and worked as an R&D data scientist. I'm also really excited about the possibilities of audio and video, as I think there could be some outstanding innovations for Africa in these fields. That is why I co-founded Dialectai, a startup specialising in speech analytics for low-resource languages, especially in Africa.
Tell us about your winning solution.
Our solution was a combination of several techniques.
For features, we used Mel spectrograms with our own custom version of spectrogram augmentation, plus several signal-level augmentations such as time stretching and pitch shifting.
For models, we blended four five-fold variants built on EfficientNet-B5, -B6 or -B7 backbones, in either their standard or Noisy Student ("noisy") versions. Across these models we varied parameters such as hop_size and window_size. The resulting blend was strong enough that we used its predictions as pseudo-labels to train two further models, which we also blended.
As some categories have only a few samples, we used a custom sampler that down-weights categories with many samples, so that rare categories are drawn more often during training.
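The team's custom spectrogram augmentation is not described in detail, so the following is only a minimal sketch of standard SpecAugment-style masking on a (mel_bins, frames) array, written with NumPy. The function name spec_augment, the mask counts and widths, and filling masked bands with the spectrogram mean are all illustrative assumptions, not the team's method. Signal-level time stretching and pitch shifting would be applied to the waveform before this step, for example with librosa.effects.time_stretch and librosa.effects.pitch_shift (an assumption about tooling).

```python
import numpy as np


def spec_augment(mel, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Return a copy of `mel` (shape: mel_bins x frames) in which random
    frequency bands and time spans are overwritten with the array's mean.
    Hypothetical helper: a generic SpecAugment-style sketch."""
    rng = np.random.default_rng() if rng is None else rng
    out = mel.copy()                      # never modify the caller's array
    n_mels, n_frames = out.shape
    fill = out.mean()
    for _ in range(num_freq_masks):       # horizontal bands (mel bins)
        width = rng.integers(0, max_freq_width + 1)
        start = rng.integers(0, max(1, n_mels - width))
        out[start:start + width, :] = fill
    for _ in range(num_time_masks):       # vertical bands (frames)
        width = rng.integers(0, max_time_width + 1)
        start = rng.integers(0, max(1, n_frames - width))
        out[:, start:start + width] = fill
    return out


rng = np.random.default_rng(0)
mel = rng.normal(size=(64, 101))          # stand-in for a real log-mel spectrogram
augmented = spec_augment(mel, rng=rng)
print(augmented.shape)                    # (64, 101)
```

Because a fresh mask is drawn on every call, the model sees a different corrupted view of each clip every epoch, which is one way heavy augmentation discourages overfitting on small datasets.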
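Blending and pseudo-labelling can be sketched as averaging each model's class probabilities and then keeping only confident predictions on unlabelled clips as new training targets. This NumPy sketch assumes equal weights by default, hard labels, and a confidence threshold of 0.9; the helper names blend and select_pseudo_labels are hypothetical, and the team may well have used different weights or soft labels.

```python
import numpy as np


def blend(prob_list, weights=None):
    """Weighted average of per-model probability arrays, each shaped
    (n_clips, n_classes). Hypothetical helper."""
    probs = np.stack(prob_list)                  # (n_models, n_clips, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalise so rows still sum to 1
    return np.tensordot(weights, probs, axes=1)  # (n_clips, n_classes)


def select_pseudo_labels(blended, threshold=0.9):
    """Keep clips whose top blended probability reaches `threshold`;
    return their indices and hard labels. Hypothetical helper."""
    confidence = blended.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, blended[keep].argmax(axis=1)


model_a = np.array([[0.95, 0.05], [0.40, 0.60]])
model_b = np.array([[0.90, 0.10], [0.60, 0.40]])
blended = blend([model_a, model_b])
idx, labels = select_pseudo_labels(blended, threshold=0.9)
print(idx, labels)  # [0] [0]
```

The second clip is dropped because the two models disagree and its blended confidence is only 0.5; filtering like this keeps noisy pseudo-labels out of the second training round.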
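The custom sampler itself is not published here, so this is a minimal sketch of one common way to penalise frequent categories: give each sample a weight proportional to one over the size of its class, then draw indices with replacement. The helper names are hypothetical. In PyTorch, the same per-sample weights could be passed to torch.utils.data.WeightedRandomSampler.

```python
import numpy as np


def inverse_frequency_weights(labels):
    """Per-sample probabilities proportional to 1 / (size of the sample's
    class), so every class gets the same total probability mass."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True,
                                   return_counts=True)
    weights = 1.0 / counts[inverse]
    return weights / weights.sum()


def sample_epoch(labels, num_samples, rng=None):
    """Draw `num_samples` dataset indices with replacement (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    p = inverse_frequency_weights(labels)
    return rng.choice(len(p), size=num_samples, replace=True, p=p)


labels = [0, 0, 0, 1]             # class 0 is three times as frequent
print(inverse_frequency_weights(labels))
indices = sample_epoch(labels, num_samples=8, rng=np.random.default_rng(0))
```

Here the three class-0 samples each get probability 1/6 and the single class-1 sample gets 1/2, so both classes are drawn about equally often.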
What sets your model apart from the competition?
Data augmentation was the key to this competition. Heavy, realistic augmentation kept our models from overfitting. Also, always check that your cross-validation score is consistent with the leaderboard when selecting your best models.
What is the biggest area of opportunity you see in AI in Africa?
Audio processing! In Africa there are a great many spoken dialects that are not written at all or have very little written support. Why not process audio directly rather than looking for NLP datasets for African dialects, which probably don’t exist?
Please introduce your team.
Hello fellow contestants, we are Team LMrab3in from Tunisia. We are Azer KSOURI (ASSAZZIN), Ahmed Attia (ahmedattia), Nacir Bouazizi (patata), and Mokhtar Mami (mo5mami).
Tell us about your winning solution.
Our overall approach:
What worked/didn't work for us:
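1. Stack the spectrogram image with its derivatives (first- and second-order deltas) into a three-channel input
2. Add a conv layer before the pretrained model that maps the single-channel input to three channels (input channels = 1, output channels = 3)
3. Change the first conv layer of the pretrained model so that it accepts a single-channel input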
All of these approaches worked for us.
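Approaches 1 and 3 can be sketched with NumPy alone. The delta function below is a simplified regression-based delta along the time axis (a stand-in for something like librosa.feature.delta, not necessarily what the team used), and rgb_conv_to_mono shows one common way to adapt pretrained first-layer weights of shape (out, 3, kh, kw) to a single input channel: summing over the colour channels. All names here are hypothetical. Approach 2 would typically be a learnable layer such as torch.nn.Conv2d(1, 3, kernel_size=1) placed in front of the backbone, and timm's create_model also accepts an in_chans argument that adapts the first conv in a similar way (both are assumptions about the team's tooling).

```python
import numpy as np


def delta(feat, width=9):
    """Regression-based delta along the time axis (axis=1) of a
    (mel_bins, frames) array, with edge padding at the clip boundaries."""
    n = width // 2
    padded = np.pad(feat, ((0, 0), (n, n)), mode="edge")
    frames = feat.shape[1]
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros(feat.shape)
    for k in range(1, n + 1):
        # weighted difference between frames t + k and t - k
        out += k * (padded[:, n + k:n + k + frames] - padded[:, n - k:n - k + frames])
    return out / denom


def stack_with_deltas(mel):
    """Approach 1: (mel_bins, frames) -> (3, mel_bins, frames)."""
    d1 = delta(mel)
    d2 = delta(d1)
    return np.stack([mel, d1, d2], axis=0)


def rgb_conv_to_mono(weight):
    """Approach 3: collapse pretrained weights (out, 3, kh, kw) to
    (out, 1, kh, kw). By linearity, the result responds to a 1-channel
    input exactly as the original does to that input copied 3 times."""
    return weight.sum(axis=1, keepdims=True)


mel = np.random.default_rng(0).normal(size=(64, 101))
print(stack_with_deltas(mel).shape)       # (3, 64, 101)
```

Stacking deltas keeps the pretrained three-channel stem untouched, while collapsing the first conv keeps a single-channel input; the team reports that both worked.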
Things that we were planning to do but didn't find time for:
Here is our source code: Second Place Solution Source Code
Please introduce your team.
We are Team zindi-giz (GopiDurgaprasad and Saurabh502), from India.
Tell us about your winning solution.
We trained four models: EfficientNet-B5, EfficientNet-B6, EfficientNet-B7 and DenseNet-201. Each model was trained for 20 epochs with different seeds.
model_param = {
    'encoder': 'tf_efficientnet_b5_ns',
    'sample_rate': 32000,
    'window_size': 1024,
    'hop_size': 320,
    'mel_bins': 64,
    'fmin': 50,
    'fmax': 14000,
    'classes_num': 193,
}
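To make these front-end settings concrete, the sketch below works out what they imply: at 32 kHz a 320-sample hop is 10 ms and a 1024-sample window is 32 ms, so a centred STFT produces 1 + n_samples // hop_size frames. It assumes centred framing (librosa's default padding of window_size // 2 on each side) and a hypothetical 1-second clip; the helper name frontend_summary is illustrative only.

```python
model_param = {
    'encoder': 'tf_efficientnet_b5_ns',
    'sample_rate': 32000,
    'window_size': 1024,
    'hop_size': 320,
    'mel_bins': 64,
    'fmin': 50,
    'fmax': 14000,
    'classes_num': 193,
}


def frontend_summary(params, clip_seconds):
    """Return (hop in ms, window in ms, mel-spectrogram shape) for a clip,
    assuming a centred STFT. Hypothetical helper."""
    sr = params['sample_rate']
    if params['fmax'] > sr / 2:
        raise ValueError('fmax must not exceed the Nyquist frequency')
    hop_ms = 1000 * params['hop_size'] / sr
    window_ms = 1000 * params['window_size'] / sr
    n_samples = int(round(clip_seconds * sr))
    n_frames = 1 + n_samples // params['hop_size']
    return hop_ms, window_ms, (params['mel_bins'], n_frames)


print(frontend_summary(model_param, clip_seconds=1.0))
# (10.0, 32.0, (64, 101))
```

The check on fmax also confirms that the 14 kHz upper mel edge sits below the 16 kHz Nyquist limit for this sample rate.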
optimiser: AdamW
scheduler: get_linear_schedule_with_warmup
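get_linear_schedule_with_warmup comes from the Hugging Face transformers library and takes (optimizer, num_warmup_steps, num_training_steps). The pure-Python sketch below reproduces the learning-rate multiplier that schedule applies: a linear ramp from 0 to 1 during warmup, then a linear decay to 0 at the last step. The warmup and total step counts in the example are hypothetical; the team's actual values are not given. In PyTorch the function could be wrapped with torch.optim.lr_scheduler.LambdaLR around a torch.optim.AdamW optimizer.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at `step`."""
    if step < num_warmup_steps:
        # warmup: ramp linearly from 0 up to 1
        return step / max(1, num_warmup_steps)
    # decay: fall linearly from 1 to 0, never going negative
    return max(0.0, (num_training_steps - step)
               / max(1, num_training_steps - num_warmup_steps))


# hypothetical run: 100 warmup steps out of 1000 total
print([linear_warmup_factor(s, 100, 1000) for s in (0, 50, 100, 550, 1000)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```

Warming up keeps early AdamW updates small while the newly initialised classification head is still random, which helps protect the pretrained backbone weights.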
Here is our source code: Third Place Solution Source Code