14 Jul 2021, 13:24

Meet the Winners: GIZ Agricultural Keyword Spotter Challenge Top 3 share their solutions

The objective of this competition was to build a machine learning model to identify an agricultural keyword (which may be in English or Luganda) spoken in an audio clip. The keywords related to crops, diseases, fertilisers, herbicides and other general agricultural topics. The winning models presented here are being put to use by researchers from Makerere University, who are developing a tool to monitor radio programs for agriculture-related information. These results may help identify disease outbreaks or other agricultural challenges more quickly.

The winners were Team Kuro, Team LMrab3in, and Team zindi-giz. You can see the competition here. 🚀

1st place: Team Kuro

Please introduce yourself.

I am Béranger GUEDOU (Logteta), part of Team KURO along with Shiro.

I come from the Republic of Benin, but I'm living in France.

I come from an Applied Maths and Computer Science background, and worked as an R&D data scientist. I'm also really excited about the possibilities of audio and video, as I think there could be some outstanding innovations for Africa in these fields. That is why I co-founded Dialectai, a startup specialising in speech analytics on low-resourced languages, especially in Africa.

Tell us about your winning solution.

Our solution was a combination of several techniques.

In terms of features, we used Mel spectrograms with our custom version of spectrogram augmentation, as well as several augmentations like time stretching and pitch modification at signal level.
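A minimal sketch of the spectrogram-augmentation idea described above, using only NumPy (the team's custom version and exact mask sizes are not public, so the parameters here are illustrative):

```python
import numpy as np

def spec_augment(mel, num_time_masks=2, num_freq_masks=2,
                 max_time_width=10, max_freq_width=8, rng=None):
    """SpecAugment-style masking on a Mel spectrogram.

    mel: 2-D array of shape (n_mels, n_frames). Returns a masked copy in
    which random blocks of time frames and frequency bins are blanked out,
    forcing the model not to rely on any single region of the spectrogram.
    """
    rng = rng or np.random.default_rng()
    out = mel.copy()
    n_mels, n_frames = out.shape
    fill = out.min()  # blank with the spectrogram's floor value
    for _ in range(num_time_masks):
        w = int(rng.integers(1, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_frames - w)))
        out[:, t0:t0 + w] = fill
    for _ in range(num_freq_masks):
        w = int(rng.integers(1, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_mels - w)))
        out[f0:f0 + w, :] = fill
    return out
```

The signal-level augmentations (time stretching, pitch shifting) would be applied to the raw waveform before the spectrogram is computed, e.g. with librosa's `effects` module.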

In terms of models, we blended four versions of five-fold EfficientNet-B5, -B6 or -B7, in standard or Noisy Student pretrained variants. For each model, we tried different values for parameters like hop_size and window_size. This gave us a good blend, which we then used to train two pseudo-labelling models that we also blended.
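The pseudo-labelling step mentioned above can be sketched as: blend the fold probabilities, then keep only confident test predictions as extra training labels. The confidence threshold here is an assumption for illustration, not the team's reported value:

```python
import numpy as np

def pseudo_labels(fold_probs, threshold=0.9):
    """Blend per-model class probabilities, then keep confident test
    predictions as pseudo-labels for a second round of training.

    fold_probs: list of (n_samples, n_classes) probability arrays.
    Returns (indices, labels) for samples whose blended confidence
    meets the threshold.
    """
    blended = np.mean(np.stack(fold_probs), axis=0)
    confidence = blended.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, blended[keep].argmax(axis=1)
```

The selected samples would then be appended to the labelled training set before retraining.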

As some categories had only a few samples, we used a custom sampler that downweighted the categories with a high number of samples.
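One common way to build such a sampler is to weight each sample by the inverse of its class frequency; the weights can then feed something like `torch.utils.data.WeightedRandomSampler` (an assumption about the framework — the team's exact sampler is not shown):

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-sample weights inversely proportional to class frequency,
    so rare keywords are drawn as often as common ones during training."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    class_weight = {c: 1.0 / n for c, n in zip(classes, counts)}
    return np.array([class_weight[label] for label in labels])
```

With these weights, each class contributes equal total sampling mass regardless of how many clips it has.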

What sets your model apart from the competition?

Data augmentation was the key to this competition: heavy, realistic augmentation saved our model from overfitting. Also, always check for consistency between your cross-validation score and the leaderboard when selecting your best models.

What is the biggest area of opportunity you see in AI in Africa?

Audio processing! In Africa there are a great many spoken dialects that are not written at all or have very little written support. Why not process audio directly rather than looking for NLP datasets for African dialects, which probably don’t exist?

2nd Place: Team LMrab3in

Please introduce your team.

Hello fellow contestants, we are Team LMrab3in from Tunisia. We are Azer KSOURI (ASSAZZIN), Ahmed Attia (ahmedattia), Nacir Bouazizi (patata), and Mokhtar Mami (mo5mami).

Tell us about your winning solution.

Our overall approach:

  • Extract Mel spectrogram features from audio.
  • Train models on ImageNet-pretrained backbones with 10 stratified folds, blending fold predictions with the geometric mean (gmean).
  • Blend all the models using gmean.
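The gmean blend in the steps above can be written in a few lines of NumPy (clipping avoids log(0); the epsilon value is an illustrative choice):

```python
import numpy as np

def gmean_blend(prob_list, eps=1e-12):
    """Geometric-mean blend of class probabilities from several
    models or folds. Rows are renormalised to sum to 1."""
    stacked = np.clip(np.stack(prob_list), eps, 1.0)
    blended = np.exp(np.log(stacked).mean(axis=0))
    return blended / blended.sum(axis=1, keepdims=True)
```

Compared with a plain arithmetic mean, the geometric mean penalises predictions on which the models strongly disagree, which often helps a blend of diverse models.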

What worked/didn't work for us:

  • Reducing the hop length hugely improved the results of our model.
  • Per-channel energy normalisation (PCEN) looked poor on the public leaderboard, but it gave good results on the private leaderboard and boosted the blend overall.
  • Working with a low batch size (4–10) seemed to improve the results.
  • Since we used ImageNet-pretrained models, we had three approaches to turn the Mel spectrogram into a three-channel image:
1. Stack the image with its derivatives (delta order 1 and delta order 2)
2. Create a conv layer before the pretrained models (input channel = 1, output channel = 3)
3. Change the first conv layer of the pretrained model

All of these approaches worked for us.

  • Deeper models (ResNeSt-269 and ResNeXt-101) helped achieve better results.
  • Z-score normalisation (standardisation) and min-max scaling to [0, 255] helped the models converge faster.
  • Training with a ReduceLROnPlateau scheduler with low patience and a min_lr value helped improve the results of some models.
  • Using an after-train approach, retraining the trained models at low learning rates with a CosineAnnealingLR scheduler, improved the results of some models.
  • TTA (test-time augmentation) improved the results of only a few models.
  • Stacking the models made the results worse so we dropped it.
  • Every member of the team used different kinds of data augmentation (or none). SpecMix seemed to improve the results for some models.
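The first of the three channel-conversion approaches above (stacking the spectrogram with its delta features) can be sketched as follows; `np.gradient` stands in for librosa's `feature.delta` purely for illustration, and the min-max scaling to [0, 255] mirrors the normalisation bullet:

```python
import numpy as np

def mel_to_3ch(mel):
    """Stack a Mel spectrogram with its first- and second-order time
    derivatives to form a 3-channel 'image' suitable for an
    ImageNet-pretrained backbone."""
    delta1 = np.gradient(mel, axis=1)       # first-order delta
    delta2 = np.gradient(delta1, axis=1)    # second-order delta
    img = np.stack([mel, delta1, delta2])   # (3, n_mels, n_frames)
    # Min-max scale each channel to [0, 255]
    mn = img.min(axis=(1, 2), keepdims=True)
    mx = img.max(axis=(1, 2), keepdims=True)
    return (img - mn) / (mx - mn + 1e-12) * 255.0
```

The other two approaches (a 1-to-3-channel conv layer in front of the backbone, or editing the backbone's first conv layer to accept one channel) achieve the same compatibility at the model level instead of the data level.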

Things we were planning to do but didn't find time for:

  • Mixup showed some potential to help the blend, but since its results were a bit off and we already had too many models, we dropped it. More experimentation with mixup could potentially have improved the results.
  • Pseudo-labeling deserved some experiments because of the small dataset and strong single models.
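For reference, mixup as mentioned above blends pairs of examples and their one-hot labels with a Beta-distributed coefficient; the `alpha` value here is a common default, not something the team reported:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Mixup: return a convex combination of two inputs and of their
    one-hot label vectors, with mixing weight lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

The model is then trained on the mixed inputs with the correspondingly mixed soft labels, which acts as a strong regulariser on small datasets like this one.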

Here is our source code: Second Place Solution Source Code

3rd place: Team zindi-giz

Please introduce your team.

We are Team zindi-giz (GopiDurgaprasad and Saurabh502), from India.

Tell us about your winning solution.

We trained four models: EfficientNet-B5, EfficientNet-B6, EfficientNet-B7 and DenseNet201. Each model was trained for 20 epochs with different seeds.

model_param = {
    'encoder': 'tf_efficientnet_b5_ns',
    'sample_rate': 32000,
    'window_size': 1024,
    'hop_size': 320,
    'mel_bins': 64,
    'fmin': 50,
    'fmax': 14000,
    'classes_num': 193
}

optimiser: AdamW

scheduler: get_linear_schedule_with_warmup
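The named scheduler (`get_linear_schedule_with_warmup`, from the Hugging Face transformers library) ramps the learning rate up linearly during warmup and then decays it linearly to zero. A dependency-free sketch of the multiplier it applies at each step:

```python
def linear_schedule_with_warmup(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: linear ramp from 0 to 1 over the warmup
    steps, then linear decay back to 0 by the end of training (mirrors the
    behaviour of transformers.get_linear_schedule_with_warmup)."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step)
               / max(1, num_training_steps - num_warmup_steps))
```

In the actual training loop, this multiplier would scale AdamW's base learning rate once per optimiser step.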

Here is our source code: Third Place Solution Source Code