Meet the Winners of the Swahili Audio Classification Hackathon
Meet the winners · 6 Jan 2023, 08:37 · 4 mins read

Meet Daniel Ofula (danmaestro), Olufemi Victor (Professor), and Oluwadunsin Fajemila (Dunsin), winners of the #ZindiWeekendz Swahili Audio Classification Hackathon. The challenge attracted 74 participants competing for a $300 prize pool. The objective was to classify Swahili audio clips according to the word spoken in each.

Oluwadunsin Fajemila, Nigeria

Please introduce yourself.

My name is Oluwadunsin Fajemila (Dunsin), and I am from Nigeria. I am an Electronic and Electrical Engineering student at Obafemi Awolowo University, though I specialize in NLP engineering.

Tell us a bit about your solution, and the approach you took.

At first, I planned to extract mel spectrograms and then experiment with ImageNet-pretrained models. However, as soon as I checked some of the audio samples, I realised a pretrained wav2vec model would encode the audio features better than mel spectrograms.

I then built a classifier head on top of it to predict the different classes. I tried different values for the learning rate. As for the batch size, not much could be done because of limited GPU memory, so I increased the number of gradient accumulation steps instead. My performance might have been better with an ensemble of several models, since I only used a single model.
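Gradient accumulation trades time for memory: the model processes several small micro-batches, adds up their scaled gradients, and only then takes an optimizer step, so the effective batch size is micro-batch size × accumulation steps. Dunsin's exact head, feature pooling and hyperparameters are not described, so the NumPy sketch below is only an illustration under assumptions: the wav2vec features are taken as already pooled into fixed 768-dimensional vectors (random stand-ins here), the number of word classes (10) is hypothetical, and the head is a plain linear softmax layer `W`. It checks that four micro-batches of 8 give the same gradient as one batch of 32.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def head_grad(W, X, y):
    """Gradient of the mean cross-entropy loss of a linear softmax head."""
    probs = softmax(X @ W)                   # (batch, n_classes)
    probs[np.arange(len(y)), y] -= 1.0       # dL/dlogits for one-hot targets
    return X.T @ probs / len(y)              # average over the batch

def accumulated_grad(W, X, y, micro_batch, accum_steps):
    """Sum micro-batch gradients, each scaled by 1/accum_steps, before one update."""
    grad = np.zeros_like(W)
    for step in range(accum_steps):
        sl = slice(step * micro_batch, (step + 1) * micro_batch)
        grad += head_grad(W, X[sl], y[sl]) / accum_steps
    return grad

rng = np.random.default_rng(0)
n_features, n_classes = 768, 10              # hypothetical feature size and class count
X = rng.normal(size=(32, n_features))        # stand-in for pooled wav2vec features
y = rng.integers(0, n_classes, size=32)
W = rng.normal(scale=0.01, size=(n_features, n_classes))

full = head_grad(W, X, y)
accum = accumulated_grad(W, X, y, micro_batch=8, accum_steps=4)
print(np.allclose(full, accum))  # True: same effective batch of 32

# A single SGD step using the accumulated gradient.
lr = 0.1
W -= lr * accum
```

In a real PyTorch loop the same idea means calling `loss.backward()` on each micro-batch and `optimizer.step()` only every N micro-batches; the Hugging Face `Trainer` exposes this as the `gradient_accumulation_steps` training argument.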

What set your winning solution apart from others?

I believe it was the model: a classifier head placed on top of features extracted by the pretrained wav2vec model. Also, because of limited GPU memory, I increased the number of gradient accumulation steps.

How do you prepare for a challenge?

  • I make sure I understand the problem well first
  • I check other challenges to see how similar problems have been solved
  • I build a cross-validation strategy, which is key
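For a word-classification task, a cross-validation strategy usually needs every fold to keep roughly the same class balance as the full training set. The sketch below is a minimal stratified k-fold split in plain Python, not the winners' code; the labels are hypothetical Swahili words, and in practice scikit-learn's `StratifiedKFold` does the same job.

```python
from collections import defaultdict
import random

def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs with each class spread evenly across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a share.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits

    for fold in range(n_splits):
        val_idx = [i for i, f in enumerate(fold_of) if f == fold]
        train_idx = [i for i, f in enumerate(fold_of) if f != fold]
        yield train_idx, val_idx

labels = ["ndio", "hapana"] * 10 + ["moja"] * 5   # hypothetical word labels
for train_idx, val_idx in stratified_kfold(labels, n_splits=5):
    print(len(train_idx), len(val_idx))  # 20 5 for every fold
```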

Words of encouragement for others, or advice that has helped you?

Not letting myself be overwhelmed by doubt helped me see that a problem can be solved. Most importantly, make sure that gaining knowledge and skills is the main reason for participating.

What do you like about Zindi?

The community, interaction and ease of use.

Olufemi Victor, Nigeria

Please introduce yourself.

I am Olufemi Victor (Professor), a student at Obafemi Awolowo University in Nigeria, where I study Electronic and Electrical Engineering at the undergraduate level, and a Zindi Ambassador for Nigeria.

After two years pioneering data science operations at Farmz2u, an agritech startup where I helped build data-centric products for farmers in sub-Saharan Africa, I am now focused on building climate-based solutions with Chemotronix.

I spend most of my time in spaces around climate change, agriculture, and community building.

Tell us a bit about your solution, and the approach you took.

My approach was simple: I started off with CNNs, converting the audio files to spectrograms. To reach my winning score, I ensembled this with an automatic speech recognition (ASR) model built with Hugging Face Transformers. Some of the things I did include:

  1. Relabelled a single audio file in the train set that had the wrong label.
  2. Applied augmentations when converting to spectrograms; removing silence worked well for me.
  3. Used FastAI's FastAudio approach to convert the audio to spectrograms and predict on them.
  4. Ensembled with the Hugging Face ASR model.
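Olufemi does not describe exactly how silence was removed. A common approach, sketched below in NumPy under that assumption, is to compute the RMS energy of short frames and trim leading and trailing frames that fall more than `top_db` decibels below the loudest frame; `librosa.effects.trim` implements the same idea. The frame length, hop and threshold are illustrative values, not settings from the competition, and this version trims only the edges of a clip, not pauses in the middle.

```python
import numpy as np

def trim_silence(signal, frame_len=400, hop=160, top_db=40.0):
    """Trim leading/trailing frames whose RMS is more than `top_db` dB below the loudest frame."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    peak = rms.max()
    if peak == 0:
        return signal[:0]  # the whole clip is digital silence
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / peak)  # frame level relative to the peak
    voiced = np.flatnonzero(db > -top_db)                 # never empty: the peak frame is 0 dB
    start = voiced[0] * hop
    last = voiced[-1]
    # Keep any tail samples not covered by a full frame if the last frame is voiced.
    end = len(signal) if last == n_frames - 1 else last * hop + frame_len
    return signal[start:end]

# Hypothetical clip: 0.5 s silence, 0.5 s of a 440 Hz tone, 0.5 s silence at 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
clip = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
trimmed = trim_silence(clip)
print(len(clip), len(trimmed))  # 24000 8560
```

The trimmed clip is slightly longer than the 8000-sample tone because trimming works at frame granularity, so the boundary frames that partly overlap the tone are kept.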

What set your winning solution apart from others?

Ensembling diverse approaches worked better than ensembling the same model.
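Ensembling diverse models typically means averaging their predicted class probabilities, so that one model's confident correct answer can outweigh another model's mistake. The sketch below assumes each model (for example, the spectrogram CNN and the ASR-based model) has already produced a probability for every word class on every clip; how the ASR output was turned into class probabilities, and which weights were used, is not described, so the numbers here are made up for illustration.

```python
import numpy as np

def ensemble(prob_list, weights=None):
    """Weighted average of per-model class probabilities (each of shape n_clips x n_classes)."""
    probs = np.stack(prob_list)                  # (n_models, n_clips, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalise so rows still sum to 1
    return np.tensordot(weights, probs, axes=1)  # (n_clips, n_classes)

# Hypothetical outputs for 3 clips and 3 word classes.
cnn_probs = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3],
                      [0.4, 0.4, 0.2]])
asr_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.2, 0.7],
                      [0.1, 0.8, 0.1]])

blend = ensemble([cnn_probs, asr_probs], weights=[0.5, 0.5])
print(blend.argmax(axis=1))  # [0 2 1]
```

On the second and third clips the two models disagree, and the blend follows whichever model is more confident; averaging two copies of the same model would not provide that kind of correction.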

How do you prepare for a challenge?

Before this challenge, I learnt from past winning solutions.

Words of encouragement for others, or advice that has helped you?

Be open-minded, try multiple approaches, and see each challenge as an opportunity to explore.

What do you like about Zindi?

Zindi has an amazing community.

Daniel Ofula, Kenya

Please introduce yourself.

My name is Daniel Ofula (danmaestro). I’m an ML engineer and data scientist based in Nairobi, Kenya.

Tell us a bit about your solution, and the approach you took.

My solution involved converting the audio data to mel spectrograms and then modelling them.

Once the conversion was done, I treated the task as an image classification problem and proceeded accordingly. For the models, I settled on the ResNeXt and EfficientNet families.
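Daniel does not list his conversion settings, so the parameters below (16 kHz audio, 25 ms windows with a 10 ms hop, 64 mel bands, and the HTK-style mel formula) are assumptions. In practice a library routine such as `librosa.feature.melspectrogram` or torchaudio's `MelSpectrogram` transform would be used; this NumPy sketch only makes the steps explicit: frame the signal, window it, take the power spectrum, pool it with triangular mel filters, and convert to decibels.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m - 1] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=64):
    """Return a (n_mels, n_frames) log-mel spectrogram in dB; assumes len(signal) >= n_fft."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T           # (n_frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T            # (n_mels, n_frames)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)  # 1 s, 1 kHz test tone
spec = log_mel_spectrogram(tone, sr=sr)
print(spec.shape)  # (64, 98)
```

The resulting 2-D array can be scaled and treated as a single-channel image, which is what allows image backbones such as ResNeXt and EfficientNet to be applied to audio.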

What set your winning solution apart from others?

I believe understanding how audio works on a deeper level really made a difference for me.

How do you prepare for a challenge?

  1. Understand the problem I am trying to solve.
  2. Decide on the appropriate framework to use.
  3. Create and fine-tune models.

Words of encouragement for others, or advice that has helped you?

Pray and take time to learn more about the problem.
