
GIZ NLP Agricultural Keyword Spotter

Helping Uganda · $7,000 USD · Completed
Classification · Automatic Speech Recognition · Natural Language Processing
739 joined · 253 active
Start: 11 Sep 2020 · Close: 29 Nov 2020 · Reveal: 29 Nov 2020
Our approach (2nd place solution)
Connect · 7 Dec 2020, 14:52 · edited ~2 hours later

Hello fellow contestants. On behalf of team LMrab3in, here is the second-place solution for this competition:

Second Place Source Code Solution

Our overall approach:

  1. Extract mel features from the audio.
  2. Train ImageNet-pretrained models with 10 stratified folds, blending the fold results with the geometric mean (gmean).
  3. Blend all the models using gmean.
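The blending in steps 2 and 3 can be sketched as follows. This is a generic geometric-mean blend in NumPy, not the team's actual code; the input shape, the epsilon guard, and the final renormalisation are assumptions.

```python
import numpy as np

def gmean_blend(prob_stack):
    """Geometric-mean blend of class probabilities across folds/models.

    prob_stack: array of shape (n_models, n_samples, n_classes).
    Returns (n_samples, n_classes), renormalised to sum to 1 per row.
    """
    eps = 1e-12  # guard against log(0) for hard-zero probabilities
    logp = np.log(np.clip(prob_stack, eps, 1.0))
    blended = np.exp(logp.mean(axis=0))  # geometric mean over models
    return blended / blended.sum(axis=1, keepdims=True)
```

Compared with an arithmetic mean, the geometric mean penalises predictions that any single model is confidently wrong about, which often suits blends of diverse models.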

What worked/didn't work for us:

  • Reducing the hop length hugely improved the results for all of us.
  • Per-channel energy normalization (PCEN) looked awful on the public LB, but it gave good results on the private LB and boosted the blend results overall.
  • Training with a low batch size (4 to 10) seemed to improve the results.
  • Since we used ImageNet-pretrained models, we had three approaches for turning a mel spectrogram into a 3-channel image:
    1. Stack the spectrogram with its derivatives (delta order 1 and delta order 2).
    2. Add a conv layer before the pretrained model (input channels = 1, output channels = 3).
    3. Change the pretrained model's first conv layer to accept a single channel.
    All three approaches worked for us.
  • Deeper models (ResNeSt-269, ResNeXt-101) helped achieve better results.
  • Z-score normalization (standardization) and min-max scaling to [0, 255] helped the models converge faster.
  • Training with the ReduceLROnPlateau scheduler with low patience and a min_lr value improved the results of some models.
  • An after-train stage, retraining the trained models at low learning rates with the CosineAnnealingLR scheduler, improved the results of some models.
  • Test-time augmentation (TTA) improved the results of only a few models.
  • Stacking the models made the results worse, so we dropped it.
  • Every member of the team used a different kind of data augmentation (or none). SpecMix seemed to improve the results for some models.
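The first of the channel-conversion approaches above (stacking the spectrogram with its deltas) can be sketched in plain NumPy. Here `np.diff` is a simple stand-in for `librosa.feature.delta`, not the team's implementation, and the padding choice is an assumption.

```python
import numpy as np

def stack_deltas(mel):
    """Turn a single-channel mel spectrogram (n_mels, n_frames) into a
    3-channel 'image' by stacking it with its first- and second-order
    time differences, so ImageNet-pretrained models accept it as RGB."""
    d1 = np.diff(mel, n=1, axis=1, prepend=mel[:, :1])  # delta order 1
    d2 = np.diff(d1, n=1, axis=1, prepend=d1[:, :1])    # delta order 2
    return np.stack([mel, d1, d2], axis=0)  # (3, n_mels, n_frames)
```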
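The two input normalisations mentioned above could look like this minimal sketch; the per-spectrogram statistics and the epsilon guards are assumptions, not details from the write-up.

```python
import numpy as np

def zscore(x):
    # Z-score standardization: zero mean, unit variance per spectrogram.
    return (x - x.mean()) / (x.std() + 1e-8)

def minmax_255(x):
    # Min-max scaling to [0, 255], matching the ImageNet image range.
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min + 1e-8) * 255.0
```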

Things we were planning on doing but didn't find time for:

  • Mixup had some potential to help in the blend, but since its results were a bit off and we already had too many models, we dropped it. Experimenting more with mixup could potentially improve the results.
  • Pseudo-labeling deserved some experiments, given the small dataset and the strong single models.
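For reference, the standard mixup the team considered (Zhang et al.) can be sketched as below. The `alpha=0.4` default is a commonly used value, not one taken from the write-up.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Mixup: convex combination of two training examples and their
    one-hot labels, with the mixing weight drawn from Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```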
Discussion (4 answers)

Thanks a lot for sharing

7 Dec 2020, 15:16
University of lagos

You are blessed

7 Dec 2020, 15:37
_MUFASA_

Awesome stuff... congrats!

7 Dec 2020, 16:12

Congratulations and thank you for sharing!

7 Dec 2020, 18:16