
AI4D Malawi News Classification Challenge

Helping Malawi
$2 000 USD
Completed (almost 5 years ago)
Classification
830 joined
322 active
Start: Jan 22, 21
Close: May 09, 21
Reveal: May 09, 21
2nd place summary
Data · 11 May 2021, 09:43 · 15

Many thanks to all hosts and Zindi team for such an interesting challenge. Congrats and thanks to all participants!

Despite its simple formulation, the task is really difficult because some classes have close semantics and only a small number of training examples. My final solution is an ensemble of 6 MT5 (L, XL) models trained on different sequence lengths from 64 to 256 tokens. Each of the 6 models is a 5-fold self-ensemble. I wasn't aware of any other models pretrained on the Chichewa language, so from the beginning I concentrated on MT5. My cross-validation setup is based on a 5-fold stratified split. The CV score of my best ensemble was 0.7005 (pretty close to the private LB 0.7097), but its public score was only 0.6419, so choosing final submissions in this competition was a bit tricky.
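The 5-fold stratified split and per-model self-ensemble described above can be sketched as follows. The synthetic data and the dummy "model" are placeholders (my assumptions), not the author's actual MT5 training code.

```python
# Minimal sketch of 5-fold stratified CV with a self-ensemble over folds.
# Synthetic data and random "predictions" stand in for MT5 fine-tuning.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n_classes = 4
X = rng.normal(size=(100, 8))        # stand-in for tokenized news texts
y = np.arange(100) % n_classes       # balanced stand-in labels
X_test = rng.normal(size=(20, 8))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_probs = []
for train_idx, val_idx in skf.split(X, y):
    # Real solution: fine-tune an MT5 model on X[train_idx], y[train_idx],
    # validate on X[val_idx]. Here the "model" emits random probabilities.
    fold_probs.append(rng.dirichlet(np.ones(n_classes), size=len(X_test)))

# Self-ensemble: average the 5 per-fold test predictions.
avg_probs = np.mean(fold_probs, axis=0)
test_pred = avg_probs.argmax(axis=1)
```

Averaging fold predictions this way gives each test example a probability vector informed by all five fine-tuned checkpoints.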

One interesting conclusion from my experiments is that a good model can be trained on a relatively short sequence (even though almost all texts are quite long). In particular, one of my best single models was trained on 64 tokens. Models trained on 384 and 512 tokens were not better. I also trained some models on different ranges of tokens like [0:256), [256:512), etc. All ranges except the first one gave a much lower CV score.
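The token-range experiments can be illustrated with a toy sketch; a whitespace split stands in for MT5's actual SentencePiece tokenizer (an assumption on my part).

```python
# Toy sketch of slicing a document into token ranges like [0:64) or [256:512).
# A whitespace split is a stand-in for the real MT5 SentencePiece tokenizer.
def token_range(text, start, end):
    """Return the slice [start:end) of the document's tokens as text."""
    tokens = text.split()
    return " ".join(tokens[start:end])

doc = " ".join(f"w{i}" for i in range(600))   # a long synthetic "article"
head = token_range(doc, 0, 64)     # first 64 tokens: the best-performing setup
mid = token_range(doc, 256, 512)   # later ranges gave much lower CV scores
```

The finding that only the first range helps suggests the class signal in these news articles is concentrated near the beginning of the text.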

Discussion (15 answers)
_MUFASA_

Awesome!

thanks for sharing with the community 😉

11 May 2021, 09:46

great summary

11 May 2021, 10:00

Hi,

Many thanks for sharing with us, and congratulations!

Can you please also share your configuration, GPU, etc.? @vecxoz

11 May 2021, 10:13

Thanks!

In general I prefer to run experiments on the free TPUs at Kaggle. If I need more resources, I rent GPUs on Google Cloud. In particular, at the end of this competition I rented a couple of hours on an A100-40GB to train MT5-XL, because this model does not fit in 16 GB of GPU/TPU memory.

Thanks for the information

MICADEE
LAHASCOM

Wow..... Great. Thanks for sharing. 👍

11 May 2021, 10:39

Hi!

Thank you for sharing your experiments on that challenge. I really appreciate it. 👏👏👏

11 May 2021, 11:03
Muhamed_Tuo
Inveniam

👏👏

11 May 2021, 11:23

Congratulations and thanks for sharing!

11 May 2021, 12:11

Congratulations!

Our result is actually also a blend of MT5 models and a linear model with the standard TF-IDF stuff.

We used the first 700 tokens.
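As a hedged illustration of the TF-IDF + linear-model component of such a blend (the texts, labels, and hyperparameters below are made up for the sketch, not the team's actual setup):

```python
# Sketch of a TF-IDF + linear classifier whose probabilities could be
# blended with MT5 outputs. Data and hyperparameters are illustrative only.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["boma la malawi", "mpira wa miyendo", "boma ndi ndale", "masewera a mpira"]
labels = ["politics", "sports", "politics", "sports"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# These class probabilities would be averaged with the MT5 probabilities.
probs = clf.predict_proba(["boma la malawi"])
```

Blending a cheap bag-of-words model with a large pretrained one often helps because the two make fairly uncorrelated errors.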

> In particular one of my best single models was trained on 64 tokens.

Did you use the first 64 tokens in that case?

> task is really difficult because some classes have close semantics

Yeah, and I also think there are some mislabellings. I've found ~20 of them and tried to fix them.

11 May 2021, 20:58
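One common way to surface suspected mislabellings like these (my assumption about the approach, not necessarily what was done here) is to flag training samples whose out-of-fold prediction confidently disagrees with the given label:

```python
# Flag suspected mislabels: out-of-fold prediction disagrees with the
# provided label at high confidence. Toy data for illustration only.
import numpy as np

oof_probs = np.array([[0.90, 0.10],
                      [0.20, 0.80],
                      [0.95, 0.05]])   # out-of-fold class probabilities
labels = np.array([0, 1, 1])           # third label looks suspicious

pred = oof_probs.argmax(axis=1)
conf = oof_probs.max(axis=1)
suspects = np.where((pred != labels) & (conf > 0.9))[0]
```

The flagged indices are then reviewed by hand before relabelling, since a confident wrong prediction can also just be a hard example.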

Thanks!

Yes, the first 64 tokens.

I have also tried the mT5-small model with 60 epochs and a 0.001 learning rate, but unfortunately it couldn't make any predictions.

@Sir-G @vecxoz can you please share your hyperparameter configuration? Maybe my resources weren't

11 May 2021, 21:10

I had problems when all layers were trainable. Maybe try freezing all layers except the last 2 or 3 blocks.

Oh great idea!
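A minimal sketch of the freeze-all-but-last-blocks idea; the parameter names below mimic the Hugging Face mT5 layout (an assumption), and with a real model you would loop over `model.named_parameters()` and toggle `requires_grad`.

```python
# Sketch: decide which parameters stay trainable when freezing everything
# except the last few transformer blocks. Names mimic HF mT5 (assumption).
def should_train(param_name, n_blocks=12, keep_last=2):
    """True only for parameters in the last `keep_last` of `n_blocks`."""
    return any(f"block.{i}." in param_name
               for i in range(n_blocks - keep_last, n_blocks))

names = [f"encoder.block.{i}.layer.0.SelfAttention.q.weight"
         for i in range(12)]
trainable = [n for n in names if should_train(n)]

# With a real model (hypothetical usage):
#   for name, p in model.named_parameters():
#       p.requires_grad = should_train(name)
```

Freezing the lower blocks shrinks the optimizer state and gradients, which can make fine-tuning stable on small GPUs at the cost of some capacity.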

flamethrower

Thanks a lot for sharing. I tried MT5 on Colab but ran into GPU issues since I didn't have the needed GPU at the time. I was able to attain 0.58 with MT5-small, but I had to use tiny batch sizes, which isn't ideal because it causes unstable learning.

Truly, the beauty in NLP is in understanding how to leverage the knowledge of pre-trained models.