Many thanks to all hosts and Zindi team for such an interesting challenge. Congrats and thanks to all participants!
Despite its simple formulation, the task is really difficult because some classes have close semantics and a small number of training examples. My final solution is an ensemble of 6 MT5 (L, XL) models trained on different sequence lengths from 64 to 256 tokens. Each of the 6 models is a 5-fold self-ensemble. I wasn't aware of any other models pretrained on the Chichewa language, so from the beginning I concentrated on MT5. My cross-validation setup is based on a 5-fold stratified split. The CV score of my best ensemble was 0.7005 (pretty close to the private LB of 0.7097), but its public score was only 0.6419, so choosing final submissions in this competition was a bit tricky.
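For anyone setting up a similar pipeline, the stratified split can be sketched like this (the toy texts/labels are placeholders, not the competition data):

```python
# Minimal sketch of a 5-fold stratified CV split as described above.
# The toy data below is illustrative only.
from sklearn.model_selection import StratifiedKFold

texts = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = []
for train_idx, val_idx in skf.split(texts, labels):
    # each validation fold preserves the overall class ratio
    folds.append((train_idx, val_idx))
```

Training one model per fold and averaging their predictions gives the 5-fold self-ensemble mentioned above.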
One interesting conclusion from my experiments is that a good model can be trained on relatively short sequences (even though almost all texts are quite long). In particular, one of my best single models was trained on 64 tokens. Models trained on 384 and 512 tokens were not better. I also trained some models on different ranges of tokens like [0:256), [256:512), etc. All ranges except the first one gave a much lower CV score.
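The token-range experiment above boils down to slicing the tokenized text before training. A toy sketch (a whitespace split stands in for the real MT5 tokenizer):

```python
# Hedged sketch: take a range [start:end) of tokens from each document,
# e.g. [0:256) or [256:512) as in the experiment above.
# str.split() is a stand-in for the real MT5 tokenizer.
def token_range(text, start, end):
    tokens = text.split()  # placeholder for tokenizer.encode(text)
    return tokens[start:end]

doc = " ".join(f"tok{i}" for i in range(600))
first = token_range(doc, 0, 256)     # [0:256) — the range that worked best
second = token_range(doc, 256, 512)  # later ranges scored much lower
```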
Awesome !
thanks for sharing with the community 😉
great summary
Hi,
Many thanks for sharing with us, and congratulations!
Can you please also share your configuration, GPU, etc. @vecxoz
Thanks!
In general I prefer to run experiments on the free TPUs at Kaggle. If I need more resources, I rent GPUs on Google Cloud. In particular, at the end of this competition I rented a couple of hours on an A100-40GB to train MT5-XL, because this model does not fit in a 16 GB GPU/TPU.
Thanks for the information
Wow..... Great. Thanks for sharing. 👍
Hi!
Thank you for sharing your experiments on that challenge. I really appreciate it. 👏👏👏
👏👏
Congratulations and thanks for sharing!
Congratulations!
Our result is actually also a blend of MT5 models and a linear model with the standard TF-IDF stuff.
We used the first 700 tokens.
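For reference, a TF-IDF + linear baseline like the one blended above can be set up in a few lines; the toy Chichewa-ish texts, labels, and hyperparameters here are assumptions for illustration, not the actual configuration:

```python
# Hedged sketch of a TF-IDF + linear-model baseline (not the exact setup used).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["moni dziko", "nkhani ya masewera", "moni abwenzi", "masewera a mpira"]
train_labels = ["greeting", "sports", "greeting", "sports"]

# word unigrams + bigrams; a character n-gram variant is another common choice
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)
pred = clf.predict(["moni"])[0]
```

Its class probabilities can then be averaged with the MT5 predictions to form the blend.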
> In particular one of my best single models was trained on 64 tokens.
Did you use the first 64 tokens in that case?
> task is really difficult because some classes have close semantics
Yeah, and also I think there are some mislabellings. I've found ~20 of them and tried to fix them.
Thanks!
Yes, the first 64 tokens.
I have also tried the mT5 small model with 60 epochs and a 0.001 learning rate, but unfortunately it couldn't make any predictions.
@Sir-G @vecxoz can you share your hyperparameter configuration please? Maybe my resources weren't enough.
I had problems when all layers were trainable. Maybe try to freeze all layers except 2 or 3 last blocks.
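The freezing idea above amounts to setting `requires_grad = False` on everything except the last few blocks. A toy sketch: the parameter names mimic the MT5 layout (`encoder.block.N...`), but the model here is just a stand-in list of named parameters, not a real `transformers` model:

```python
# Hedged sketch of freezing all but the last few encoder blocks.
# Param is a stand-in for a torch parameter with a requires_grad flag.
class Param:
    def __init__(self):
        self.requires_grad = True

named_params = [(f"encoder.block.{i}.layer.0.weight", Param()) for i in range(12)]

def freeze_except_last(named_params, n_trainable=3, n_blocks=12):
    keep = {f"encoder.block.{i}." for i in range(n_blocks - n_trainable, n_blocks)}
    for name, p in named_params:
        p.requires_grad = any(name.startswith(k) for k in keep)

freeze_except_last(named_params)
trainable = [name for name, p in named_params if p.requires_grad]
```

With a real model, the same loop runs over `model.named_parameters()`.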
Oh great idea!
Thanks a lot for sharing. I tried MT5 on Colab but ran into GPU issues since I didn't have the needed GPU at the time. I was able to attain 0.58 with MT5 small, but I had to use tiny batch sizes, which isn't ideal because it causes unstable learning.
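One common workaround for those tiny batch sizes (not mentioned above, just a general technique) is gradient accumulation: sum scaled gradients over several micro-batches and step the optimizer once, so the effective batch is larger. A toy sketch with plain numbers standing in for gradient tensors:

```python
# Hedged sketch of gradient accumulation: plain floats stand in for tensors.
def train_steps(micro_batch_grads, accum_steps=4):
    """Return the effective gradient applied at each optimizer step."""
    applied = []
    grad_sum = 0.0
    for i, g in enumerate(micro_batch_grads, start=1):
        grad_sum += g / accum_steps   # scale so the sum matches one big batch
        if i % accum_steps == 0:
            applied.append(grad_sum)  # optimizer.step() would fire here
            grad_sum = 0.0
    return applied

grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
steps = train_steps(grads, accum_steps=4)
```

With a real training loop, the same pattern is `loss = loss / accum_steps; loss.backward()` with `optimizer.step()` every `accum_steps` iterations.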
Truly, the beauty in NLP is in understanding how to leverage the knowledge of pre-trained models.