I left this competition after my first and only submission. I planned to come back about a month before the end, but I am afraid the new rules will discourage me.
While it's OK to ban English translations of the competition data, banning external data like JW300 is a mistake IMHO.
This is a low-resource language. You can't achieve high accuracy by just using the small training data.
The only pretrained language model covering this language is mT5, but it is a huge model and not easy for everyone to set up for this classification task. So if the goal was to limit compute power, the new rules will just favor those who can afford more compute and can train a huge model like mT5.
On the other hand, one could use additional data like JW300 either for masked-LM pretraining or for semi-supervised learning in the same training pipeline as the competition data, with a much more lightweight model. This could give the sponsor a light but accurate model to deploy.
As for me, my goal was to use part of the JW300 data in a semi-supervised manner, together with the official labelled competition data, to train my (relatively lightweight) model. But with the new rules, I think I will just give up on the competition.
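To make the semi-supervised idea concrete, here is a minimal pseudo-labelling sketch: train on the small labelled set, label an external unlabelled pool (JW300-style text), keep only the confident predictions, and retrain on the enlarged set. The toy nearest-centroid bag-of-words classifier, the data, and the confidence threshold are all illustrative assumptions, not my actual pipeline (which would use a real lightweight neural model).

```python
# Pseudo-labelling sketch (illustrative only, not the actual competition pipeline).
from collections import Counter
import math

def featurize(text):
    """Bag-of-words counts for one sentence."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def train_centroids(examples):
    """Summed bag-of-words vector per label (a trivial centroid classifier)."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(featurize(text))
    return centroids

def predict(centroids, text):
    """Return (best_label, confidence), confidence being the top cosine score."""
    vec = featurize(text)
    score, label = max((cosine(vec, c), lab) for lab, c in centroids.items())
    return label, score

def pseudo_label(labelled, unlabelled, threshold=0.5):
    """One semi-supervised round:
    1. train on the small labelled set,
    2. label the external unlabelled pool,
    3. keep only confident predictions and retrain on the enlarged set."""
    centroids = train_centroids(labelled)
    confident = [(text, predict(centroids, text)[0])
                 for text in unlabelled
                 if predict(centroids, text)[1] >= threshold]
    return train_centroids(labelled + confident), confident
```

In practice you would iterate this loop a few times and tune the threshold so that noisy pseudo-labels from the external corpus don't swamp the official labelled data.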
@Serigne I agree with some of your points, but consider this: you used part of the JW300 dataset. Wouldn't other competitors with more GPU power scrape even more data (or translate it using the Google API) and train bigger models? At this stage, the best @zindi can do is provide a slightly bigger dataset. Unless that happens, this competition is pretty biased towards people with huge GPU power who can train their own models.
@zindi when will we get the rule update stating that pretrained models can no longer be used?
To achieve both fairness and performance, hosts should limit resources (e.g. single GPU, training time, inference time), not methods (e.g. translation, external data).
In the Social Media Sentiment Analysis for Tunisian Arabizi competition, there is a limit on resources (training time is capped at 8 hours).
I have been uneasy ever since I found that there already exists a pretrained model (mT5) whose largest variant has 11B parameters. That can't be trained on any free GPU; it requires serious compute resources. In this case, Zindi should ban all pretrained models. External data is already not allowed, which pushes people with big GPUs to use the pretrained one and get good accuracy with little work.
@zindi can we expect the mT5 model to be banned in the near future, to preserve whatever is left of the competition?
Hi,
I think you guys are missing the point here by throwing mT5 under the bus. The fight should be over whether or not to restore JW300.
If you look a bit at the mt5 repo, you will find that your statement about mT5 not fitting on a free cloud GPU isn't right. There are various sizes of mT5, from small (300M parameters) to XXL (13B). And from experience, I can tell you that mt5-small (300M) and mt5-base (600M) run like a charm on a Colab GPU.
My man, I have opened a new issue where I say that I don't mind mT5 in general, just that the larger sizes (XXL, XL, Large) and any model that can't run in Colab should not be allowed. The restriction was only for the large ones; I would love it if people competed for the best accuracy with the smaller models.
The issue is that if you limit the training time, you also limit the method. Some methods involve multiple steps that take time to train, not because the model is super heavy, but because there are multiple steps to run.
A limit on inference time, on the other hand, makes sense from a real-world deployment point of view.
@Neel_Gupta you can still use the XXL model for training if you extract a submodel of it and run that on a GPU. I don't think forbidding a model is a good idea, because it limits experimentation. However, if you limit inference time, you will clearly not be able to use the XXL model on a single GPU. So an inference-time limit would rule out non-deployable solutions, which is the primary goal of these competitions.
@shiro when you limit training time, inference time doesn't matter. We have only 600 values in the test set, so even if the model takes 10 s per prediction, the whole thing is done in about an hour and forty minutes (600 × 10 s ≈ 100 minutes). Plus, there are always separate limits for inference time and training time.
@Neel_Gupta yes, but the goal is to end up with a deployable model. I don't understand why you want to limit the training time. Limiting the training time means limiting the potential methods to improve a solution. Moreover, I am not sure the large models even fit in GPU memory, to be honest; so limiting the inference time and the inference GPU should be more than enough.
> Limiting the training time means limiting the potential methods to improve a solution
No, that is not the case. For example, you can browse Kaggle winners' kernels to see what level of accuracy their notebooks achieve in what training time. You can fit large models onto GPUs using model parallelism across multiple CUDA devices, and inference time is never a factor, since training time and inference time are measured separately by the competition organizers.