I left this competition after my first and only submission. I planned to come back about a month before the end, but I am afraid the new rules will discourage me.
While it's OK to ban English translations of the competition data, banning external data like JW300 is a mistake IMHO.
This is a low-resource language. You can't achieve high accuracy by just using the small training data.
The only pretrained language model covering this language is mT5, but it is a huge model and not easy for everyone to set up for this classification task. So if the goal was to limit compute power, the new rules will just favor those who can afford more compute and can train a huge model like mT5.
On the other hand, one could use additional data like JW300 either for masked-LM pretraining or for semi-supervised learning in the same training pipeline as the competition data, with a much more lightweight model. This could give the sponsor a light but accurate model to deploy.
As for me, my goal was to use part of the JW300 data in a semi-supervised manner, together with the official labelled competition data, to train my (relatively lightweight) model. But with the new rules, I think I will just give up on the competition.
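To make the semi-supervised idea concrete, here is a minimal pseudo-labelling sketch: train on the small labelled set, label an external unlabelled pool (JW300-style text), keep only the confident predictions, and retrain on the enlarged set. The toy nearest-centroid bag-of-words classifier, the data, and the confidence threshold are all illustrative assumptions, not my actual pipeline (which would use a real lightweight neural model).

```python
# Pseudo-labelling sketch (illustrative only, not the actual competition pipeline).
from collections import Counter
import math

def featurize(text):
    """Bag-of-words counts for one sentence."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def train_centroids(examples):
    """Summed bag-of-words vector per label (a trivial centroid classifier)."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(featurize(text))
    return centroids

def predict(centroids, text):
    """Return (best_label, confidence), confidence being the top cosine score."""
    vec = featurize(text)
    score, label = max((cosine(vec, c), lab) for lab, c in centroids.items())
    return label, score

def pseudo_label(labelled, unlabelled, threshold=0.5):
    """One semi-supervised round:
    1. train on the small labelled set,
    2. label the external unlabelled pool,
    3. keep only confident predictions and retrain on the enlarged set."""
    centroids = train_centroids(labelled)
    confident = [(text, predict(centroids, text)[0])
                 for text in unlabelled
                 if predict(centroids, text)[1] >= threshold]
    return train_centroids(labelled + confident), confident
```

In practice you would iterate this loop a few times and tune the threshold so that noisy pseudo-labels from the external corpus don't swamp the official labelled data.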
@Serigne I agree with some of your points, but consider this: you used part of the JW300 dataset. Wouldn't other competitors with more GPU power scrape even more data (or translate it using the Google API) and train bigger models? At this stage, the best @zindi can do is provide a slightly bigger dataset. Unless that happens, this competition is pretty biased towards people with huge GPU power who can train their own models.
@zindi when will we get the rule update stating that pretrained models can no longer be used?
To achieve both fairness and performance, hosts should limit resources (e.g. single GPU, training time, inference time), not methods (e.g. translation, external data).
In the Social Media Sentiment Analysis for Tunisian Arabizi competition, there is a limit on resources (training time is capped at 8 hours).
I have been uneasy ever since I found that there already exists a pretrained model (mT5) whose largest variant has 11B parameters. That can't be trained on any free GPU; it requires serious compute resources. In this case, Zindi should ban all pretrained models. External data is already not allowed, which pushes people with big GPUs to use the pretrained one and get good accuracy with little work.
@zindi can we expect the mT5 model to be banned in the near future, to preserve whatever is left of the competition?
Hi,
I think you guys are missing the point here by throwing mT5 under the bus. The fight should be over whether or not to restore JW300.
If you look a bit at the mt5 repo, you will find that your statement about mT5 not fitting on a free cloud GPU isn't right. There are various sizes of mT5, from small (300M parameters) to XXL (13B). And from experience, I can tell you that mt5-small (300M) and mt5-base (600M) run like a charm on a Colab GPU.
My man, I have opened a new issue where I say that I don't mind mT5 in general, just that the larger sizes (XXL, XL, Large) and any model that can't run in Colab should not be allowed. The restriction was only for the large ones; I would love it if people competed for the best accuracy with the smaller models.
The issue is that if you limit the training time, you also limit the method. Some methods involve multiple steps that take time to train, not because the model is super heavy, but because there are multiple steps to run.
A limit on inference time, on the other hand, makes sense from a real-world deployment point of view.
@Neel_Gupta you can still use the XXL model for training if you extract a submodel of it and run that on a GPU. I don't think forbidding a model is a good idea, because it limits experimentation. However, if you limit inference time, you will clearly not be able to use the XXL model on a single GPU. So an inference-time limit would rule out non-deployable solutions, which is the primary goal of these competitions.
@shiro when you limit training time, inference time doesn't matter. We have only 600 values in the test set, so even if the model takes 10 s per prediction, the whole thing is done in about an hour and forty minutes (600 × 10 s ≈ 100 minutes). Plus, there are always separate limits for inference time and training time.
@Neel_Gupta yes, but the goal is to end up with a deployable model. I don't understand why you want to limit the training time. Limiting the training time means limiting the potential methods to improve a solution. Moreover, I am not sure the large models even fit in GPU memory, to be honest; so limiting the inference time and the inference GPU should be more than enough.
> Limiting the training time means limiting the potential methods to improve a solution
No, that is not the case. For example, you can browse Kaggle winners' kernels to see what level of accuracy their notebooks achieve in what training time. You can fit large models onto GPUs using model parallelism across multiple CUDA devices, and inference time is never a factor, since training time and inference time are measured separately by the competition organizers.