Hello!
The data section states:
Resource restriction: You may submit a maximum of 3 ensembled models.
I'm a bit confused. Does this mean that, if we choose to create an ensemble, it should consist of a maximum of 3 models, or does it mean that we can submit up to 3 separate ensembles (submissions) for the final evaluation?
It means the first idea, "if we choose to create an ensemble, it should consist of a maximum of 3 models".
This competition could possibly be won by ensembling 10+ models, but that would not be useful to the host, as it would be too convoluted to implement. However, we did not want to remove the option to ensemble entirely, hence the rule.
We personally think feature engineering is going to be the winner in this competition.
All the best,
In my case, I use 5-fold cross-validation and generate the submission by averaging the predictions of the 5 folds. Will that count as 1 model or an ensemble of 5 models?
I think it would count as 5 separate models, since they would be trained on different subsets of the data (even though those subsets are not necessarily disjoint).
Well, technically, it is indeed 5 separate models. However, I believe this competition would treat the technique as a single model, since it is the result of a single training process.
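To make the question concrete, here is a minimal sketch of the 5-fold setup being discussed, using synthetic data and a Ridge regressor purely as placeholders: one training procedure produces 5 fitted estimators, and the submission is the average of their test predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Placeholder data standing in for the competition's train/test sets.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)
X_test, _ = make_regression(n_samples=50, n_features=10, random_state=1)

fold_preds = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # A fresh model instance is trained from scratch in each fold.
    model = Ridge().fit(X[train_idx], y[train_idx])
    fold_preds.append(model.predict(X_test))

# The submission is the mean of the 5 fold models' predictions.
submission = np.mean(fold_preds, axis=0)
```

So the ambiguity in the rule is whether these 5 fitted instances count as 1 model (one training procedure) or 5 (five trained instances).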
@amyflorida626 I think it's essential for the competition host to provide a concise explanation of what is considered a single model to ensure fairness in the competition.
I actually don't think they would be considered a single model in this case, since they are instantiated and trained from scratch in each fold. If counting training procedures were the intent, the rules would have been worded something like "maximum 3 different model *architectures*", not "maximum of 3 models" (which implies model *instances*).
This restriction can be difficult to enforce (and interpret), and the leaderboard might have multiple top-10 solutions that rely on multiple models, which would make even a manual check of just the top 10 solutions time-consuming. Not to mention, how would you classify approaches like these?
1. Would a single random forest count as an ensemble?
2. Would a single LightGBM/XGBoost model count as the same number of models as the value passed for n_estimators?
How would these approaches be counted?
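The first question above can be demonstrated directly: a single scikit-learn RandomForestClassifier object is one fitted instance, but it internally holds n_estimators separate decision trees, which is exactly the counting ambiguity being raised. (The dataset here is synthetic, for illustration only.)

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, random_state=0)

# One object, one .fit() call...
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# ...but 100 underlying tree instances inside it.
print(len(clf.estimators_))  # prints 100
```

Is that 1 model or 100? The rules as written don't say.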
Totally agree. We need a more precise explanation.
@amyflorida626 Could the Zindi team give us a definitive direction on this soon? It significantly affects how we design our experiments.
Let's set the rule to be a maximum of 3 ensembled models, with a maximum of 5 folds per model.
Think logically here: we do not want the most convoluted model, we want a solution that will be useful to the host.
You haven't answered all of the concerns we've raised. Semantically, we can classify a weak learner such as a decision tree as a single model instance, even though scikit-learn's sklearn.ensemble.RandomForestClassifier is a single class that contains *many* trees (in a module called "ensemble", mind you).
Plus, I don't think that would resolve your concern, since you're still allowing a maximum of 15 model instances to contribute to submission.csv (3 models trained on 5 folds each). An alternative worth considering is a limit on inference time for a fixed batch, model size in kilobytes, or some other metric that favours low compute usage.
We understand your goals and are trying to contribute to the competition as well. Hope you understand where we're coming from.
Some references on how other competitions have handled this:
https://www.kaggle.com/c/tensorflow-speech-recognition-challenge
https://www.kaggle.com/competitions/tensorflow-great-barrier-reef/overview/code-requirements
Since this is still not cleared up: I think you want to avoid stacking/ensembling many diverse models, as is done in many Kaggle competitions. So please clarify this and state, for example, that a random forest (which is usually an ensemble of far more than 3 decision trees) counts as a single model, so that its predictions can be averaged with those of 2 other models.
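Under that reading, a compliant 3-model submission might look like the following sketch (synthetic data and arbitrarily chosen estimators, purely to illustrate the counting): the random forest counts as one model, and its predictions are averaged with two others.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Three models total under this interpretation: the forest is 1 model,
# regardless of how many trees it contains internally.
models = [
    RandomForestClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
]

# Simple unweighted average of the three models' class-1 probabilities.
probs = [m.fit(X, y).predict_proba(X)[:, 1] for m in models]
blend = np.mean(probs, axis=0)
```

This keeps the submission within a "3 ensembled models" limit while still permitting tree-ensemble base learners.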
@zindi any update on this?