Hello everyone!
Since not everyone may be following the discussion on my previous post, and I'm concerned the content might be overlooked by the Zindi team, I've decided to create a dedicated post about the ensemble rule for this competition.
The term "ensemble" is ambiguous. In many ML competitions, "ensemble" refers to combining predictions from several models to derive a final decision. However, certain algorithms, such as Random Forest and Gradient Boosting, are inherently "ensemble models" by design. Given this, the concerns raised by fellow competitors are quite valid.
Here's a proposed, clearer set of guidelines regarding ensemble models for this competition:
1. Definition: In the context of this competition, an "ensemble" is defined as the integration of predictions from several distinct models. This definition does not include the inherent ensemble mechanisms that are part of standard algorithms like Random Forest or Gradient Boosting Machines (LightGBM/XGBoost).
2. Restriction on Ensembles: Competitors can merge predictions from up to three unique models to create an ensemble. For instance, if one wishes to form an ensemble, predictions from models like Linear Regression, Random Forest, and a Neural Network can be combined. This combination will be recognized as a single ensemble.
3. Cross-Validation Clarification: Using cross-validation and averaging the predictions from multiple folds does NOT count as multiple models. It's considered a part of the training and evaluation process for a single model. For instance, if you train a Neural Network using 5-fold cross-validation and then average the predictions of those 5 folds to create a submission, it's still considered a single Neural Network model.
4. Inherent Ensemble Models: Algorithms like Random Forest, Gradient Boosting Machines, etc., which inherently use ensemble mechanisms, are treated as a single model regardless of the number of trees/estimators they use. For instance, a Random Forest with 100 trees is considered one model, not 100.
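To make the definitions in points 1 and 2 concrete, here's a minimal sketch. The three plain functions stand in for three distinct fitted models (say, Linear Regression, Random Forest, and a Neural Network); the simple averaging is just one illustrative way to combine them, not a prescribed method.

```python
# Three toy stand-ins for distinct trained models (illustrative only).
def model_a(x):
    return 2.0 * x          # e.g. a linear model's prediction

def model_b(x):
    return x + 1.0          # e.g. a tree ensemble's prediction

def model_c(x):
    return 0.5 * x + 2.0    # e.g. a neural network's prediction

def ensemble(x):
    """One ensemble built from up to three distinct models (rule 2):
    combining their predictions into a single final prediction."""
    preds = [model_a(x), model_b(x), model_c(x)]
    return sum(preds) / len(preds)

# Under rule 4, each stand-in counts as ONE model even if it is internally
# an ensemble (e.g. a Random Forest with 100 trees).
final_prediction = ensemble(4.0)  # mean of 8.0, 5.0, 4.0
```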
Of course, this is merely a suggestion. I'm eager to hear opinions from both the competitors and the Zindi team. Regardless of the eventual decision, I believe there's consensus that the text in the Data Section needs revision.
Here's what I think they mean when they say each participant is allowed to submit a maximum of 3 ensembled models: each person can combine predictions from at most 3 models, e.g. (XGBoost, Random Forest, LightGBM). So, for example, using cross-validation with 10 folds and generating a submission by averaging the predictions from those 10 folds counts as a single model, not an ensemble of 10 models. That's just my reading of the rule: we can't use more than 3 models in our ensembles (computational cost, I guess).
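That fold-averaging reading can be sketched as follows. The "training" here is a toy mean-predictor purely for illustration; the point is that 10 fold-level fits of the same algorithm, averaged into one submission, still count as ONE model under this interpretation.

```python
def train_mean_model(train_targets):
    """Toy 'training': the fitted model just predicts the training mean."""
    mean = sum(train_targets) / len(train_targets)
    return lambda _x: mean

targets = list(range(20))   # toy training targets (assumed data)
n_folds = 10
fold_size = len(targets) // n_folds

# Fit one copy of the SAME algorithm per fold, holding out that fold.
fold_models = []
for k in range(n_folds):
    held_out = set(range(k * fold_size, (k + 1) * fold_size))
    train = [t for i, t in enumerate(targets) if i not in held_out]
    fold_models.append(train_mean_model(train))

# One submission = average of the 10 fold predictions; under this reading
# of the rule, this is still a single model, not an ensemble of 10.
submission = sum(m(None) for m in fold_models) / n_folds
```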
If you decide to use a LOOCV strategy, you will have more than 3,800 models. And according to you, we have to consider it a single model if we average the predictions of all these models?
Thanks for consolidating that discussion into a single post. I agree on the clarifications that you've made.
Point 3 could be a discussion point for the hosts, though, since it bears on the hosts' commitment to keeping the resulting predictions lightweight. If they want to encourage a small footprint, they could count each fold's trained model towards the limit. That change is the difference between allowing 3 models and allowing 15, even with just the 5-fold setup the host suggested in the previous discussion.
OK, but for example XGBoost, Random Forest, and LightGBM are 3 models.
That would imply that you wouldn't be allowed any sort of fusion layer to mix/stack/bag/boost the results of the 3. You'd only be allowed to do some sort of averaging.
With a regression model ensembling your models, that's already 1 model, so you'd only be allowed 2 models below it, not 3 as mentioned.
Is this correct?
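The stacking concern above can be sketched like this. If a fusion/regression layer combines the base predictions, that layer is itself a model, so with a cap of 3 you could have at most 2 base models plus 1 meta-model. The fixed weights below are made-up numbers standing in for a fitted regression, purely for illustration.

```python
# Two base models (stand-ins for e.g. XGBoost and Random Forest).
def base_1(x):
    return x * x

def base_2(x):
    return 3.0 * x

def meta(p1, p2):
    """A learned fusion layer (fixed weights stand in for a fitted
    regression). Under this interpretation, it counts as model #3."""
    return 0.7 * p1 + 0.3 * p2 + 0.1

def stacked_prediction(x):
    return meta(base_1(x), base_2(x))

models_used = 3  # 2 base models + 1 meta-model, hitting the cap
```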
So:
Maximum number of single models allowed: 3
Maximum number of ensembles using predictions from the above models: 1
Your submission should consist of up to 3 single models used to create ONE ensemble.
Note: if you are not satisfied with the predictions of one of the models and wish to replace it, you can swap it for a different single model, as long as the total stays at 3. If you prefer to use only 2 models for the ensemble, that's fine as well. And should you wish to use just 1 model, that's allowed too; in such cases, the ensemble doesn't count.
OK, thanks. What I understand from that is: I'm allowed to have 3 models, plus a model on top that ensembles the 3.
In this competition context, the ensemble is a combination of the predictions of the three models into one final prediction.