
Africa Biomass Challenge

Helping CĂ´te d'Ivoire
$10 000 USD
Completed (almost 3 years ago)
Earth Observation
Prediction
1223 joined
276 active
Start: Jan 27, 23
Close: May 21, 23
Reveal: May 21, 23
Koleshjr
Multimedia University of Kenya
Was this won by pure luck?
Platform · 22 May 2023, 06:26 · 21

Did anyone have correlated CV / public LB / private LB scores?

Discussion 21 answers

Waiting for answers. No correlation with CV; maybe some with the public LB. Still trying to figure it out.

22 May 2023, 08:21
Upvotes 0
enigmatic

I expected this, since we only had a few rows of data at submission time. If you test your model on the same amount of data drawn from the train set (90 rows), you might find the correlation you are seeking. My scores were in the 70s and 80s when I used only 90 rows for testing.
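One way to act on this suggestion, sketched with the standard library only (the data, the 900-row train size, and all names here are made up for illustration; only the 90-row holdout size comes from the post): score one fixed set of predictions on many random 90-row holdouts and look at the spread.

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

random.seed(0)
n_train = 900
# Skewed synthetic "biomass" targets and one fixed set of model predictions
y = [random.lognormvariate(4.5, 0.7) for _ in range(n_train)]
preds = [v + random.gauss(0, 40) for v in y]

scores = []
for _ in range(200):
    idx = random.sample(range(n_train), 90)  # mimic the 90-row test set
    scores.append(rmse([y[i] for i in idx], [preds[i] for i in idx]))

# The min-max spread shows how noisy a single 90-row score can be
print(round(min(scores), 1), round(max(scores), 1))
```

If the spread is wide, a single 90-row score (such as the public LB) tells you very little about a model on its own.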

22 May 2023, 08:27
Upvotes 2

My best score was train.mean() * 2.1, a private score of about 75, which would have put me in 16th or 17th place. Of course I didn't select it; it would have been completely pointless.

Testing with 90 records was a good idea. Did you get good correlations between CV and private? Did you pick those 90 values from any specific distribution?

Koleshjr
Multimedia University of Kenya

Because if you chose the 90 images for testing, how did you ensure they share the same distribution as the 90 images in the submission file?

enigmatic

Yes I did, but some were overfitted. I chose them randomly; I didn't spend much time on this, but it might have helped when selecting a final submission.

Not sure if you would call this luck, but here is a quick experiment based on submissions:

Single LGBM, seed 42: public 52, private 88. Same model, seed 2023: public 63, private 97.
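The same effect can be simulated with the standard library (everything below is made up for illustration; in particular, modelling a reseeded LightGBM as the same skill plus a different random component is an assumption, not the author's experiment):

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def predict_with_seed(y_test, seed):
    # Stand-in for retraining the same model with a different seed:
    # identical skill, different random component (illustrative assumption)
    rng = random.Random(seed)
    return [v + rng.gauss(0, 45) for v in y_test]

random.seed(1)
# A 90-row test set with a skewed "biomass"-like target
y_test = [random.lognormvariate(4.5, 0.7) for _ in range(90)]
scores = {seed: rmse(y_test, predict_with_seed(y_test, seed)) for seed in (42, 2023)}
print(scores)  # two seeds, two noticeably different 90-row scores
```

On a test set this small, changing nothing but the seed is enough to move the score by a lot, which is exactly the instability reported above.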

22 May 2023, 09:02
Upvotes 1
skaak
Ferra Solutions

Interesting ... but too few degrees of freedom, can't draw reliable inference from that observation alone :-)

I'm 85% certain this competition was pretty much won by luck. I leave a 15% chance that someone came up with a very good way of evaluating their models, but that remains to be seen :)

I noticed that predicting the mean value of 105 for every test point did way better than almost all the models I trained. I got my best score by simply taking my model's output and adding a constant to all predictions so that the mean prediction was 105 (a public score of around 49, without extensively probing the public set).

This got a private score of 72.7, which is among the top private scores. However, it would completely defeat the purpose, since there is no actual reason to bias the model's predictions to have a mean of 105.

Given that an unreasonable model does as well as the top performers, I think the competition was pretty much decided by luck. But I am open to being convinced otherwise.
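The mean-shift trick described above amounts to a one-line adjustment. A minimal sketch (the function name and sample values are made up; only the target mean of 105 comes from the post):

```python
def shift_to_mean(preds, target_mean=105.0):
    # Add one constant to every prediction so the mean becomes target_mean
    delta = target_mean - sum(preds) / len(preds)
    return [p + delta for p in preds]

preds = [50.0, 80.0, 110.0, 140.0]   # mean 95
shifted = shift_to_mean(preds)       # delta = 10, mean becomes 105
print(shifted)  # [60.0, 90.0, 120.0, 150.0]
```

Note this changes only the bias of the predictions, not their ranking or spread, which is why it says nothing about the model's actual skill.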

22 May 2023, 12:17
Upvotes 4
Koleshjr
Multimedia University of Kenya

Clever

21db

Having access to some of the ground-truth AGBD data seems necessary to build a good model.

skaak
Ferra Solutions

So you could be #3 by submitting a constant 105 ... Perhaps there are big outliers, the top teams managed to get close to them, and 105 is a good average of the outliers. Or perhaps the top teams just happened to land close to 105, in which case it is just random.

We were very careful not to overfit; the sub we selected was not our best on the public LB, but we still slid down quite a bit. Our best private LB sub, which we also selected, has an average of 112 fwiw (EDIT: that sub scored public 48 and private 79 fwiw).

skaak
Ferra Solutions

Based on another discussion by @MohamedDHIAB, I think he just submitted a constant to reach #10, so his constant must be slightly off 105 ...

MICADEE
LAHASCOM

@Koleshjr You can say that again; this was pure luck in many cases. In my team's case, we achieved 72.57 on the private LB against 57.14 on the public LB about two months ago, which would have been 3rd place on the private LB. But we didn't choose that sub because its public LB score didn't look convincing at the time. So I could agree with @smartstix that it's 85% certain this competition was pretty much won by luck.

22 May 2023, 13:14
Upvotes 0
Koleshjr
Multimedia University of Kenya

How did you get that impressive score? What approach did you use?

MICADEE
LAHASCOM

@Koleshjr Nothing too serious here. The major lift in my case came from computing various vegetation indices (using the spyndex library), followed by aggregating those indices.

Reference: https://github.com/awesome-spectral-indices/awesome-spectral-indices/blob/main/output/spectral-indices-table.csv

Train Dataset Splitting Technique:

Target (biomass) value based splitting:

Here, my observation was that it is quite hard for any model to predict the required higher biomass values in shaded regions of CĂ´te d'Ivoire from the GEDI, Sentinel-2 and ground-truth biomass data provided. Hence, the train dataset was split into two parts, where:

  • only train target (biomass) values greater than 60 were selected for model training in part 1, and
  • only train target (biomass) values less than or equal to 60 were selected for model training in part 2.

This resulted in two-stage modelling.

Feature Selection Technique: create a wrapper class that has all the built-in statistical tests required for feature selection, takes some basic inputs from the user, and returns the required features (using scoring="f_regression").

Model Trained: CatBoostRegressor with 5-fold CV (for each of the part 1 and part 2 models).

Final Stage: averaging the predictions from the two modelling stages into a single final prediction.

(Lest I forget, I also tried assigning a 90/10 weight ratio to the part 1 and part 2 predictions respectively. Note: this weighting gave a better score than the plain averaging above.)
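A minimal sketch of the split-and-blend logic above, with a trivial mean predictor standing in for the CatBoost models (the 60 threshold and the 90/10 weights come from the post; the data and names are made up):

```python
def mean_model(train_targets):
    # Trivial stand-in for a fitted regressor: always predicts the train mean
    mu = sum(train_targets) / len(train_targets)
    return lambda X: [mu for _ in X]

train_y = [12, 35, 58, 61, 140, 420, 980, 44, 73, 250]
part1 = [y for y in train_y if y > 60]    # high-biomass stage
part2 = [y for y in train_y if y <= 60]   # low-biomass stage

m1, m2 = mean_model(part1), mean_model(part2)
X_test = [None] * 3                        # placeholder test rows
p1, p2 = m1(X_test), m2(X_test)

# 90/10 weighting of the two stages, as described above
final = [0.9 * a + 0.1 * b for a, b in zip(p1, p2)]
print(round(final[0], 3))
```

With real models, each stage would be a CatBoostRegressor trained only on its slice of the targets, but the blending arithmetic is the same.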

skaak
Ferra Solutions

Nice, thanks for sharing @MICADEE

Just to clarify ... for feature selection, you basically use f_regression after fitting y = f(x), where x is one of the index outputs (or an aggregate of them), and that is how you select which indices to use in the end?

MICADEE
LAHASCOM

Uwc @skaak.

Yes exactly.

Something like this below showing only part of the aforementioned:

from sklearn.feature_selection import (
    chi2, f_classif, f_regression,
    mutual_info_classif, mutual_info_regression)

# inside the wrapper class's __init__(self, n_features, problem_type, scoring):
self.n_features = n_features
if problem_type == "classification":
    valid_scoring = {
        "f_classif": f_classif,
        "chi2": chi2,
        "mutual_info_classif": mutual_info_classif
    }
else:
    valid_scoring = {
        "f_regression": f_regression,
        "mutual_info_regression": mutual_info_regression
    }
if scoring not in valid_scoring:
    raise Exception("Invalid scoring function")
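For intuition about what scoring="f_regression" ranks: per feature, sklearn's f_regression computes a univariate F-statistic from the Pearson correlation r between that feature and the target, F = r^2 * (n - 2) / (1 - r^2). A standard-library sketch of that statistic (the data below is made up):

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def f_score(x, y):
    # Univariate F-statistic, the quantity f_regression ranks features by
    r = pearson(x, y)
    return r * r * (len(x) - 2) / (1 - r * r)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
good = [1.1, 2.0, 2.9, 4.2, 5.0]   # strongly correlated feature
bad = [3.0, 1.0, 4.0, 1.0, 5.0]    # weakly correlated feature
print(f_score(good, y) > f_score(bad, y))  # True
```

The selector then simply keeps the n_features columns with the highest scores.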
skaak
Ferra Solutions

Thanks again @MICADEE

This is fine in 1-D. If I understand correctly, you test features one by one. But perhaps there are two features, say x1 and x2, that do not work individually yet work pretty well together, i.e. in 2-D.

Not that I would do it differently tbh, but I've heard of stepwise regression, forward selection and backward elimination for that. I've only seen them implemented in SAS and never used them myself, but I wonder what those would select, and whether it would be better; then again, tbh, I would not even bother. Those techniques are linear anyhow afaik, and today's techniques are geared towards milking everything out of the data.

fwiw I started this comp like that, calculating lots of bands (there was a nice public kernel to help with that), but after a while with no progress I switched to a sort of global approach, just running all the data through a single model without really extracting features. It worked a bit better in CV and on the public LB, so that's what I ended with, but I wonder whether those earlier models would not have done a bit better, given the erratic LB.

Oh well, congrats on your performance here, I see you worked very hard.

correlated CV/Public LB

No.

It was my first competition with satellite data, so I was curious to see how to treat it. At the beginning, the results seemed very variable from one model to another (though cross validation scores were similar). It was a little bit scary.

A red flag was when I tried to average two of my models. I made a mistake and did not divide the result by 2. Surprisingly, this was my best public LB model for some time. (When I found the mistake and fixed it, I got a much worse public LB score.) Another scary clue.

Then, for a period, I had some luck with the correlation between local CV and public LB. The figures were not the same (public LB was smaller than CV), but an improvement in CV would result in an improvement in public LB. I was just fooling myself, though...

At some point, I had a huge improvement in my 3-fold local CV: RMSE ~53, reducing my best previous RMSE by 2%, which was a big step compared to the previous improvements. This resulted in an awful score on the public LB (+10% RMSE!). At that point, I just stopped working on the data.

After all, the distribution of the target (from 10 to 1000), combined with the use of RMSE and a (very) small sample size, made this competition very sensitive to outliers.
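A quick illustration of that sensitivity (the numbers are made up, but on the scale described above): one badly missed outlier near the top of the 10-1000 range dominates the RMSE of a 90-row test set all by itself.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# 89 perfectly predicted points plus one outlier missed by 900
y_true = [100.0] * 89 + [1000.0]
y_pred = [100.0] * 90
print(round(rmse(y_true, y_pred), 1))  # 94.9
```

A single point turns an otherwise perfect submission into a score in the range people were seeing on the leaderboard.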

I just selected two models from my experiments (the one with the best local CV and another with a good public LB score), and they performed equally badly on the private LB :/

correlated Public LB/ Private LB

No, again. My best private LB model (74.96) is one whose predictions I had multiplied by 1.5 for no particular reason (and which I therefore did not retain).

22 May 2023, 16:37
Upvotes 3
skaak
Ferra Solutions

Using the average of two models should be robust, but here it was not ...

BTW, if the metric had been e.g. RMSE(log(actual) - log(predicted)), do you think it would have been better / less random?

Then the problem is not outliers but the fact that the variance increases as the actual value increases, and you need to transform that away. Still, that won't give you a better LB if the LB uses the plain RMSE metric; in that case RMSE is simply the wrong metric here ...
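A quick check of that intuition (illustrative numbers only): under a log transform, the same relative error contributes equally at small and large targets, whereas plain RMSE is dominated by the large one.

```python
import math

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def log_rmse(a, b):
    return rmse([math.log(x) for x in a], [math.log(y) for y in b])

# Both predictions are off by the same 20% relative error
y_true = [10.0, 1000.0]
y_pred = [12.0, 1200.0]
print(round(rmse(y_true, y_pred), 1), round(log_rmse(y_true, y_pred), 4))
```

Here the plain RMSE is driven almost entirely by the 1000-scale point, while the log metric scores both errors identically, which is exactly the variance-stabilising effect being discussed.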