GeoAI Aquaculture Pond Identification Challenge by FAO and ITU

1000 CHF
2 months left
Data Analysis
Classification
Feature Engineering
GIS
490 joined
143 active
Start: Jun 08, 26
Enrolments close: Aug 07, 26
Close: Aug 16, 26
Reveal: Aug 16, 26
khushimalik19
Tips for handling the temporal shift to improve LB score?
9 Jun 2026, 13:06 · 2

Hi everyone! I'm stuck around 0.953 on the leaderboard and can't seem to get past it, while many of you are at 0.98+. I'd really appreciate any general guidance (no need to share code):

  1. The challenge is about temporal generalization (train/test from different periods). What kind of features worked best for you to stay robust across seasons — raw monthly bands, time-aggregated stats, spectral indices (NDWI/MNDWI), or something else?
  2. Did you do any train/test distribution alignment or normalization to handle the time shift?
  3. Since threshold tuning is forbidden (fixed 0.5) and F1 is 60% of the score, did you do anything special for calibration / class weighting to optimize F1 at 0.5?
  4. Was a single model enough, or did ensembling make the difference?

Any hint on what gives the biggest jump would mean a lot. Thanks!

Discussion 2 answers

A few tips from what I have worked out so far.

I had to focus on validation first. A random split can look very strong but fail badly here, because the test set comes from a different period and distribution. Grouped, spatial, or temporal validation gave me a much more reliable signal than random CV.
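To make the grouped-validation idea concrete, here is a minimal sketch in plain Python. The `group_k_fold` helper and the tile ids are hypothetical; in practice the group could be a grid cell, a region, or an acquisition period, whichever the data allows. scikit-learn's `GroupKFold` implements the same idea if that library is available.

```python
from collections import defaultdict

def group_k_fold(groups, n_splits=3):
    """Yield (train_idx, val_idx) pairs so that no group is split across folds."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[g])

    for k in range(n_splits):
        val = sorted(folds[k])
        val_set = set(val)
        train = [i for i in range(len(groups)) if i not in val_set]
        yield train, val

# Hypothetical tile ids, one per sample (assumption, not from the competition data).
tiles = ["t1", "t1", "t2", "t2", "t3", "t3", "t4"]
splits = list(group_k_fold(tiles, n_splits=2))
```

Because every sample of a tile lands on the same side of the split, the validation score cannot benefit from near-duplicate neighbours leaking in from the training fold.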

Feature-wise, the biggest value came from robust summaries rather than from raw monthly bands alone. Raw Sentinel-1/2 monthly values help, but I found temporal aggregates (seasonal min, max, range, std) and water/vegetation indices such as NDWI, MNDWI, and NDVI-style features more stable. SAR ratios and simple texture or moisture proxies can also help. I have been careful with coordinates and nearest-neighbour style features, though, because they may overfit the public split.
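As a rough illustration of index plus temporal-aggregate features, the sketch below uses the standard normalised-difference definitions: NDWI from green and NIR, MNDWI from green and SWIR, NDVI from NIR and red. The input layout (a list of per-month dicts with `green`, `red`, `nir`, `swir` keys), the function names, and the sample values are assumptions for illustration, not the competition's actual schema.

```python
from statistics import mean, pstdev

def norm_diff(a, b, eps=1e-9):
    """Normalised difference (a - b) / (a + b); eps guards against a zero denominator."""
    return (a - b) / (a + b + eps)

def temporal_summary(series, prefix):
    """Collapse a monthly series into statistics that do not depend on month order."""
    lo, hi = min(series), max(series)
    return {
        f"{prefix}_min": lo,
        f"{prefix}_max": hi,
        f"{prefix}_range": hi - lo,
        f"{prefix}_mean": mean(series),
        f"{prefix}_std": pstdev(series),
    }

def build_features(monthly):
    """monthly: list of dicts of reflectances, one dict per month (assumed layout)."""
    indices = {
        "ndwi": [norm_diff(m["green"], m["nir"]) for m in monthly],
        "mndwi": [norm_diff(m["green"], m["swir"]) for m in monthly],
        "ndvi": [norm_diff(m["nir"], m["red"]) for m in monthly],
    }
    feats = {}
    for name, series in indices.items():
        feats.update(temporal_summary(series, name))
    return feats

# Made-up reflectances for a single water-like pixel over three months.
pixel = [
    {"green": 0.10, "red": 0.05, "nir": 0.04, "swir": 0.02},
    {"green": 0.12, "red": 0.06, "nir": 0.05, "swir": 0.03},
    {"green": 0.09, "red": 0.05, "nir": 0.06, "swir": 0.04},
]
features = build_features(pixel)
```

The aggregates ignore which calendar month a value came from, which is one plausible reason they transfer better than raw monthly columns when train and test cover different periods.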

For the 0.5 threshold issue, I have thought in terms of calibration during training rather than post-hoc threshold tuning. Class weighting, balanced objectives, probability calibration, and checking the predicted positive count all made a big difference for F1 at the fixed cutoff.
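A small sketch of the checks described above, in plain Python. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count). The helper names, the toy labels and probabilities, and the choice of `>= 0.5` (rather than `> 0.5`) for the cutoff are assumptions; the competition rules would settle the last point.

```python
def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(y)
    counts = {c: y.count(c) for c in set(y)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}

def f1_at_cutoff(y_true, proba, cutoff=0.5):
    """F1 for the positive class with predictions fixed at the given cutoff."""
    pred = [1 if p >= cutoff else 0 for p in proba]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def positive_rate(proba, cutoff=0.5):
    """Share of samples predicted positive at the cutoff."""
    return sum(1 for p in proba if p >= cutoff) / len(proba)

# Toy validation fold (made-up numbers): 2 positives out of 8 samples.
y_val = [1, 0, 0, 0, 1, 0, 0, 0]
p_val = [0.9, 0.2, 0.4, 0.1, 0.45, 0.6, 0.3, 0.2]

weights = balanced_class_weights(y_val)
f1 = f1_at_cutoff(y_val, p_val)
pred_rate = positive_rate(p_val)
true_rate = sum(y_val) / len(y_val)
```

Comparing `pred_rate` with `true_rate` on a held-out fold is a cheap way to see whether the model over- or under-predicts positives at 0.5, and so whether the class weights or calibration need adjusting during training.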

A single strong tree model did well for me, but an ensemble of different tree-based models usually gave more stable rankings. The biggest jump came from better validation first, robust temporal and index features second, and calibration and class balance third.
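For the ensembling step, one simple option is to average the predicted probabilities of several models and only then apply the fixed cutoff. This is a sketch under assumptions: the `average_probabilities` helper, the equal default weights, and the three made-up probability lists are illustrative, not any particular model's output.

```python
def average_probabilities(model_probas, weights=None):
    """Weighted mean of per-model probabilities, sample by sample."""
    if weights is None:
        weights = [1.0] * len(model_probas)
    total = sum(weights)
    n_samples = len(model_probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, model_probas)) / total
        for i in range(n_samples)
    ]

# Made-up probabilities from three hypothetical tree models on three samples.
probas = [
    [0.90, 0.40, 0.20],
    [0.80, 0.60, 0.10],
    [0.70, 0.55, 0.30],
]
blended = average_probabilities(probas)
labels = [1 if p >= 0.5 else 0 for p in blended]
```

Averaging probabilities rather than ranks keeps the output on a probability scale, so the fixed 0.5 cutoff still has its usual meaning after blending.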

I hope this helps ;-)

9 Jun 2026, 15:35
Upvotes 4
khushimalik19

THANKS