🌾 Challenge Chat: Tips for handling the temporal...

GeoAI Aquaculture Pond Identification Challenge by FAO and ITU

khushimalik19

Tips for handling the temporal shift to improve LB score?

9 Jun 2026, 13:06 · 2

Hi everyone! I'm stuck around 0.953 on the leaderboard and can't seem to get past it, while many of you are at 0.98+. I'd really appreciate any general guidance (no need to share code):

The challenge is about temporal generalization (train/test from different periods). What kind of features worked best for you to stay robust across seasons — raw monthly bands, time-aggregated stats, spectral indices (NDWI/MNDWI), or something else?
Did you do any train/test distribution alignment or normalization to handle the time shift?
Since threshold tuning is forbidden (fixed 0.5) and F1 is 60% of the score, did you do anything special for calibration / class weighting to optimize F1 at 0.5?
Was a single model enough, or did ensembling make the difference?

Any hint on what gives the biggest jump would mean a lot. Thanks!

Discussion 2 answers

chiwai

A few tips from what I have learned so far.

I had to focus on validation first. A random split can look very strong but fail badly here because the test set comes from a different period/distribution. A grouped or spatial/temporal-style validation gave me a much better signal than random CV.
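To make the validation point concrete, here is a minimal sketch of a grouped split using scikit-learn's `GroupKFold`. The data is synthetic and the `groups` array is a hypothetical stand-in for whatever grouping the challenge data supports (for example a spatial tile id or an acquisition period); nothing here is taken from the actual dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 samples, 5 features, binary labels.
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

# Hypothetical group id per sample (e.g. a spatial tile or a time period).
groups = rng.integers(0, 10, size=200)

cv = GroupKFold(n_splits=5)
folds = list(cv.split(X, y, groups=groups))

for fold, (train_idx, val_idx) in enumerate(folds):
    # With a grouped split, no group appears on both sides of a fold.
    shared = set(groups[train_idx]) & set(groups[val_idx])
    print(f"fold {fold}: {len(val_idx)} val samples, shared groups: {len(shared)}")
```

Because every group is held out as a whole, the validation score reflects performance on tiles or periods the model has not seen, which is closer to the test situation than a random split.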

Feature-wise, the biggest value came from robust summaries rather than raw monthly bands alone. Raw Sentinel-1/2 monthly values help, but I found temporal aggregates, seasonal min/max/range/std, and water/vegetation indices like NDWI, MNDWI, and NDVI-style features more stable. SAR ratios and simple texture/moisture proxies can also help. I have been careful with coordinate or nearest-neighbour style features, though, because they may overfit the public split.
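As a rough illustration of the index and aggregate features mentioned above, the sketch below computes NDWI, MNDWI, and NDVI per month and then summarises each series over time. The function names and the input layout (one array per band, shaped samples by months) are assumptions for the example; the real challenge files may be organised differently.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-6):
    """(a - b) / (a + b), with a small eps to avoid division by zero."""
    return (a - b) / (a + b + eps)

def temporal_features(green, nir, swir, red):
    """Each input is an array of shape (n_samples, n_months) of reflectance."""
    indices = {
        "ndwi": normalized_difference(green, nir),    # (Green - NIR) / (Green + NIR)
        "mndwi": normalized_difference(green, swir),  # (Green - SWIR) / (Green + SWIR)
        "ndvi": normalized_difference(nir, red),      # (NIR - Red) / (NIR + Red)
    }
    feats = {}
    for name, series in indices.items():
        lo, hi = series.min(axis=1), series.max(axis=1)
        feats[f"{name}_mean"] = series.mean(axis=1)
        feats[f"{name}_min"] = lo
        feats[f"{name}_max"] = hi
        feats[f"{name}_range"] = hi - lo
        feats[f"{name}_std"] = series.std(axis=1)
    return feats

# Synthetic reflectance values for 4 samples over 12 months.
rng = np.random.default_rng(0)
green, nir, swir, red = (rng.uniform(0.01, 0.5, size=(4, 12)) for _ in range(4))
feats = temporal_features(green, nir, swir, red)
print(sorted(feats))
```

Summaries like range and std describe how a pixel behaves over the year rather than its value in one specific month, which is one plausible reason they transfer better across periods.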

For the 0.5 threshold issue, I thought in terms of calibration during training rather than post-hoc threshold tuning. Class weighting, balanced objectives, probability calibration, and checking the predicted positive count all made a big difference for F1 at the fixed cutoff.
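One simple way to check the positive count at the fixed cutoff is to compare the share of predicted positives with the share of positives in the validation labels. The helper below is a hypothetical sketch (its name and the toy numbers are made up for illustration), not part of any challenge tooling.

```python
import numpy as np

def report_at_fixed_cutoff(y_true, proba, cutoff=0.5):
    """F1 and positive rates when predictions are thresholded at a fixed cutoff."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(proba) >= cutoff
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    denom = 2 * tp + fp + fn
    return {
        "f1": 2 * tp / denom if denom else 0.0,
        "predicted_positive_rate": float(y_pred.mean()),
        "label_positive_rate": float(y_true.mean()),
    }

# Toy validation labels and model probabilities (made-up numbers).
y_val = [1, 1, 1, 0, 0, 0, 0, 0]
proba = [0.9, 0.45, 0.4, 0.3, 0.2, 0.1, 0.3, 0.05]

report = report_at_fixed_cutoff(y_val, proba)
print(report)
```

In this toy case the model flags far fewer positives than the labels contain, so recall and F1 suffer at 0.5. That pattern is a hint to shift the probabilities during training, for instance with `class_weight="balanced"` in scikit-learn estimators or `scale_pos_weight` in XGBoost/LightGBM, instead of moving the threshold.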

A single strong tree model did well for me, but an ensemble of different tree-based models usually gave more stable rankings. The biggest jump for me came from better validation first, robust temporal/index features second, and calibration/class balance third.
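For the ensembling part, a minimal sketch is to average the predicted probabilities of a few different tree-based models and apply the fixed 0.5 cutoff to the average. The model choice, hyperparameters, and synthetic data below are assumptions for illustration, and the random split is only there to keep the example short; the validation advice above still applies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced data as a stand-in for the real features.
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.8, 0.2], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = [
    RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0),
    ExtraTreesClassifier(n_estimators=100, class_weight="balanced", random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Collect each model's probability for the positive class.
probas = []
for model in models:
    model.fit(X_train, y_train)
    probas.append(model.predict_proba(X_val)[:, 1])

# Soft-voting ensemble: average the probabilities, then use the fixed 0.5 cutoff.
ensemble_proba = np.mean(probas, axis=0)
ensemble_pred = (ensemble_proba >= 0.5).astype(int)
print(ensemble_pred.mean())
```

Averaging probabilities tends to smooth out the quirks of any single model, which fits the observation that ensembles gave more stable rankings.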

I hope this helps ;-)

9 Jun 2026, 15:35

Upvotes 5

khushimalik19

THANKS

replied to chiwai · 9 Jun 2026, 15:37

Upvotes 1
