🌱 Let's Talk About: Unofficial Third place

GEOAI Challenge for Cropland Mapping in Dry Environments by ITU

Helping Uzbekistan, Russian Federation

1 000 CHF

Completed (10 months ago)

Skills you will learn: Classification, Earth Observation

621 joined · 184 active

Start: Jul 02, 25 to Sep 29, 25 · Reveal: Sep 29, 25

Gozie

Freelance

Unofficial Third place

1 Oct 2025, 17:30 · 10

Apologies for the earlier mix-up

My roommate logged into Zindi on my computer and didn't know he hadn't logged out before making the post.

Models: an ensemble of three models (CatBoost, LightGBM, and XGBoost), combined by simple averaging of each model's predicted probabilities.
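A minimal sketch of the simple-averaging step, assuming each model has already produced positive-class probabilities for the same rows (for example via `predict_proba(X)[:, 1]` on the scikit-learn-style CatBoost, LightGBM and XGBoost classifiers). Training is left out, the probabilities are made up, and the 0.5 cut-off is an assumption, not something stated in the post.

```python
import numpy as np


def average_probabilities(prob_arrays):
    """Unweighted mean of per-model probability vectors (one vector per model)."""
    stacked = np.vstack([np.asarray(p, dtype=float) for p in prob_arrays])
    # stacked has shape (n_models, n_samples); average across the models
    return stacked.mean(axis=0)


# Hypothetical outputs of the three models on three test rows
p_catboost = np.array([0.9, 0.2, 0.6])
p_lightgbm = np.array([0.8, 0.4, 0.5])
p_xgboost = np.array([0.7, 0.3, 0.7])

ensemble = average_probabilities([p_catboost, p_lightgbm, p_xgboost])
labels = (ensemble >= 0.5).astype(int)  # assumed decision threshold
print(ensemble, labels)
```

Plain averaging gives each model equal weight; weighting by CV score would be a straightforward variation.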

Datasets used (4-year span)

Sentinel 1 and 2
Climate data (temperature, evapotranspiration, land surface temperature, precipitation, etc.)
Slope and elevation from digital elevation models

Data Preprocessing

I downloaded Sentinel-1 and Sentinel-2 data for the locations in the shapefiles. I noticed that the train and test locations were very far apart from each other.

Locations in Orenburg didn't have Sentinel-1 data from January 2023 to date, so I had to download 4 years of data (Jan 2018 to Dec 2022).

Feature Engineering

I created summary statistics of these datasets for each location ID. Features included:

Aggregated Sentinel-1 and Sentinel-2 data for the test data into monthly values
Generated vegetation, bare-soil and soil-moisture indices from Sentinel-2
Polarisation ratios of the Sentinel-1 polarisation channels (VV and VH)
Water stress index from the evapotranspiration datasets
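The post names the bare soil index (BSI) and moisture stress index (MSI) among the top features but doesn't give formulas or say which vegetation index was used. The sketch below uses common definitions on Sentinel-2 reflectance bands (B2 blue, B4 red, B8 NIR, B11 SWIR1) and NDVI as an example vegetation index; those choices, and expressing the Sentinel-1 cross-polarisation ratio as VH minus VV in dB (the same as VH/VV in linear units), are assumptions.

```python
import numpy as np


def ndvi(b4_red, b8_nir):
    """Normalised difference vegetation index: (NIR - Red) / (NIR + Red)."""
    red, nir = np.asarray(b4_red, float), np.asarray(b8_nir, float)
    return (nir - red) / (nir + red)


def bsi(b2_blue, b4_red, b8_nir, b11_swir1):
    """Bare soil index: ((SWIR1 + Red) - (NIR + Blue)) / ((SWIR1 + Red) + (NIR + Blue))."""
    blue, red = np.asarray(b2_blue, float), np.asarray(b4_red, float)
    nir, swir1 = np.asarray(b8_nir, float), np.asarray(b11_swir1, float)
    return ((swir1 + red) - (nir + blue)) / ((swir1 + red) + (nir + blue))


def msi(b8_nir, b11_swir1):
    """Moisture stress index: SWIR1 / NIR (higher values suggest more water stress)."""
    return np.asarray(b11_swir1, float) / np.asarray(b8_nir, float)


def cross_pol_ratio_db(vv_db, vh_db):
    """VH/VV ratio in dB, i.e. the difference of the two backscatter values in dB."""
    return np.asarray(vh_db, float) - np.asarray(vv_db, float)


# Hypothetical monthly values for one site
print(ndvi(0.1, 0.3), bsi(0.05, 0.1, 0.3, 0.2), msi(0.3, 0.2))
print(cross_pol_ratio_db(vv_db=-10.0, vh_db=-17.0))
```

The functions work element-wise, so the same calls apply to whole arrays of monthly values.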

Next, I aggregated these features for each ID. They include:

Annual min, mean, max and standard deviation
Harmonic terms: because these datasets are cyclical, I created amplitude and phase terms to capture the strength of their annual seasonality and the time of year at which they peak
Rate of change per month (how fast they change from month to month) and acceleration (how fast that monthly rate itself changes). I also included percentage changes
Rolling statistics: rolling mean, sum and standard deviation over 6-month windows
Location-based information: distance to the nearest site, and the average, standard deviation, and number of sites within a 10 km radius
Grid-based aggregation: I clustered the locations in each region (separately) into four groups and assigned them to 50 km grid cells, then computed summary statistics (min, max, mean, standard deviation) of the Sentinel-1, Sentinel-2 and climate features for the IDs in each grid cell
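A rough NumPy-only sketch of the temporal features listed above for one monthly series. The harmonic part is an ordinary least-squares fit of a single annual harmonic, y ≈ c + a·cos(2πm/12) + b·sin(2πm/12), with amplitude √(a² + b²) and peak month taken from atan2(b, a). The post doesn't say how the harmonics were fitted, how the change series were summarised, or whether the 6-month windows overlap, so the fit, the averaging of the change series into single numbers, and the overlapping window are all assumptions; the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def harmonic_terms(values, month_index):
    """Fit y = c + a*cos(w*m) + b*sin(w*m), w = 2*pi/12, by least squares.

    month_index: 0 = January, 11 = December, 12 = next January, ...
    Returns the mean level c, amplitude sqrt(a^2 + b^2) and the peak month in [0, 12).
    """
    y = np.asarray(values, dtype=float)
    w_m = 2 * np.pi * np.asarray(month_index, dtype=float) / 12.0
    design = np.column_stack([np.ones_like(w_m), np.cos(w_m), np.sin(w_m)])
    (c, a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    phase = np.arctan2(b, a) % (2 * np.pi)  # radians in [0, 2*pi)
    return {"mean": c, "amplitude": np.hypot(a, b), "peak_month": phase * 12.0 / (2 * np.pi)}


def change_terms(values):
    """Average monthly change, its 'acceleration', and average percentage change."""
    y = np.asarray(values, dtype=float)
    rate = np.diff(y)            # first difference: change per month
    accel = np.diff(y, n=2)      # second difference: change in the monthly rate
    pct = 100.0 * rate / y[:-1]  # undefined where a value is 0
    return {"rate_mean": rate.mean(), "accel_mean": accel.mean(), "pct_mean": pct.mean()}


def rolling_terms(values, window=6):
    """Rolling mean, sum and sample std over overlapping windows of `window` months."""
    y = np.asarray(values, dtype=float)
    win = sliding_window_view(y, window)  # shape (len(y) - window + 1, window)
    return {"roll_mean": win.mean(axis=1), "roll_sum": win.sum(axis=1),
            "roll_std": win.std(axis=1, ddof=1)}


# Hypothetical two-year monthly series peaking in April (month index 3)
months = np.arange(24)
series = 0.4 + 0.2 * np.cos(2 * np.pi * (months - 3) / 12)
print(harmonic_terms(series, months))
print(change_terms(series)["rate_mean"], rolling_terms(series)["roll_mean"][:3])
```

In practice these would be run per ID and per variable (VV, VH, BSI, MSI, and so on) and concatenated into one feature row.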

Top features included the slope of the site, the distance to the closest site, and the annual summaries (min, mean, max, std), peak time, amplitude and rate of change of VV, VH, the bare soil index (BSI) and the moisture stress index (MSI). These properties mostly relate to semi-arid regions.
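Since distance to the closest site is among the top features, here is a NumPy-only sketch of the location- and grid-based features. Two assumptions: "average and standard deviation" is read as the mean and standard deviation of the distances to neighbouring sites within 10 km, and the grid step assumes coordinates already projected to metres (e.g. UTM). The per-region clustering into four groups isn't described in enough detail to reproduce, so it is left out; all function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def pairwise_haversine_km(lat_deg, lon_deg):
    """Great-circle distance (km) between every pair of sites."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def neighbour_features(lat_deg, lon_deg, radius_km=10.0):
    """Nearest-site distance plus count, mean and std of neighbour distances within radius_km."""
    d = pairwise_haversine_km(lat_deg, lon_deg)
    np.fill_diagonal(d, np.inf)  # a site is not its own neighbour
    within = d <= radius_km
    count = within.sum(axis=1)
    mean = np.full(len(d), np.nan)
    std = np.full(len(d), np.nan)
    for i in np.flatnonzero(count):  # only rows with at least one neighbour
        mean[i] = d[i, within[i]].mean()
        std[i] = d[i, within[i]].std()
    return {"nearest_km": d.min(axis=1), "count_within": count,
            "mean_within_km": mean, "std_within_km": std}


def grid_aggregate(x_m, y_m, values, cell_m=50_000.0):
    """Attach min/max/mean/std of `values` over all sites sharing a cell_m x cell_m cell."""
    cells = np.column_stack([np.floor(np.asarray(x_m, float) / cell_m),
                             np.floor(np.asarray(y_m, float) / cell_m)])
    _, cell_id = np.unique(cells, axis=0, return_inverse=True)
    cell_id = cell_id.ravel()  # integer label of each site's grid cell
    values = np.asarray(values, dtype=float)
    stats = {k: np.empty_like(values) for k in ("cell_min", "cell_max", "cell_mean", "cell_std")}
    for c in np.unique(cell_id):
        members = cell_id == c
        stats["cell_min"][members] = values[members].min()
        stats["cell_max"][members] = values[members].max()
        stats["cell_mean"][members] = values[members].mean()
        stats["cell_std"][members] = values[members].std()
    return cell_id, stats


# Hypothetical sites: two about 5.6 km apart on the equator, one about 110 km away
feats = neighbour_features([0.0, 0.0, 0.0], [0.0, 0.05, 1.0])
# Hypothetical projected coordinates (metres) and one feature value per site
cell_id, stats = grid_aggregate([1_000, 20_000, 60_000], [0, 0, 0], [1.0, 3.0, 10.0])
print(feats["nearest_km"], feats["count_within"], stats["cell_mean"])
```

The all-pairs distance matrix is fine for a few thousand sites; for much larger sets a spatial index would be the usual choice.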

Public LB: 0.844xx, Private LB: 0.85xxx

Discussion 10 answers

Koleshjr

Multimedia university of kenya

Amazing! What was your CV? I really struggled with this one because the Sentinel-2 data I downloaded for train had a different distribution from test, so I had a very good CV (>0.85) but a very poor LB.

1 Oct 2025, 17:40

Upvotes 0

Gozie

Freelance

For the three models, I got CVs around 0.87+ on average

replied to Koleshjr · 1 Oct 2025, 17:57

Upvotes 1

Koleshjr

Multimedia university of kenya

Nicee

replied to Gozie · 1 Oct 2025, 17:58

Upvotes 1

Gozie

Freelance

Thanks!

Well, I can't say anything about the dissimilar distributions in the train and test data. I didn't check for that.

For the Sentinel data, I added a 100 m buffer so that pixels within a 100 m radius were included, in case there was no data at the site's exact coordinates.

replied to Koleshjr · 1 Oct 2025, 18:11

Upvotes 0

CodeJoe

Same here @Koleshjr, I was gambling😂.

replied to Koleshjr · 1 Oct 2025, 18:31

Upvotes 0

CodeJoe

Amazing, thanks for the write-up @Gozie, and congratulations, big man!

replied to Gozie · 1 Oct 2025, 18:32

Upvotes 0

Gozie

Freelance

Thanks @CodeJoe

replied to CodeJoe · 1 Oct 2025, 18:33

Upvotes 1

vikrant

Did you take the 100 m buffer around all of the available latitude/longitude points, or did you pick one lat/lon pair and take a 100 m buffer around it?

replied to Gozie · 1 Oct 2025, 19:36

Upvotes 0

Gozie

Freelance

No, a polygon covering all points within a 100 m radius of the given coordinate. On Earth Engine, you can specify a buffer and it automatically creates a polygon of the points within that buffer during data extraction.
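On Earth Engine the buffer is typically built with `ee.Geometry.Point([lon, lat]).buffer(100)` and passed as the geometry to `image.reduceRegion(...)` with a reducer such as `ee.Reducer.mean()`. The snippet below doesn't call Earth Engine; it is a hypothetical local illustration of the same idea on a small raster with 10 m pixels, averaging every valid pixel whose centre lies within 100 m of the site, so a site whose own pixel has no data still gets a value. Placing the site at a pixel centre is an assumption made to keep it short.

```python
import numpy as np


def buffered_mean(raster, pixel_size_m, row, col, radius_m=100.0):
    """Mean of valid (non-NaN) pixels whose centres lie within radius_m of pixel (row, col).

    Hypothetical helper: assumes a square grid with pixel_size_m spacing and that
    the site sits at the centre of pixel (row, col).
    """
    raster = np.asarray(raster, dtype=float)
    rows, cols = np.indices(raster.shape)
    dist_m = np.hypot(rows - row, cols - col) * pixel_size_m
    inside = (dist_m <= radius_m) & ~np.isnan(raster)
    return raster[inside].mean() if inside.any() else np.nan


# Hypothetical 10 m raster where the site's own pixel has no data
image = np.full((30, 30), 0.25)
image[15, 15] = np.nan
print(buffered_mean(image, pixel_size_m=10.0, row=15, col=15))               # 0.25
print(buffered_mean(image, pixel_size_m=10.0, row=15, col=15, radius_m=0.0))  # nan
```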

replied to vikrant · 1 Oct 2025, 19:41

Upvotes 0

vikrant

I took a 100 m buffer per lat/lon entry, but it didn't actually work. If possible, please share a code snippet, just a few lines.

replied to Gozie · 2 Oct 2025, 12:50

Upvotes 0
