Amini Soil Prediction Challenge

Helping Africa

$7 000 USD

Completed (~1 year ago)

Start

Apr 02, 25

Jun 22, 25

Reveal

Jun 23, 25

100i

Ghana Health Service

Optimal Nutrient-specific Regression

Data · 25 Jun 2025, 20:25 · 2

| Zn | 9.7168 | B | 0.6794 | dp2a9PUF

Data and Features

For this submission, I integrated Sentinel-1 and Sentinel-2 data, merged on PID, lat and lon.
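A minimal sketch of how such a merge could look in pandas. The toy tables, the band columns `VV` and `B04`, and the choice to average repeated rows per point before joining are all assumptions for illustration, not details taken from the submission.

```python
import pandas as pd

# Toy stand-ins for the competition tables. Only PID, lat and lon come
# from the post; the band columns (VV, B04) are hypothetical.
train = pd.DataFrame({
    "PID": ["a", "b", "c"],
    "lat": [5.60, 6.70, 7.30],
    "lon": [-0.20, -1.60, -2.30],
})
s1 = pd.DataFrame({
    "PID": ["a", "a", "b"],
    "lat": [5.60, 5.60, 6.70],
    "lon": [-0.20, -0.20, -1.60],
    "VV": [-10.0, -12.0, -9.0],
})
s2 = pd.DataFrame({
    "PID": ["a", "c"],
    "lat": [5.60, 7.30],
    "lon": [-0.20, -2.30],
    "B04": [0.12, 0.18],
})

keys = ["PID", "lat", "lon"]

# Assumption: a point may have several satellite rows, so collapse each
# table to one row per key (here by averaging) before joining.
s1_agg = s1.groupby(keys, as_index=False).mean(numeric_only=True)
s2_agg = s2.groupby(keys, as_index=False).mean(numeric_only=True)

# Left joins keep every training row; points without imagery get NaN.
merged = train.merge(s1_agg, on=keys, how="left").merge(s2_agg, on=keys, how="left")
print(merged)
```

Using left joins keeps the row count of the training table unchanged, which makes the missing satellite values explicit as NaN rather than silently dropping points.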

I engineered features from lat and lon: embeddings, clusters, UMAP, PCA and angular rotations. I also engineered distance-related features (haversine, Manhattan and Euclidean distance), and added the distance from each coordinate point to the mean lat and lon as a further feature.
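The distance-to-mean features can be sketched roughly as follows. The sample coordinates, the feature names and the choice of kilometres for haversine versus degrees for the other two distances are assumptions; the post does not say which units were used.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical sample coordinates in degrees.
lat = np.array([5.60, 6.70, 7.30])
lon = np.array([-0.20, -1.60, -2.30])

# Reference point: the mean coordinate of the data set.
lat_mean, lon_mean = lat.mean(), lon.mean()

# Hypothetical feature names; Manhattan and Euclidean are computed on raw degrees.
features = {
    "dist_haversine_km": haversine_km(lat, lon, lat_mean, lon_mean),
    "dist_manhattan_deg": np.abs(lat - lat_mean) + np.abs(lon - lon_mean),
    "dist_euclidean_deg": np.hypot(lat - lat_mean, lon - lon_mean),
}
for name, values in features.items():
    print(name, values)
```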

I converted the pH column to the corresponding hydrogen ion concentration as an extra feature.
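Since pH is defined as the negative base-10 logarithm of the hydrogen ion concentration, the conversion is a one-liner. The sample values and variable names below are made up for illustration.

```python
import numpy as np

# Hypothetical pH readings.
ph = np.array([5.5, 6.0, 7.0])

# pH = -log10([H+]), so [H+] = 10 ** (-pH), in mol/L.
h_conc = np.power(10.0, -ph)
print(h_conc)
```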

I took the mean and PCA of the 'bio*' features: ['bio1', 'bio12', 'bio15', 'bio7']

Number of features: 71
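A rough sketch of the mean and PCA step with scikit-learn. The synthetic values, the standardisation before PCA, the two retained components and the output column names are assumptions; the post only names the four input columns.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

bio_cols = ["bio1", "bio12", "bio15", "bio7"]

# Synthetic values standing in for the real bioclimatic columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=bio_cols)

# Row-wise mean of the four columns.
df["bio_mean"] = df[bio_cols].mean(axis=1)

# Assumption: standardise first so no single column dominates the PCA,
# then keep two components (the number kept in the post is not stated).
scaled = StandardScaler().fit_transform(df[bio_cols])
pca = PCA(n_components=2, random_state=0)
components = pca.fit_transform(scaled)
df["bio_pca_0"] = components[:, 0]
df["bio_pca_1"] = components[:, 1]

print(df.head())
```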

CV Split

KFold with 5 splits on the targets, nothing fancy here.

Model & Training

I used a random forest regressor with 100 estimators, trained on just a single fold (fold 0).
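The split and single-fold training could be sketched as below. The synthetic data, the shuffling, the fixed seeds and the use of RMSE for validation are assumptions added so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic regression data in place of the 71 engineered features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# 5-fold split; shuffle and random_state are assumptions.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Take only the first split, i.e. fold 0.
train_idx, valid_idx = next(iter(kf.split(X)))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[train_idx], y[train_idx])

# Validation error on the held-out part of fold 0.
pred = model.predict(X[valid_idx])
rmse = float(np.sqrt(mean_squared_error(y[valid_idx], pred)))
print(f"fold 0 RMSE: {rmse:.4f}")
```

Training on one fold means the model sees 80% of the rows and the remaining 20% serve as a quick validation set.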

| Cu | 3.2162 | 9SfR74jK

Data and Features

For this submission, I did not integrate any extra data.

I only replicated the feature engineering described above.

CV Split

Same as above

Model & Training

I picked randomly initialised CatBoost, LightGBM and XGBoost regressors as the base regressors of a voting regressor, trained on all features.
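A sketch of the voting setup using scikit-learn's `VotingRegressor`. To keep it runnable without the three boosting libraries installed, it swaps in scikit-learn estimators as stand-ins; the comments mark where `CatBoostRegressor`, `LGBMRegressor` and `XGBRegressor` would go instead. The synthetic data and the simple 160/40 split are also assumptions.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)

# Synthetic data in place of the engineered feature table.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Stand-in base models with default parameters (no tuning).
# With the boosting libraries installed these would be, for example,
# CatBoostRegressor(), LGBMRegressor() and XGBRegressor().
base_models = [
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("rfr", RandomForestRegressor(random_state=0)),
    ("etr", ExtraTreesRegressor(random_state=0)),
]

# With no weights given, the voting regressor averages the base predictions.
voter = VotingRegressor(estimators=base_models)
voter.fit(X[:160], y[:160])

pred = voter.predict(X[160:])
print(pred[:5])
```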

No tuning of model parameters was done.

PS: Since joining the challenge (in the last week), I noticed that some columns visible in the starter notebook (such as 'x' and 'y') were absent from the datasets I downloaded, so I began to wonder whether the datasets had been updated. I think a related discussion was raised, but it received no feedback.

Discussion 2 answers

Moujoudix

Hello @100i, thanks for sharing your approach. I still have a question: how did you deal with the large number of missing values in Sentinel-2? (Sites=39.16%, PIDs=30.75%, Train_sites=42.49%, Train_PIDs=33.28%, Test_sites=31.32%, Test_PIDs=22.66%)

25 Jun 2025, 20:36

Omorinsola

Thank you @100i

26 Jun 2025, 12:53
