Amini Soil Prediction Challenge

Helping Africa
$7 000 USD
Completed (9 months ago)
Prediction
Earth Observation
1061 joined
339 active
Start: Apr 02, 25
Close: Jun 22, 25
Reveal: Jun 23, 25
100i
Ghana Health Service
Optimal Nutrient-specific Regression
Data · 25 Jun 2025, 20:25 · 2

| Zn | 9.7168 | B | 0.6794 | dp2a9PUF

Data and Features

For this submission, I integrated Sentinel-1 and Sentinel-2 data (merged on PID, lat and lon).
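A minimal sketch of what such a merge could look like in pandas. This is not the code used for the submission: the toy frames, the `B04` band column and the choice to average repeated satellite records per point are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy frames; the real column sets are not given in the post.
train = pd.DataFrame({
    "PID": ["a", "b", "c"],
    "lat": [5.60, 6.10, 7.25],
    "lon": [-0.20, -1.50, -2.30],
})
sentinel2 = pd.DataFrame({
    "PID": ["a", "a", "b"],
    "lat": [5.60, 5.60, 6.10],
    "lon": [-0.20, -0.20, -1.50],
    "B04": [0.12, 0.14, 0.20],  # hypothetical band column
})

keys = ["PID", "lat", "lon"]

# Assumption: several satellite records can exist per point, so collapse
# them to one row per key (here with a mean) before merging.
s2_agg = sentinel2.groupby(keys, as_index=False).mean(numeric_only=True)

# A left merge keeps every training row; points without a satellite
# record get NaN in the band columns.
merged = train.merge(s2_agg, on=keys, how="left")
print(merged)
```

Merging on floating-point lat and lon only matches rows whose coordinates are exactly equal, so rounding both sides to a fixed precision first may be safer if the sources differ slightly.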

I engineered features from lat and lon: embeddings, clusters, UMAP, PCA and angular rotations. I also engineered distance-related features: haversine, Manhattan and Euclidean distances. Finally, I calculated the mean lat and lon and added the distance from each coordinate point to that mean as a further feature.
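Some of these coordinate features can be sketched as follows. The post does not say what the distances are measured to, which rotation angles were used, or how many clusters were fitted, so the reference point (the mean coordinate), the angles, `n_clusters` and all column names below are assumptions. UMAP and the embeddings are left out.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def add_geo_features(df, n_clusters=3, angles=(15, 30, 45)):
    out = df.copy()
    lat, lon = out["lat"].to_numpy(), out["lon"].to_numpy()
    lat_c, lon_c = lat.mean(), lon.mean()  # mean coordinate

    # Distances from each point to the mean coordinate
    # (Euclidean and Manhattan are in degrees, haversine in km).
    out["dist_haversine"] = haversine_km(lat, lon, lat_c, lon_c)
    out["dist_euclidean"] = np.hypot(lat - lat_c, lon - lon_c)
    out["dist_manhattan"] = np.abs(lat - lat_c) + np.abs(lon - lon_c)

    # Rotated coordinates give tree models split directions that are
    # not aligned with the lat/lon axes.
    for deg in angles:
        t = np.radians(deg)
        out[f"rot{deg}_x"] = lon * np.cos(t) - lat * np.sin(t)
        out[f"rot{deg}_y"] = lon * np.sin(t) + lat * np.cos(t)

    # Cluster label from k-means on the raw coordinates.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    out["geo_cluster"] = km.fit_predict(out[["lat", "lon"]])
    return out


# Toy coordinates, purely illustrative.
points = pd.DataFrame({
    "lat": [5.6, 6.1, 7.2, -1.3, -0.9, 9.0],
    "lon": [-0.2, -1.5, -2.3, 36.8, 37.1, 38.7],
})
geo = add_geo_features(points)
print(geo.round(3))
```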

I inverted the pH column to the corresponding hydrogen ion concentration as an extra feature.
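Since pH is defined as the negative base-10 logarithm of the hydrogen ion concentration, the inversion is presumably `10 ** (-pH)`. A short sketch, assuming the column is named `pH` and using the hypothetical name `h_conc` for the new feature:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pH": [5.0, 6.5, 7.0]})  # toy values

# pH = -log10([H+])  =>  [H+] = 10 ** (-pH), in mol/L
df["h_conc"] = np.power(10.0, -df["pH"])
print(df)
```

The transform puts soils that differ by one pH unit a factor of ten apart, which a linear pH scale does not show directly.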

I took the mean and PCA of the ‘bio*’ features: ['bio1', 'bio12', 'bio15', 'bio7'].
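One way this could look with scikit-learn is sketched below. The data are synthetic, and the standardisation step, the number of components and the output column names are assumptions, as the post does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

bio_cols = ["bio1", "bio12", "bio15", "bio7"]

# Synthetic stand-in for the real bioclimatic columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=bio_cols)

# Row-wise mean of the four columns.
df["bio_mean"] = df[bio_cols].mean(axis=1)

# Assumption: standardise first so no single variable dominates the PCA.
scaled = StandardScaler().fit_transform(df[bio_cols])
pcs = PCA(n_components=2, random_state=0).fit_transform(scaled)
df["bio_pca1"], df["bio_pca2"] = pcs[:, 0], pcs[:, 1]
print(df.head())
```

In a cross-validated setup, fitting the scaler and PCA on the training fold only and then transforming the validation and test rows avoids leaking information across the split.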

Number of features : 71

CV Split

KFold with 5 splits on the targets, nothing fancy here.

Model & Training

I used a random forest regressor with 100 estimators, trained on just a single fold (fold 0).
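A sketch of the split and single-fold training described above, on synthetic data. Whether the folds were shuffled, which seed was used, and the use of RMSE as the validation metric are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic regression data standing in for the 71 engineered features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# 5-fold split; only the first fold (fold 0) is used.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
train_idx, valid_idx = next(iter(kf.split(X)))

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X[train_idx], y[train_idx])

# Validation RMSE on the held-out fold.
pred = model.predict(X[valid_idx])
rmse = float(np.sqrt(mean_squared_error(y[valid_idx], pred)))
print(f"fold 0 RMSE: {rmse:.4f}")
```

Training on one fold is fast but gives a noisier validation estimate than averaging over all five.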

| Cu | 3.2162 | 9SfR74jK

Data and Features

For this submission, I did not integrate any extra data.

I only replicated the feature engineering described above.

CV Split

Same as above

Model & Training

I picked randomly initialised CatBoost, LightGBM and XGBoost regressors as base regressors to train a voting regressor on all features.

No tuning of model parameters was done.

PS: Since joining the challenge (in the last week), I noticed that some columns visible in the starter notebook (such as 'x' and 'y') were absent from the datasets I downloaded, so I began to wonder whether the datasets had been updated. I think a related discussion was raised, but no feedback was given.

Discussion 2 answers
Moujoudix

Hello @100i, thanks for sharing your approach. I still have a question: how did you deal with the large number of missing values in Sentinel-2? (Sites=39.16%, PIDs=30.75%, Train_sites=42.49%, Train_PIDs=33.28%, Test_sites=31.32%, Test_PIDs=22.66%)

25 Jun 2025, 20:36
Upvotes 0

Thank you @100i

26 Jun 2025, 12:53
Upvotes 0