
Amini Soil Prediction Challenge

Helping Africa
$7 000 USD
Completed (9 months ago)
Prediction
Earth Observation
1061 joined
339 active
Start: Apr 02, 25
Close: Jun 22, 25
Reveal: Jun 23, 25
marching_learning
Nostalgic Mathematics
What about CV and LB correlation ?
17 May 2025, 19:21 · 14

Hi Zindians,

This challenge is weird. I think it will be decided by a few outliers in the private test set.

For me there is no correlation between local CV and LB, and I think it is due to the distribution of outliers. In all my experiments CV > LB, so I think there might be more outliers in the private test set.

So I'd like to know: what do your experiments show?

Discussion 14 answers
Knowledge_Seeker101
Freelance

Yeah, that's my opinion too. There is a huge gap between CV and LB in all my experiments; it kind of feels like betting now.

18 May 2025, 09:07
Upvotes 2
ill

I agree as well.

It would be ideal if the target variables were normalized before being scored on the LB, so that the scale of Ca does not overshadow the scale of, e.g., Boron when computing the error metric (RMSE). I hope this is what is happening.
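To make the scale concern concrete, here is a minimal sketch assuming the leaderboard pools raw squared errors across all targets (nobody in the thread knows how the score is actually computed). The toy data below are synthetic: one column on a Ca-like scale and one on a B-like scale, with prediction noise set to half of each target's standard deviation, so both columns are "equally bad" in relative terms. Pooled raw RMSE is then driven almost entirely by the Ca-like column, while dividing each column's errors by that column's standard deviation puts both targets on an equal footing.

```python
import numpy as np

# Synthetic, illustrative targets only (not competition data)
rng = np.random.default_rng(0)
n = 200
y_true = np.column_stack([
    rng.normal(2000.0, 800.0, n),  # Ca-like scale
    rng.normal(0.5, 0.2, n),       # B-like scale
])
# Simulated predictions: noise with std equal to half of each target's std
y_pred = y_true + rng.normal(0.0, 1.0, (n, 2)) * y_true.std(axis=0) * 0.5

sq_err = (y_true - y_pred) ** 2
# Fraction of the total squared error contributed by each target
share = sq_err.sum(axis=0) / sq_err.sum()
print("share of squared error per target:", share.round(6))

# Raw RMSE pooled over every cell: dominated by the large-scale target
raw_rmse = np.sqrt(sq_err.mean())

# Standardized RMSE: scale each target's errors by that target's std first
scale = y_true.std(axis=0)
norm_rmse = np.sqrt((((y_true - y_pred) / scale) ** 2).mean())
print(f"raw RMSE: {raw_rmse:.2f}, standardized RMSE: {norm_rmse:.3f}")
```

Dividing by the standard deviation is only one possible normalization; dividing by each target's mean or range would be equally plausible choices for an organizer.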

19 May 2025, 06:14
Upvotes 2
marching_learning
Nostalgic Mathematics

Yes, they could have used MAPE or another relative loss, but... sometimes you wonder whether the effort is worth it.
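For reference, a minimal MAPE sketch pooled over all cells, so a 10% miss counts the same on Ca as on B. The `eps` guard and the function name `mape` are assumptions for illustration; MAPE becomes unstable when true values are at or near zero, which can happen for trace nutrients.

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-9):
    """Mean absolute percentage error over every (sample, target) cell.

    eps avoids division by zero; results are still unreliable when
    true values are close to zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

# The same 10% relative error on a large-scale and a small-scale target
y_true = np.array([[2000.0, 0.5], [1500.0, 0.4]])
y_pred = y_true * 1.1
print(mape(y_true, y_pred))  # approximately 0.1
```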

rapsoj
University of Oxford

Also, calcium is only a secondary macronutrient; it is nowhere near as important as nitrogen/phosphorus/potassium for plant health, yet it is being weighted several orders of magnitude higher...

Yeah, same: practically no CV-to-LB correlation, and there are also so many data issues that it doesn't seem worth the effort to do any meaningful feature engineering.

We also have no idea how the metric is being calculated: is it RMSE across individual targets, or their average, or are they simply using the gap itself without reference to the nutrients or their scales?
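The candidate definitions in that question really do give different numbers. Below is a small sketch (toy values, hypothetical helper name `rmse_candidates`) computing per-target RMSE, the mean of the per-target RMSEs, and a single RMSE pooled over every cell. Because the square root is concave, the mean of per-target RMSEs can never exceed the pooled RMSE, so switching between the two definitions can shift a CV score noticeably without any change to the model.

```python
import numpy as np

def rmse_candidates(y_true, y_pred):
    """Compare common ways of turning multi-target errors into one RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq = (y_true - y_pred) ** 2               # shape (n_samples, n_targets)
    per_target = np.sqrt(sq.mean(axis=0))     # one RMSE per nutrient
    return {
        "per_target": per_target,
        "mean_of_per_target": per_target.mean(),  # average of column RMSEs
        "pooled": np.sqrt(sq.mean()),             # one RMSE over all cells
    }

# Toy example: first target has errors, second is predicted perfectly
y_true = np.array([[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]])
y_pred = np.array([[110.0, 1.0], [190.0, 2.0], [330.0, 3.0]])
res = rmse_candidates(y_true, y_pred)
print(res)
```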

19 May 2025, 10:07
Upvotes 0
MICADEE
LAHASCOM

@marching_learning Very weird indeed. Despite removing outliers, my local RMSE CV scores look solid, yet I'm still getting a poor LB score of 1120. Here are my local results:

Fold 1–5 RMSE:
Overall OOF RMSE: 659.4103

I plan to implement more feature engineering to see if that helps, but it’s puzzling.

19 May 2025, 14:16
Upvotes 1
marching_learning
Nostalgic Mathematics

Maybe, @MICADEE, you're not computing the RMSE correctly. I had the same issue at the beginning. Instead of doing

mean_squared_error(y_true, y_pred)

use:

mean_squared_error(y_true.reshape(-1), y_pred.reshape(-1))

I hope it works. When I did it, my CV went from around 67x.xxx to around 125x.xxx.

MICADEE
LAHASCOM

@marching_learning Very thoughtful... Maybe? I'll definitely revisit it and get back to you on this. Thanks for the hint. 👍

CodeJoe

@marching_learning, this gives me the same result.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Evaluate model
mae = mean_absolute_error(y_val, y_pred)
mse = mean_squared_error(y_val, y_pred)
rmse = np.sqrt(mse)
print(f'MAE: {mae:.4f}, RMSE: {rmse:.4f}')

MAE: 140.7893, RMSE: 391.2219

# Convert to NumPy arrays and flatten
y_val_array = np.array(y_val).reshape(-1)
y_pred_array = np.array(y_pred).reshape(-1)

# Evaluate model
mae = mean_absolute_error(y_val_array, y_pred_array)
mse = mean_squared_error(y_val_array, y_pred_array)
rmse = np.sqrt(mse)

print(f'MAE: {mae:.4f}, RMSE: {rmse:.4f}')

MAE: 140.7893, RMSE: 391.2219

Am I making a mistake?
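For what it's worth, identical numbers are what the math predicts here, assuming `y_val` and `y_pred` are 2-D with one column per target and no sample weights. scikit-learn's `mean_squared_error` defaults to `multioutput="uniform_average"`: it computes the MSE of each column and then takes a plain mean. With equal-length columns, that is exactly the MSE over all flattened cells. The toy check below (illustrative values only) reproduces that in plain NumPy.

```python
import numpy as np

# Toy 2-target arrays (illustrative values only)
y_val = np.array([[100.0, 1.0], [200.0, 2.5], [300.0, 2.0]])
y_pred = np.array([[120.0, 1.5], [180.0, 2.0], [290.0, 2.0]])

sq = (y_val - y_pred) ** 2
# Default multi-output behaviour: MSE per column, then the mean of those MSEs
mse_uniform_average = sq.mean(axis=0).mean()
# After reshape(-1): a single MSE over every cell
mse_flattened = sq.reshape(-1).mean()
print(np.isclose(mse_uniform_average, mse_flattened))  # True
```

The two only diverge when the square root is taken per column before averaging, for example with scikit-learn's `root_mean_squared_error`, which under the default `multioutput` setting averages the per-output RMSEs.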

marching_learning
Nostalgic Mathematics

Hello @CodeJoe, I'm a bit surprised, but try this function to settle it once and for all:

import numpy as np

def RMSE(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
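One pitfall worth ruling out when a hand-written RMSE and a library RMSE disagree is NumPy broadcasting: subtracting an array of shape `(n,)` from one of shape `(n, 1)` silently produces an `(n, n)` matrix, and the resulting "RMSE" is meaningless. A minimal shape-checked variant is sketched below; the name `rmse_safe` is hypothetical, and `np.asarray` also converts pandas DataFrames or Series to arrays.

```python
import numpy as np

def rmse_safe(y_pred, y_true):
    """RMSE over all cells; refuses to broadcast mismatched shapes."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    if y_pred.shape != y_true.shape:
        raise ValueError(f"shape mismatch: {y_pred.shape} vs {y_true.shape}")
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# The pitfall: (n,) minus (n, 1) silently broadcasts to (n, n)
a = np.array([1.0, 2.0, 3.0])
b = a.reshape(-1, 1)
print((a - b).shape)  # (3, 3)
```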
CodeJoe

Just did, and I trained another model too. Same result:

def RMSE(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

RMSE(y_pred, y_val)

481.73341583738716

marching_learning
Nostalgic Mathematics

Sorry, @CodeJoe, I can't help further. I'm puzzled; this function really works for me.

CodeJoe

Oh, no worries. It's all part of the experimenting. Let me try k-fold now.
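Since k-fold comes up here (and per-fold plus overall OOF RMSE were reported earlier in the thread), here is a minimal NumPy-only sketch of a k-fold loop that returns per-fold RMSEs and an overall out-of-fold (OOF) RMSE, pooled over all targets. The helper names, the plain least-squares stand-in model, and the synthetic data are all assumptions for illustration; swap in your own model via `fit_predict`. Note that the overall OOF RMSE is computed from all out-of-fold predictions at once, so it generally differs from the mean of the fold RMSEs.

```python
import numpy as np

def kfold_oof_rmse(X, y, fit_predict, n_splits=5, seed=0):
    """Return (per-fold RMSEs, overall OOF RMSE), pooled over all targets.

    fit_predict(X_train, y_train, X_valid) must return predictions for X_valid.
    """
    n = len(X)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_splits)
    oof = np.zeros_like(y, dtype=float)
    fold_rmse = []
    for valid_idx in folds:
        train_idx = np.setdiff1d(idx, valid_idx)
        oof[valid_idx] = fit_predict(X[train_idx], y[train_idx], X[valid_idx])
        err = y[valid_idx] - oof[valid_idx]
        fold_rmse.append(float(np.sqrt(np.mean(err ** 2))))
    # One RMSE over every out-of-fold prediction at once
    overall = float(np.sqrt(np.mean((y - oof) ** 2)))
    return fold_rmse, overall

def linear_fit_predict(X_tr, y_tr, X_va):
    # Ordinary least squares with an intercept column, as a stand-in model
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_va = np.column_stack([np.ones(len(X_va)), X_va])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_va @ coef

# Synthetic data with 2 targets, for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ rng.normal(size=(3, 2)) + rng.normal(scale=0.1, size=(100, 2))
fold_rmse, overall = kfold_oof_rmse(X, y, linear_fit_predict)
print([round(r, 4) for r in fold_rmse], round(overall, 4))
```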

Yisakberhanu
wachemo university

This competition is the most difficult one, hmmm.

21 May 2025, 10:48
Upvotes 0