Hi Zindians,
This challenge is weird; I think it will be decided by a few outliers in the private test set.
For me there is no correlation between local CV and LB, and I think it is due to the distribution of outliers. In all my experiments CV > LB, so I suspect there are more outliers in the private test set.
I'd like to hear how your experiments are going.
Yeah, that's my opinion too. There is a huge gap between CV and LB in all my experiments; it kinda feels like betting now.
I agree as well.
It would be ideal if the targets were normalized before scoring on the LB, so that the scale of Ca does not overshadow the scale of something like Boron when computing the error metric (RMSE). I hope this is what is happening.
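To make that concrete, here's a toy NumPy sketch (synthetic data, illustrative target names; this is not the competition's actual metric) of how per-target scaling stops a wide-scale target from dominating a pooled RMSE:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic targets with very different scales:
y_true = np.column_stack([rng.normal(2000, 500, 100),   # "Ca"-like, large scale
                          rng.normal(1.0, 0.3, 100)])   # "B"-like, small scale
# Predictions with a similar *relative* error on each target:
y_pred = y_true + rng.normal(0, [50, 0.3], (100, 2))

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Raw pooled RMSE: almost entirely driven by the large-scale column.
raw = rmse(y_true, y_pred)

# Divide each target by its own std before scoring: both columns now count comparably.
scale = y_true.std(axis=0)
normalized = rmse(y_true / scale, y_pred / scale)
```

The raw score ends up tracking the large-scale column almost exclusively, while the normalized score reflects both targets.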
Yes, they could have used MAPE or another relative loss, but... sometimes you wonder if the effort is worth it.
Also, calcium is only a secondary macronutrient; it is nowhere near as important as nitrogen/phosphorus/potassium for plant health, yet it is being weighted several orders of magnitude higher...
Yeah, same here: practically no CV-to-LB correlation, and there are so many data issues that it doesn't seem worth the effort to do any meaningful feature engineering.
We also have no idea how the metric is being calculated: is it RMSE across individual targets, or their average, or are they simply using the gap itself without reference to the nutrients or their scales?
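For what it's worth, those variants really do give different numbers. A quick toy check (synthetic values, not competition data):

```python
import numpy as np

# Three things that all get called "RMSE" but can differ in value.
y_true = np.array([[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]])
y_pred = np.array([[110.0, 0.5], [190.0, 2.5], [320.0, 2.0]])

err = y_true - y_pred

# (a) pooled RMSE over every (row, target) cell
pooled = np.sqrt(np.mean(err ** 2))

# (b) RMSE per target, then averaged across targets
per_target = np.sqrt(np.mean(err ** 2, axis=0))
averaged = per_target.mean()

# (c) sqrt of the mean per-target MSE (equals (a) here, but not (b))
sqrt_of_mean_mse = np.sqrt(np.mean(err ** 2, axis=0).mean())
```

Variants (a) and (c) coincide, but (b) is genuinely different whenever target scales differ, which is exactly the Ca-vs-Boron situation.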
@marching_learning Very weird indeed. Despite removing outliers, my local RMSE CV scores look solid, yet I'm still getting a poor LB score of 1120. Here are my local results:
Fold 1–5 RMSE:
Overall OOF RMSE: 659.4103
I plan to implement more feature engineering to see if that helps, but it’s puzzling.
Maybe @MICADEE you're not computing the RMSE correctly. I had the same issue in the beginning. Instead of doing
Rather use:
I hope it will work. When I did it, my CV went from around 67x.xxx to around 125x.xxx.
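The original code snippets didn't survive this post, so this is only a guess at the bug being described, but one mistake that fits a jump from ~67x to ~125x on mixed-scale targets is averaging per-target RMSEs instead of pooling all the errors first:

```python
import numpy as np

def rmse_averaged(y_true, y_pred):
    # Possible "instead of doing": RMSE per target, then the mean of those RMSEs.
    per_target = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return per_target.mean()

def rmse_pooled(y_true, y_pred):
    # Possible "rather use": one RMSE over every (row, target) cell at once.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# With mixed-scale targets the two disagree badly (toy values):
y_true = np.array([[1000.0, 1.0], [2000.0, 2.0]])
y_pred = np.array([[1100.0, 1.5], [1900.0, 1.5]])
```

On data like this the pooled score comes out noticeably higher than the averaged one, which would explain a CV number roughly doubling after the fix.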
@marching_learning Very thoughtful... maybe? I will definitely revisit it, take another look, and get back to you on this. Thanks for the hint. 👍
@marching_learning, this gives me the same result.
# Convert to NumPy arrays
print(f'MAE: {mae:.4f}, RMSE: {rmse:.4f}')
MAE: 140.7893, RMSE: 391.2219
Am I making a mistake?
Hello @CodeJoe, I'm a bit surprised, but try this function to settle it once and for all:
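The function itself was lost from the post; a plain NumPy RMSE with the `RMSE(y_pred, y_val)` signature used later in the thread would presumably look like this:

```python
import numpy as np

def RMSE(y_pred, y_val):
    # Coerce lists/Series to float arrays, then take the root mean squared error.
    y_pred = np.asarray(y_pred, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    return np.sqrt(np.mean((y_pred - y_val) ** 2))
```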
Just did. I also trained another model, same result:
RMSE(y_pred, y_val)
481.73341583738716
Sorry @CodeJoe, I can't help further. I'm puzzled; this function really works for me.
Oh, no worries. It's all part of the experiments. Let me try k-folds now.
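In case it helps anyone else trying the same, here is a minimal sketch of the out-of-fold RMSE bookkeeping (NumPy only; the per-fold "model" is a dummy mean predictor standing in for a real one, so only the CV plumbing is the point here):

```python
import numpy as np

def kfold_oof_rmse(y, k=5, seed=0):
    # Shuffle indices, split into k folds, fill an out-of-fold prediction
    # vector, then score it once with a single pooled RMSE.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    oof = np.empty(len(y), dtype=float)
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # Stand-in "model": predict the training-fold mean for every held-out row.
        oof[fold] = y[train].mean()
    return np.sqrt(np.mean((y - oof) ** 2))
```

Swapping the mean predictor for a real fit/predict pair gives the usual OOF RMSE that the "Overall OOF RMSE" numbers in this thread refer to.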
This competition is the most difficult one, hmmm.