Hi Zindians,
This challenge is weird; I think it will be decided by a few outliers in the private test set.
For me there is no correlation between local CV and LB, and I think it is due to the distribution of outliers. In all my experiments CV > LB, so I suspect there are more outliers in the private test set.
I'd like to hear how your experiments are going.
Yeah, that's my opinion too. There is a huge gap between CV and LB in all my experiments; it kinda feels like betting now.
I agree as well.
It would be ideal if the targets were normalized before scoring on the LB, so that the scale of Ca does not overshadow the scale of something like Boron when computing the error metric (RMSE). I hope this is what is happening.
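To make that concrete, here's a toy NumPy sketch (synthetic data, illustrative target names; this is not the competition's actual metric) of how per-target scaling stops a wide-scale target from dominating a pooled RMSE:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic targets with very different scales:
y_true = np.column_stack([rng.normal(2000, 500, 100),   # "Ca"-like, large scale
                          rng.normal(1.0, 0.3, 100)])   # "B"-like, small scale
# Predictions with a similar *relative* error on each target:
y_pred = y_true + rng.normal(0, [50, 0.3], (100, 2))

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Raw pooled RMSE: almost entirely driven by the large-scale column.
raw = rmse(y_true, y_pred)

# Divide each target by its own std before scoring: both columns now count comparably.
scale = y_true.std(axis=0)
normalized = rmse(y_true / scale, y_pred / scale)
```

The raw score ends up tracking the large-scale column almost exclusively, while the normalized score reflects both targets.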
Yes, they could have used MAPE or another relative loss, but... sometimes you wonder if the effort is worth it.
Also, calcium is only a secondary macronutrient; it is nowhere near as important as nitrogen/phosphorus/potassium for plant health, yet it is being weighted several orders of magnitude higher...
Yeah, same here: practically no CV-to-LB correlation, and there are so many data issues that it doesn't seem worth the effort to do any meaningful feature engineering.
We also have no idea how the metric is being calculated: is it RMSE across individual targets, or their average, or are they simply using the gap itself without reference to the nutrients or their scales?
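For what it's worth, those variants really do give different numbers. A quick toy check (synthetic values, not competition data):

```python
import numpy as np

# Three things that all get called "RMSE" but can differ in value.
y_true = np.array([[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]])
y_pred = np.array([[110.0, 0.5], [190.0, 2.5], [320.0, 2.0]])

err = y_true - y_pred

# (a) pooled RMSE over every (row, target) cell
pooled = np.sqrt(np.mean(err ** 2))

# (b) RMSE per target, then averaged across targets
per_target = np.sqrt(np.mean(err ** 2, axis=0))
averaged = per_target.mean()

# (c) sqrt of the mean per-target MSE (equals (a) here, but not (b))
sqrt_of_mean_mse = np.sqrt(np.mean(err ** 2, axis=0).mean())
```

Variants (a) and (c) coincide, but (b) is genuinely different whenever target scales differ, which is exactly the Ca-vs-Boron situation.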
@marching_learning Very weird indeed. Despite removing outliers, my local RMSE CV scores look solid, yet I'm still getting a poor LB score of 1120. Here are my local results:
Fold 1–5 RMSE:
Overall OOF RMSE: 659.4103
I plan to implement more feature engineering to see if that helps, but it’s puzzling.
Maybe @MICADEE you're not computing the RMSE correctly. I had the same issue in the beginning. Instead of doing
Rather use:
I hope it will work. When I did it, my CV went from around 67x.xxx to around 125x.xxx.
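The original code snippets didn't survive this post, so this is only a guess at the bug being described, but one mistake that fits a jump from ~67x to ~125x on mixed-scale targets is averaging per-target RMSEs instead of pooling all the errors first:

```python
import numpy as np

def rmse_averaged(y_true, y_pred):
    # Possible "instead of doing": RMSE per target, then the mean of those RMSEs.
    per_target = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return per_target.mean()

def rmse_pooled(y_true, y_pred):
    # Possible "rather use": one RMSE over every (row, target) cell at once.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# With mixed-scale targets the two disagree badly (toy values):
y_true = np.array([[1000.0, 1.0], [2000.0, 2.0]])
y_pred = np.array([[1100.0, 1.5], [1900.0, 1.5]])
```

On data like this the pooled score comes out noticeably higher than the averaged one, which would explain a CV number roughly doubling after the fix.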
@marching_learning Very thoughtful... maybe? I will definitely revisit it, take another look, and get back to you on this. Thanks for the hint. 👍
@marching_learning, this gives me the same result.
# Convert to NumPy arrays
print(f'MAE: {mae:.4f}, RMSE: {rmse:.4f}')
MAE: 140.7893, RMSE: 391.2219
Am I making a mistake?
Hello @CodeJoe, I'm a bit surprised, but try this function to settle it once and for all:
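The function itself was lost from the post; a plain NumPy RMSE with the `RMSE(y_pred, y_val)` signature used later in the thread would presumably look like this:

```python
import numpy as np

def RMSE(y_pred, y_val):
    # Coerce lists/Series to float arrays, then take the root mean squared error.
    y_pred = np.asarray(y_pred, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    return np.sqrt(np.mean((y_pred - y_val) ** 2))
```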
Just did. I also trained another model, same result:
RMSE(y_pred, y_val)
481.73341583738716
Sorry @CodeJoe, I can't help further. I'm puzzled; this function really works for me.
Oh, no worries. It's all part of the experiments. Let me try k-folds now.
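In case it helps anyone else trying the same, here is a minimal sketch of the out-of-fold RMSE bookkeeping (NumPy only; the per-fold "model" is a dummy mean predictor standing in for a real one, so only the CV plumbing is the point here):

```python
import numpy as np

def kfold_oof_rmse(y, k=5, seed=0):
    # Shuffle indices, split into k folds, fill an out-of-fold prediction
    # vector, then score it once with a single pooled RMSE.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    oof = np.empty(len(y), dtype=float)
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # Stand-in "model": predict the training-fold mean for every held-out row.
        oof[fold] = y[train].mean()
    return np.sqrt(np.mean((y - oof) ** 2))
```

Swapping the mean predictor for a real fit/predict pair gives the usual OOF RMSE that the "Overall OOF RMSE" numbers in this thread refer to.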
This competition is the most difficult one, hmmm.