My CV score has no relation at all to the LB score: sometimes an increase in CV leads to an LB improvement, but sometimes it is the opposite. I am using 5-fold StratifiedKFold, and I am also using SMOTE and ADASYN. How can I solve this issue?
Hello! A disconnect between cross-validation (CV) scores and the test score (leaderboard score) is not uncommon.
One reason is randomness. Both CV and SMOTE, for example, have an element of randomness. This can lead to variations in model performance because you are not always training on exactly the same splits, so you may sometimes get a split that the model happens to perform better on.
One way to circumvent this is to use `random_state`, as explained in the documentation below:
https://scikit-learn.org/stable/modules/cross_validation.html
https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html
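For example, here is a minimal sketch (using a synthetic dataset and a placeholder classifier, not your actual setup) of pinning `random_state` on both the CV splitter and SMOTE, so every run uses identical folds and identical synthetic samples:

```python
# Sketch: fix random_state on both the CV splitter and SMOTE so every run
# splits and resamples the data identically. Dataset and classifier are
# placeholders, not the competition setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE only to the training folds

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),           # fixed seed -> same synthetic samples
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fixed folds
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(scores.mean(), scores.std())
```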
Hope this helps.
Thank you very much!
Another problem is the metric, AUC - it can be a bit jumpy, or "noisy" as Wikipedia puts it: https://en.wikipedia.org/wiki/Area_under_the_curve_(receiver_operating_characteristic). There can also be trouble in your model itself. Here is a real story: I've been playing this one very hard, and at some stage, while trying to improve my model, I realised I had left a few features / columns out of the training set completely! So if all else fails, find those bugs - carefully check your code and logic.
Then, remember we are scored on only 20% of the test set, so on a relatively small sample. Things may change quite a bit when the final scores (on the full sample) get revealed.
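To get a feel for how much a 20% subsample can move the metric, here is a rough sketch (on synthetic data, not the competition data) that repeatedly scores the same predictions on random 20% slices of a hold-out set:

```python
# Sketch: how much AUC can wobble when scored on repeated 20% subsamples of a
# hold-out set (a stand-in for a public LB split). Data and model are synthetic
# placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

rng = np.random.default_rng(0)
aucs = []
for _ in range(200):
    idx = rng.choice(len(y_te), size=len(y_te) // 5, replace=False)  # ~20% "public LB"
    aucs.append(roc_auc_score(y_te[idx], proba[idx]))

print(f"full hold-out AUC: {roc_auc_score(y_te, proba):.4f}")
print(f"20% subsample AUC: {np.mean(aucs):.4f} +/- {np.std(aucs):.4f}")
```

The spread of those subsample AUCs gives a rough idea of the wobble you can expect between your local CV and a public LB computed on a small fraction of the test set.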
Thank you very much!
That is not the case for me at all. I'm doing cv=5 and my CV is accurate to the 0.00X digit. That's usually not the case in other challenges, but for this one CV and LB are very close.
Thanks for your reply. I am also using 5 folds of StratifiedKFold, but I see no correlation at all with the LB. I have fixed the seed for all random processes, but the problem is still there.