
Predictive Insights Youth Income Prediction Challenge

Helping South Africa
R10 000 ZAR
Challenge completed ~2 years ago
Prediction
Job Opportunity
637 joined
257 active
Start: Jun 08, 23
Close: Oct 01, 23
Reveal: Oct 01, 23
Satti_Tareq
I am having a cross-validation problem
Help · 8 Aug 2023, 11:55 · 6

My CV score has no relation at all to the LB score. Sometimes an increase in CV leads to an improvement on the LB, but sometimes it is the opposite. I am using 5-fold stratified k-fold, and I am also using SMOTE and ADASYN. How can I solve this issue?

Discussion 6 answers

Hello! The disconnect between cross-validation (CV) scores and testing score (leaderboard score) is not uncommon.

One reason is randomness. Both CV and SMOTE, for example, have an element of randomness. This can lead to variations in model performance because you are not always using the same training sets every time. So you may sometimes get a data set that the model performs better on.

One way to circumvent this is to use `random_state`, as explained in the documentation below:

https://scikit-learn.org/stable/modules/cross_validation.html

https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html
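
As a rough illustration of the point about randomness, here is a minimal sketch of seeding the fold split. The synthetic data, the `LogisticRegression` model, the `cv_auc` helper and the seed value are all assumptions made for the example, not anything from the competition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

SEED = 42  # arbitrary value, chosen only for illustration

# Synthetic imbalanced data standing in for the real training set (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=SEED
)


def cv_auc(seed):
    """Mean 5-fold AUC, with the fold assignment controlled by `seed`."""
    # random_state only has an effect when shuffle=True.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()


# Same seed twice: the folds, and therefore the score, are reproducible.
same_a = cv_auc(SEED)
same_b = cv_auc(SEED)
print("same seed:", same_a, same_b)

# Different seeds: the spread shows how much of a CV change is just split noise.
seed_scores = [cv_auc(s) for s in range(5)]
print("std across seeds:", np.std(seed_scores))
```

If the spread across seeds is about as large as the CV improvements you are chasing, those improvements may not be real. SMOTE takes its own `random_state` argument, as described on the imbalanced-learn page linked above.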

Hope this helps.

8 Aug 2023, 12:52
Upvotes 1
Satti_Tareq

Thank you very much!

skaak
Ferra Solutions

Another problem is the metric, AUC: it can be a bit jumpy (or "noisy", as Wikipedia puts it here: https://en.wikipedia.org/wiki/Area_under_the_curve_(receiver_operating_characteristic)). The trouble can also be in your model. Here is a real story: I've been playing this one very hard, and at some stage, while trying to improve my model, I realised I had left a few features/columns out of the training set completely! So if all else fails, find those bugs: carefully check your code and logic.

Also, you know we are scored on only 20% of the test set, so on a relatively small sample? Things may change quite a bit when the final scores (on the full sample) get revealed.
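
To get a feel for how much a 20% sample can move the AUC, here is a small simulation sketch. Everything in it is made up for illustration (the sample size, the class balance, the Gaussian scores and the `auc` helper); only the 20% fraction comes from the point above.

```python
import random


def auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Hypothetical "full test set": about 20% positives, scores from a fixed, imperfect model.
n = 1000
labels = [1 if rng.random() < 0.2 else 0 for _ in range(n)]
scores = [rng.gauss(1.0 if l == 1 else 0.0, 1.0) for l in labels]

full_auc = auc(labels, scores)

# Score the same predictions on many random 20% subsets, like a public leaderboard would.
sub_aucs = []
for _ in range(100):
    idx = rng.sample(range(n), n // 5)
    sub_aucs.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))

print("full-sample AUC:", round(full_auc, 4))
print("20% subsets, min/max AUC:", round(min(sub_aucs), 4), round(max(sub_aucs), 4))
```

The model never changes here, so the gap between the minimum and maximum subset AUC is pure sampling noise. LB movements smaller than that gap are hard to tell apart from luck.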

10 Aug 2023, 00:33
Upvotes 3
Satti_Tareq

Thank you very much!

Wajdi_Hajji
ESPRIT

That is not the case with me at all. I'm doing cv=5 and it's accurate to the third decimal place (0.00X). That is usually not the case in other challenges, but for this one it is very close.

21 Sep 2023, 16:39
Upvotes 1
Satti_Tareq

Thanks for your reply. I am also using 5 folds with StratifiedKFold, but I have no correlation at all with the LB. I have fixed the seed for all random processes, but the problem is still there.