Is there a way to reconcile leaderboard and CV scores? The leaderboard gives 475, while CV gives 500,000, even on folds smaller than the test set. How is that possible? I'm using mean_squared_error and r² as metrics.
"The error metric for this competition is the Root Mean Squared Error", so maybe you should take the square root of 500000?
Yeah forgot to add the sqrt lol. Thanks a lot!
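For anyone hitting the same mismatch, here is a minimal sketch of the conversion discussed above. It assumes the CV number (500,000) was a plain mean squared error; the helper name `rmse` is just illustrative and not from any library.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Converting the MSE quoted in the thread to the leaderboard's metric.
cv_mse = 500_000
cv_rmse = math.sqrt(cv_mse)
print(round(cv_rmse, 1))  # 707.1
```

So the CV score is on the same scale as the leaderboard's 475 once the root is taken, although the two numbers still don't match exactly.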
I'm still getting a difference of 20 seconds between LB and CV scores. I just want to make sure: is that normal when train and test are a bit different, or do I need a better split to get consistent CV/LB scores?
It depends. Are you using a linear model or a tree model?
Hey DrFad, I'm using a tree model. I gave up on trying to create a perfect split that matches what's in the test set. I'm trusting my CV's direction now. (Also, strangely, some features change the CV/LB difference even more.)
Wise choice in trusting your cv score.
I followed the advice of the more experienced data scientists ;)
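One rough way to put a CV/LB gap in context is to look at how much the score moves from fold to fold. The sketch below is only an illustration under stated assumptions: the data is synthetic, the "model" is a stand-in that predicts the training mean, and `kfold_indices` is a hypothetical helper, not a library function. With a real model you would fit it on each training fold instead.

```python
import math
import random
import statistics

def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Synthetic target values (assumption: nothing here comes from the competition data).
rng = random.Random(42)
y = [rng.gauss(1000, 300) for _ in range(500)]

fold_scores = []
for val_idx in kfold_indices(len(y), k=5):
    val_set = set(val_idx)
    train_y = [v for i, v in enumerate(y) if i not in val_set]
    baseline = statistics.fmean(train_y)  # stand-in model: predict the training mean
    val_y = [y[i] for i in val_idx]
    fold_scores.append(rmse(val_y, [baseline] * len(val_y)))

cv_mean = statistics.fmean(fold_scores)
cv_std = statistics.stdev(fold_scores)
print(f"CV RMSE: {cv_mean:.1f} +/- {cv_std:.1f} across {len(fold_scores)} folds")
```

If the leaderboard score sits within a fold-to-fold spread or two of the CV mean, the gap may just be noise; a much larger gap would suggest train and test really do differ. This is a heuristic, not a guarantee.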