LOL!!
I believe he saved the best model for the last day. Maybe we should just go home and await his winning solution tomorrow.
LOL!!
😄😄😆😆😆😆
@ff LOOOLLZZZ... He's from the Federal Republic of Nigeria. 😃😃😃😃 What happened? Why do you ask?
Because of his score. LOL
Loooolz.... That's a super amazing score.
Dude's a legend if he wins the competition
I swear! 😆
I *think* I have a vague idea how he did it. Gonna give it one last attempt today and hope I can finish it in time.
I *think*, at least a little bit of it, comes from what you see below, which is the distro of the payments.
Did it work, or are you still working on it? I tried to structure my payments like you did above but it did not work... maybe I'm missing something.
Still busy
My take is that you need to chop at least some outliers, but it did not improve my score as much as I hoped, or nearly as much as my validation sample showed.
Actually, I think I have a bug somewhere - with hours to go!!!!!! - as my validation stats are great but my submissions keep getting worse ....
Don't worry, we will see a big surprise in the leaderboard, some competitors have discovered something!!
Thanks for the advice. I dealt with the outliers using PCA and my score improved (unfortunately it wasn't the silver bullet). I'm having the same issue: my CV scores are decent but my LB score is terrible... I've been using pipelines to avoid data leakage but still have the same issue. Hopefully you finish in time dude!!
Haha maybe
Thanks!
Yeah - almost there. No longer using RF as it simply takes too long now.
But the model is done, so I am just toying with some hypers and optimising, wishing I had a faster machine and fending off a by now irate other half.
I was hoping to find the silver bullet somewhere between outliers and also adding a lot more dummies. Same story - local score is now below 400 but on LB above 700.
A few days ago I could get 680 or so on LB with a simpler and somewhat broken model. Maybe I've lost something somewhere by fixing it up, but to some extent I am satisfied. Model is done, pipeline working well and relatively bug-free. Only remaining issue is LB score!
You could also try simply chopping off a few outliers from the sample?
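For anyone wondering what "chopping off a few outliers" could look like, here is a minimal sketch using Tukey's IQR rule. This is only one common way to do it and not necessarily what anyone in this thread used. The `payments` numbers and the `trim_outliers` helper are made up for illustration. You would normally trim the training rows only, since test rows still need a prediction.

```python
import statistics

def trim_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical payment amounts with one extreme value (not real data).
payments = [120, 135, 150, 160, 170, 180, 200, 210, 5000]
trimmed = trim_outliers(payments)
print(trimmed)
```

A larger `k` keeps more of the tail, so it is worth checking how many rows get dropped before refitting.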
Hi @skaak,
Please, how do you manage to find the 4th, 5th and 6th columns?
?
You mean that table?
The software draws them when I select histogram.
Hmmmm - you were right, quite a few surprises. How can it be? Overfit the LB?
Okay thanks! I will try it !
Who scored 502? How come the best score is 662 on the private leaderboard? I don't understand what happened to the 502 score.
He just did a crazy overfitting!
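A rough way to see how overfitting the public LB can happen is the toy simulation below. It is a sketch under made-up assumptions, not a reconstruction of this competition: the row counts, the noise level and the number of submissions are all invented. Every "submission" is equally bad (truth plus random noise), and we simply pick the one with the best score on a small public split.

```python
import random

def rmse(pred, truth):
    """Root mean squared error between two equal-length lists."""
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)) ** 0.5

random.seed(0)
n_rows, n_public, n_subs = 1000, 50, 200  # hypothetical sizes

truth = [random.gauss(0, 1) for _ in range(n_rows)]
# Each submission is truth plus noise, so none is genuinely better.
subs = [[t + random.gauss(0, 1) for t in truth] for _ in range(n_subs)]

# Score every submission on the small public part and the large private part.
public = [rmse(s[:n_public], truth[:n_public]) for s in subs]
private = [rmse(s[n_public:], truth[n_public:]) for s in subs]

# Pick the submission that looks best on the public split.
best = min(range(n_subs), key=public.__getitem__)
print("public:", round(public[best], 3), "private:", round(private[best], 3))
```

The selected submission wins on the public rows largely by luck, so its private score tends to fall back toward the pack. Tuning against a small public split many times has a similar effect.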