I don't feel good sharing code in these short contests since time is so scarce. But after playing around for a bit with this one I thought I'd share a few tips for those looking to improve their submission over the final day, and I'm hoping some of you will add your own input to this :)
1) Take a closer look at TotalContractValue and think about what that means. In many cases (not all!) sum(PaymentsHistory)+m1+...+m6 ~= TotalContractValue. As an example, you might use this to scale the predictions for rows where you think this relationship will hold, although how you decide those rows and how you do the scaling will determine whether this will actually be an improvement. A better option might be to work this into some features so the model can decide.
2) Are you learning each month separately? Training one model with multiple outputs? Encoding month as a feature? I'm curious which is working best (**Post your approach in the comments please!**), but switching this up might help or could at the very least add some variety to an ensemble :)
3) As always, remember the public LB is just one subsample of the data. If you have good local CV you can sometimes get a more reliable indicator for whether a specific change actually improves your model. I'm guilty of just using the submissions as my test when I don't feel like doing it properly locally, but this has burned many a competitor in the past...
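For tip 3, a bare-bones local CV loop might look something like the sketch below. It is only an illustration: the helper names (`kfold_indices`, `cross_val_scores`, `mean_baseline`) are made up for this post, the toy target values are invented, and the mean-predicting baseline is a stand-in for whatever model you actually train. It reports both RMSE and MAE per fold so you can track whichever one matches the leaderboard metric.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_scores(y, fit_predict, k=5, seed=0):
    """Return a list of (rmse, mae) tuples, one per held-out fold.

    fit_predict(train_idx, valid_idx) is a hypothetical callback that
    trains on the training rows and returns predictions for the
    validation rows, in the same order as valid_idx.
    """
    scores = []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(train_idx, fold)
        truth = [y[i] for i in fold]
        scores.append((rmse(truth, preds), mae(truth, preds)))
    return scores


# Toy target values, invented purely for the demo.
y = [float(v) for v in range(1, 21)]


def mean_baseline(train_idx, valid_idx):
    """Placeholder model: predict the training mean for every row."""
    m = sum(y[i] for i in train_idx) / len(train_idx)
    return [m] * len(valid_idx)


scores = cross_val_scores(y, mean_baseline, k=5)
for fold, (r, m) in enumerate(scores):
    print(f"fold {fold}: RMSE={r:.2f} MAE={m:.2f}")
```

If the fold-to-fold spread is larger than the gain you're chasing, a small bump on the public LB probably isn't telling you much either.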
That's all I have really. Besides looking at TCV I haven't had any bright ideas around features apart from some basic date-related features and basic stats of the transaction history. Anything else giving a good boost? Has anyone had any luck with deep-learning-based approaches (mine have failed with this dataset)? Any other tips?
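To make the TCV idea from tip 1 a bit more concrete, here's a rough sketch of the kind of features I mean, plus the naive rescaling option. Treat it as an illustration only: I'm assuming PaymentsHistory has already been parsed into a list of numbers, the function and feature names are invented for this post, and the example values are made up. It's plain Python for one row, but the same arithmetic carries over to DataFrame columns.

```python
from statistics import mean, pstdev

N_TARGET_MONTHS = 6  # m1..m6


def tcv_features(total_contract_value, payments_history):
    """Build a few illustrative features for a single contract row.

    payments_history is assumed to be a list of past payment amounts.
    """
    paid = sum(payments_history)
    remaining = total_contract_value - paid
    return {
        "paid_so_far": paid,
        # If sum(PaymentsHistory) + m1 + ... + m6 ~= TotalContractValue holds
        # for this row, this is roughly the total of the six targets.
        "remaining_value": remaining,
        "remaining_per_month": remaining / N_TARGET_MONTHS,
        "fraction_paid": paid / total_contract_value if total_contract_value else 0.0,
        # Basic stats of the transaction history.
        "n_payments": len(payments_history),
        "mean_payment": mean(payments_history) if payments_history else 0.0,
        "std_payment": pstdev(payments_history) if len(payments_history) > 1 else 0.0,
    }


def scale_to_remaining(preds, remaining):
    """Rescale monthly predictions so that they sum to `remaining`.

    Only sensible for rows where you believe the TCV relationship holds;
    otherwise the predictions are returned unchanged.
    """
    total = sum(preds)
    if total <= 0 or remaining <= 0:
        return list(preds)
    factor = remaining / total
    return [p * factor for p in preds]


# Made-up example row.
feats = tcv_features(12000, [1000, 1000, 1500, 500])
print(feats["remaining_value"])  # 8000

# Made-up model output for m1..m6, rescaled to match the remaining value.
raw_preds = [1000, 1000, 1000, 500, 500, 0]
print(scale_to_remaining(raw_preds, feats["remaining_value"]))
```

Feeding `remaining_value` and friends to the model is the safer route; the hard rescaling will hurt on every row where the relationship doesn't actually hold.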
Good luck all :)
My battery is dying (no power here) so I will try to keep an eye out for questions on my phone but please excuse me if I don't reply. I have 8% so just enough juice for one more submission :)
good men are few and far between \(*^*)/
So glad to see you back as a competitor on Zindi. You are doing really well, and by your current score I don't think you need any tips from any of us :). Regarding local score and leaderboard trust issues, my CV is around 824. Would love to know yours, and what everyone else's CV and LB scores are currently.
Thanks :) Hehe my local CV is TBD - haven't had much time today so ignored my own advice and used almost all my submissions in place of local CV so I could focus on getting things going! If I have time tomorrow I'll try to do a better job. Interesting if there is a disconnect - is your MAE more like the LB score?
I am curious why you would want to compare MAE with RMSE, given the leaderboard is RMSE :) One thing is for sure: my CV and LB are directional. In other words, improvement in CV = improvement in LB, which is a good sign.
True... if there is a shake-up, it will only affect a few. Every improvement in my CV also improves my LB score.
Good ones, John.
I think there are various approaches to this problem, inasmuch as my CV is around 840 and I see most competitors' CV around 800+.
It is only the score on 20% of the data that we can all see; I am expecting a big shake-up though.
Hehe Raheem, I don't think there will be a shakeup :). I might be wrong though
Yes, but I am sure most scores will drop to 800+ on the final LB.
There is a very high chance of this happening though :)