We don't get many subs for this one - just 100, whereas normally we would get 300. The competition duration is also relatively short and, as usual, we can only select our 2 best subs at the end.
Perhaps then it makes sense to create multiple accounts and enter lots of models to increase our chances. Would this be allowed, @zindi?
Of course, something like this normally fails on the private LB: the private-LB score distribution of this strategy is quite platykurtic, whereas a good model gives you a leptokurtic one. But if you are lucky and enter many accounts, that flat, fat-tailed distribution can work in your favour - which makes me want to argue that this should not be allowed.
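To make that concrete, here is a toy Monte Carlo sketch (all the numbers are made up, and plain normals with different spreads stand in for the platykurtic/leptokurtic shapes): even though each lucky-dip submission is worse on average, the best of many clears a high bar far more often than one careful model does.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000

# One careful model: a tight score distribution around its true skill.
good_model = rng.normal(loc=0.67, scale=0.01, size=n_trials)

# Ten lucky-dip accounts: wider, lower-mean distributions; keep the best.
lucky_best = rng.normal(loc=0.60, scale=0.05, size=(n_trials, 10)).max(axis=1)

print(f"good model > 0.70 in {np.mean(good_model > 0.70):.1%} of trials")
print(f"best of 10 > 0.70 in {np.mean(lucky_best > 0.70):.1%} of trials")
```

Which is exactly why a multi-account strategy can 'work' without any real skill - and why it is banned.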
From the rules of all Zindi competitions: "Multiple accounts per user are not permitted, and neither is collaboration or membership across multiple individuals. Individuals and their submissions originating from multiple accounts will be immediately disqualified from the platform."
Hi @Ebiendele - congrats, nice start to this comp for you
Only question, @skaak, is how will @zindi know who has multiple accounts during the competition? I feel it can only be figured out when the competition ends. Personally I like this comp ... it forces me to refine my solution, as opposed to the 'brute force' approach I sometimes take.
Also great improvement on the LB!
Thanks @wuuthraad - I'm not sure this is happening here, tbh. There are a few names on the LB that look like variations of one another, and they also sub a lot and at about the same time ... but it could just be coincidence. @zindi can confirm whether it is kosher.
Anyhow, this comp ... baby steps ... it took me 50 subs to do what you did in 20. Now I am obsessed and want to try for 0.7+. FWIW I think the data is simulated and 0.7 is probably the built-in max.
My very first sub got 0.67 - I did everything right and should have stopped there. Everything since has been sus, but hey, if it moves the LB ... You know from an earlier conversation that I was badly stuck at some stage. So I thought, let me just play this to learn all those peripheral settings you should normally leave at the defaults ... you know, sus stuff ... but well, here we are ...
@skaak it's called hard work - the more effort you put in, the greater your chance of doing well. On my side I did not focus mainly on the modelling; FE beats a complex model every time. I have mainly been doing EDA and FE, seeing what makes the bigger impact on model performance and what insights I can get from the data. Roughly the loop sketched below.
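A minimal sketch of that add-one-feature-and-cross-validate loop - note the file name ("Train.csv"), the ID/target columns and the "amount"/"n_items" columns are all placeholders, not this competition's actual schema:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("Train.csv")  # hypothetical file name
y = train["target"]               # hypothetical target column
X = train.drop(columns=["ID", "target"]).select_dtypes("number").fillna(-999)

def cv_auc(features):
    """Mean 5-fold AUC for a fixed model, so only the features vary."""
    model = RandomForestClassifier(n_estimators=300, random_state=42)
    return cross_val_score(model, features, y, cv=5, scoring="roc_auc").mean()

baseline = cv_auc(X)

# Try one candidate feature at a time; keep it only if the CV score improves.
X_fe = X.assign(ratio=X["amount"] / (X["n_items"] + 1))  # hypothetical columns
print(f"baseline: {baseline:.4f}  with new feature: {cv_auc(X_fe):.4f}")
```

The point of holding the model fixed is that any CV gain is then attributable to the feature, not to tuning noise.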
Yip - would love to compare FE on this one when it's done ... I looked at the data again and, again, I get the impression it is simulated. Either that, or the train/test split is perfect. All this effort for a mere 19 columns ...
Hahahahahaha😂😂 @skaak you crazy individual, I see you on the LB
Slowly getting a handle on this one, which is really, really nice, but ... talk about paper-thin margins ... ah my friend, this one was (is) a nice journey.
Can I also be part of the conversation after the comp is closed?
I achieved 0.56 with a basic model and 0.67 with a model that made use of the null values.
I keep getting worse results from model tuning and better results from feature engineering. There are still a few features I'm struggling to handle, but I would like to get 0.75+ in this comp.
I agree with both of you, though, in the sense that FE gives me more gradual improvements than model complexity/tuning does. The null-value idea is sketched below.
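By "a model built on the null values" I mean something like this minimal sketch (the file name is a placeholder): add a binary was-missing flag per column before imputing, since the missingness pattern itself often carries signal.

```python
import pandas as pd

df = pd.read_csv("Train.csv")  # hypothetical file name

# One binary flag per column that contains nulls; whether a value was
# missing at all can be predictive, independent of the imputed value.
for col in df.columns[df.isna().any()]:
    df[f"{col}_missing"] = df[col].isna().astype(int)
```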
Yeah, I also want 0.75+, but, tbh, any 0.7+ will be wonderful.
Sure, you can be part of the discussion.
If I may ask - I'm on my 3rd iteration of a missing-values strategy. My last strat was to use groupby medians, something like the sketch below.
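A minimal version of that, with a hypothetical grouping column - fill within the group first, then fall back to the global median for groups that are entirely NaN:

```python
import pandas as pd

df = pd.read_csv("Train.csv")  # hypothetical file name
group_col = "category"         # hypothetical grouping column

for col in df.select_dtypes("number").columns:
    # Group median first, then global median for all-NaN groups.
    df[col] = df[col].fillna(df.groupby(group_col)[col].transform("median"))
    df[col] = df[col].fillna(df[col].median())
```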
Has anyone tried using sklearn's impute methods?
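For comparison, the two obvious ones from sklearn.impute - again with a placeholder file name, and note IterativeImputer still needs its experimental enable flag:

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

df = pd.read_csv("Train.csv")    # hypothetical file name
num = df.select_dtypes("number")

# Per-column median: a global version of the groupby-median idea.
simple = pd.DataFrame(
    SimpleImputer(strategy="median").fit_transform(num), columns=num.columns)

# IterativeImputer regresses each column on the others, round-robin.
iterative = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(num), columns=num.columns)
```

Whichever you use, fit the imputer on train only and reuse it on test, otherwise test statistics leak into the fill values.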
Yes ... but let's discuss after the comp.
Note the "Dealing with NaN" discussion - it covers some of these issues.