
Sasol Customer Retention Recruitment Competition

Helping South Africa
R10 000 ZAR
Challenge completed ~2 years ago
Prediction
Job Opportunity
253 joined
56 active
Start: Oct 05, 23
Close: Nov 26, 23
Reveal: Nov 26, 23
skaak
Ferra Solutions
Winning strategy using multiple accounts
Platform · 16 Nov 2023, 01:35 · 12

We don't get many subs for this one - just 100, whereas normally we would get 300. Also, the competition duration is relatively short and, as usual, we can only select our 2 best subs at the end.

Perhaps then it makes sense to create multiple accounts and enter lots of models to increase our chances. Would this be allowed, @zindi?

Of course, something like this normally fails on the private LB: the distro of this strategy is quite platykurtic, whereas a good model will give you a leptokurtic private LB distro. But if you are lucky and enter many, then perhaps that platykurtic distro can work in your favour, which makes me want to argue that this should not be allowed.
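To make the luck argument concrete, here is a tiny illustrative simulation (all numbers are made up, not taken from this comp): each "account" gets an independent noisy draw of public-LB score around the same true skill, and the best of many draws drifts upward purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: true skill 0.60, public-LB noise sd 0.02.
true_skill, lb_noise = 0.60, 0.02

best_of_1 = (true_skill + rng.normal(0, lb_noise, size=1)).max()
best_of_10 = (true_skill + rng.normal(0, lb_noise, size=10)).max()

print(f"best public score with 1 account : {best_of_1:.3f}")
print(f"best public score with 10 accounts: {best_of_10:.3f}")
# The 10-account max is usually higher purely by luck; the private LB is
# scored on different rows, so that edge tends to vanish there.
```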

Discussion · 12 answers
Ebiendele
Federal university of technology akure

From the rules of all Zindi competitions: "Multiple accounts per user are not permitted, and neither is collaboration or membership across multiple individuals. Individuals and their submissions originating from multiple accounts will be immediately disqualified from the platform."

16 Nov 2023, 08:53
Upvotes 0
skaak
Ferra Solutions

Hi @Ebiendele - congrats, nice start to this comp for you

wuuthraad

The only question, @skaak, is how @zindi will know who has multiple accounts during the competition? I feel it can only be figured out when the competition ends. Personally I like this comp ... it forces me to refine my solution as opposed to the 'brute force' approach I sometimes take.

Also great improvement on the LB!

16 Nov 2023, 15:37
Upvotes 0
skaak
Ferra Solutions

Thanks @wuuthraad - I'm not sure this is happening here tbh. There are a few names on the LB that look like variations of one another, and they also sub a lot and at about the same time ... but it could just be coincidence. @zindi can confirm whether it is kosher.

Anyhow, this comp ... baby steps ... it took me 50 subs to do what you did in 20. Now I am obsessed and want to try for 0.7+. FWIW I think the data is simulated and 0.7 is probably the built-in max.

My very first sub got 0.67 - I did everything right and should have stopped there. Everything since has been sus, but hey, if it moves the LB ... You know from an earlier conversation that I was badly stuck at some stage, so I thought, let me just play with this to learn all those peripheral settings you should always leave at the defaults ... you know, sus stuff ... but well, here we are ...

wuuthraad

@skaak it's called hard work. The more effort you put in, the greater the chance of doing well. On my side I did not focus mainly on the modelling - FE beats a complex model every time. I have mostly been doing EDA and FE, seeing what makes the bigger impact on model performance and what insights I can get from the data.

skaak
Ferra Solutions

Yip - would love to compare FE on this one when it's done ... I looked at the data again, and again I get the impression it is simulated. Either that, or the train/test split is perfection. All this effort for a mere 19 columns ...

wuuthraad

Hahahahahaha😂😂 @skaak you crazy individual, I see you on the LB

skaak
Ferra Solutions

Slowly getting a handle on this one, which is really, really nice, but ... talk about paper-thin margins ... ah my friend, this one was (is) a nice journey.

Can I also be part of the conversation after the comp is closed?

I achieved 0.56 with a basic model and 0.67 with a model that makes use of the null values.

I keep getting worse results from model tuning and better results from feature engineering. There are still a few features I'm struggling to handle, but I would like to get 0.75+ for this comp.

I agree with both of you, though, in the sense that FE seems to be giving me steadier improvements than model complexity/tuning.
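One way to read "a model that makes use of the null values" above is to treat missingness itself as a signal. A minimal sketch of that kind of FE, with placeholder column and file names since the thread does not name any:

```python
import pandas as pd

def add_null_indicators(df: pd.DataFrame, cols=None) -> pd.DataFrame:
    """Append a binary <col>_isnull flag for each given (or any NaN-containing) column.

    A common trick when missingness itself is predictive, e.g. a customer
    with no recorded value behaving differently from one with a zero.
    """
    out = df.copy()
    for col in cols or df.columns[df.isna().any()]:
        out[f"{col}_isnull"] = df[col].isna().astype(int)
    return out

# Hypothetical usage on the competition's train set:
# train = add_null_indicators(pd.read_csv("Train.csv"))
```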

skaak
Ferra Solutions

Yeah, I also want 0.75+, but, tbh, any 0.7+ will be wonderful.

Sure, you can be part of the discussion.

If I may ask: I'm on my 3rd iteration of a missing-values strategy. My last strat was to use groupby medians.

Has anyone tried using sklearn's impute methods?
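For comparison, the two approaches mentioned here side by side; the file and column names below are stand-ins, not taken from the competition data:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("Train.csv")  # placeholder path

# 1) Groupby medians: fill a numeric column within an assumed segment column.
df["SPEND"] = df.groupby("SEGMENT")["SPEND"].transform(lambda s: s.fillna(s.median()))

# 2) sklearn imputer: fit on train only, then reuse on test so both sets
#    are filled with identical statistics.
num_cols = df.select_dtypes(include="number").columns
imputer = SimpleImputer(strategy="median")
df[num_cols] = imputer.fit_transform(df[num_cols])
# sklearn also offers KNNImputer and the experimental IterativeImputer
# for model-based filling.
```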

skaak
Ferra Solutions

Yes ... but let's discuss after the comp.

Note the Dealing with NaN discussion; it covers some of these issues.