
Absa Customer Income Prediction Challenge

Helping South Africa
$5 000 USD
Completed (~3 years ago)
Prediction
341 joined
54 active
Start
Nov 29, 22
Close
Feb 26, 23
Reveal
Feb 26, 23
skaak
Ferra Solutions
Congratulations
Connect · 27 Feb 2023, 09:08 · 16

Finally, ABSA income draws to a close. It was a long one this one ...

Congratulations to @E-nigma and @martinabc - wow!

Also congrats to @DanielBruintjies - even though it seems the RMSE bit you. You set such a high standard here I often contemplated just giving up and going home ...

FWIW I'll be in touch as discussed elsewhere for a bit of a wrap discussion soon. This was - to me - so random I am really curious how others approached this.

Thanks @Zindi for hosting this and also to ABSA, hope you come back real soon with more comps. I mean, RMB is setting a high standard at the moment, you have to make some innovative plan!

Discussion 16 answers
Pieterkii

Yeah, I think what screwed us in this challenge was that there were no customers in income group 21 or whatever in the public set. So it was really difficult to approach this with the small public customer set.

I think there were two code-21 customers in the private set, and potentially a lot more for group 20. Hence, I think the private and public sets were imbalanced with respect to the income group codes. That would be noticeable in the final RMSE, since the errors will be significantly larger for higher group codes.

This was probably a mistake by Zindi. But nonetheless, congratulations to the winners.

27 Feb 2023, 09:15
Upvotes 2
skaak
Ferra Solutions

Thanks @Pieterkii

Yep, and through the squaring in RMSE all this gets amplified. If you are lucky in public then it can really mess with your private score.
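To make the amplification concrete, here is a minimal sketch with made-up numbers (not competition data): 98 customers predicted with an error of 1 000 and 2 customers with an error of 50 000. It shows how much of the total squared error those two customers carry.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical errors, chosen only for illustration.
small_errors = [1_000.0] * 98
big_errors = [50_000.0] * 2

y_true = [0.0] * 100
y_pred_all_small = small_errors + [1_000.0] * 2  # every customer off by 1 000
y_pred_two_big = small_errors + big_errors       # two customers off by 50 000

rmse_all_small = rmse(y_true, y_pred_all_small)
rmse_two_big = rmse(y_true, y_pred_two_big)

# Share of the total squared error that comes from the two big misses.
big_share = sum(e ** 2 for e in big_errors) / sum(e ** 2 for e in y_pred_two_big)

print(f"RMSE, all small errors: {rmse_all_small:.0f}")
print(f"RMSE, two big misses:   {rmse_two_big:.0f}")
print(f"Share of squared error from the two big misses: {big_share:.1%}")
```

In this toy setup two customers out of a hundred account for roughly 98% of the squared error, so whether such customers land in the public or the private split can move a score a lot.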

My approach to this was to use the average of lots of models ... boring, nothing elegant about it ... the only skill is to insert more models into the pipeline.
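Averaging many models can be sketched as below. This is only an illustration of the idea, not the pipeline actually used here: the function name `average_predictions`, the optional `weights` argument and the three prediction lists are all hypothetical.

```python
def average_predictions(model_predictions, weights=None):
    """Blend per-customer predictions from several models.

    model_predictions: list of lists, one inner list per model.
    weights: optional per-model weights; equal weights if omitted.
    """
    if not model_predictions:
        raise ValueError("need at least one model")
    n_customers = len(model_predictions[0])
    if any(len(p) != n_customers for p in model_predictions):
        raise ValueError("all models must predict the same customers")
    if weights is None:
        weights = [1.0] * len(model_predictions)
    total_weight = sum(weights)
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_predictions)) / total_weight
        for i in range(n_customers)
    ]


# Made-up predictions from three hypothetical models for three customers.
preds_a = [10_000.0, 20_000.0, 30_000.0]
preds_b = [12_000.0, 18_000.0, 33_000.0]
preds_c = [11_000.0, 22_000.0, 27_000.0]

blended = average_predictions([preds_a, preds_b, preds_c])
print(blended)
```

Adding another model to the ensemble then only means appending one more prediction list.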

27 Feb 2023, 09:28
Upvotes 1
Pieterkii

@skaak I did the same. With different features per model too. I am more interested in what you did for features. I would like to show what I did too.

skaak
Ferra Solutions

Yes - then this becomes soooo worth it.

I'll set up a zoom in about a week's time. We had really nice discussions here so I'll invite some of the others also. Not give away secrets, but discuss e.g. what features worked and how to make progress in this difficult comp. I look forward to it!

loyisoj

I'd love to be part of this session @skaak

wuuthraad

@skaak My Man! Top 3, well done. Excuse the hiatus on my end... I'm back.

27 Feb 2023, 09:54
Upvotes 0
skaak
Ferra Solutions

@wuuthraad

Dragon!

So nice to hear your voice! Yes, you did take a bit of a summer slumber there, hope you are rested and ready to roll!

Did you see - I was like #7 or lower, can't remember, actually stopped competing, and then had such a nice private result. I'm in the dough, but don't worry, wife spent it already ... :-(

wuuthraad

Dude! Hopefully next time all the money is yours😂😂

I dropped drastically ... I just had a lot on my plate. Solving them one at a time.

Terrence_SHA
Telkom(BCX)

Congratulations to the winners. @DanielBruintjies gave us a good run. Wondering what happened, possibly your model was overfitting.

27 Feb 2023, 10:05
Upvotes 0
skaak
Ferra Solutions

It's sad in a way. I think he had one or two lucky results in public where we had big errors. Because of RMSE he had a phenomenal score, but in private it sort of reversed a bit.

I have to admit, I am relieved. It was either this or @DanielBruintjies discovered some higher knowledge the rest of us were not privy to. I was searching for that a lot ... getting really discouraged, because I think in the end it was an almost random process around declared income.

Terrence_SHA
Telkom(BCX)

Declared income was never accurate. We tried all avenues to create features that explain and correlate with the declared income, but to no avail. Some clients would have a high declared income that does not correlate with their month-to-month income. What seemed to work better was to treat them as outliers.

21db

When I started this comp I was chasing the LB, with lots of feature building and selection. As soon as I saw a significant drop in my local CV score I submitted, and it always improved my LB score. This was a first for me and it fueled my motivation to create meaningful features. Problems arose from the middle towards the end of the comp, when I tried to improve my CV strategy and struggled to deal with different segments of customers (like those with fewer than 4 transactions in my processing). It was tough to validate my approaches, so I decided by the flip of a coin (it seemed to slightly improve the public LB) that for these customers (a big 182 of them!) I would just insert the minimum value of their income group bracket... so if these customers were truthful, my model would have been great, and if they weren't, I would be penalized. Turns out I was severely penalized for this approach, and yes @skaak, RMSE as the metric really did bite this approach. I do believe that if I had inserted more rules to deal with these weird segments my solution could have been slightly better, but the small dataset made it hard to trust my validations.

Congrats on the well-thought-out decisions to go with ensembling and such, @Pieterkii and @skaak, and to all others like @Terrence_SHA for your findings and robust approaches.
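The fallback rule described above could be sketched roughly like this. The bracket lower bounds in `BRACKET_MIN`, the function name `predict_income` and the example values are all hypothetical; only the threshold of 4 transactions comes from the post.

```python
# Hypothetical lower bound of each income-group bracket (not competition data).
BRACKET_MIN = {1: 0.0, 2: 5_000.0, 3: 10_000.0}

# Customers with fewer transactions than this skip the model (threshold from the post).
MIN_TRANSACTIONS = 4


def predict_income(n_transactions, income_group, model_prediction):
    """Use the model for well-covered customers, else the bracket minimum."""
    if n_transactions < MIN_TRANSACTIONS:
        # Too little transaction history: trust the declared income group
        # and predict the bottom of its bracket.
        return BRACKET_MIN[income_group]
    return model_prediction


sparse_customer = predict_income(n_transactions=2, income_group=3, model_prediction=14_250.0)
regular_customer = predict_income(n_transactions=12, income_group=3, model_prediction=14_250.0)
print(sparse_customer, regular_customer)
```

A rule like this bets on the declared group being truthful, which is why a handful of wrong declarations can be costly under a squared-error metric.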

Bigger congrats to @E-nigma for climbing the LB so fast and not falling off!! Well done!

28 Feb 2023, 01:12
Upvotes 1
loyisoj

Thanks a lot everyone. It was definitely an interesting one. Congratulations also to you @skaak and @martinabc and the rest of the participants esp @DanielBruintjies for holding the lead for the duration of the competition.

28 Feb 2023, 19:28
Upvotes 1

This was my first competition and I just set out to try as many techniques as possible and find a clever way of getting the best out of them. I was really surprised by my results on the private leaderboard. I still have a lot to learn though :-)

This was an awesome learning experience!! Thank you Zindi and ABSA

6 Mar 2023, 07:55
Upvotes 1
skaak
Ferra Solutions

Wow @martinabc what a performance. For a first competition!

Not sure I have this correct but it seems you are #1 at the moment! What happened to ... was it team @E-nigma? Perhaps they did not submit in time.

Anyhow, congratulations once again. You were #9 on public? This is also great and always a nice, warm feeling - if you can move up on the private LB. And (believe me, I've experienced this) quite sad if you move down when private results are published.

You mention that you tried many techniques. We'll have a chat soon I hope, but did you find any particular one that did better than the rest? Same over here, I tried many, and best was to just ensemble them all. I'll look at my models a bit, I think there were some models that did outperform, but it was not really a clear winner. Ensemble was best.

@skaak random forests work well, extra trees worked better but bagging an extra trees model ended up best.

I had a feature that by itself scored 6355.86 on the public leaderboard, and I have found that random forest-type models tend to perform well if you have features like that (deduced from my little experience, so I am not sure how much weight I can put on that).
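For readers who want to try the "bagging an extra trees model" idea, here is a minimal scikit-learn sketch. It is not the winning solution: the synthetic data from `make_regression`, the number of estimators and the random seeds are all assumptions chosen only so the example runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, ExtraTreesRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real competition features are not reproduced here.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learner: an extra-trees ensemble.
base = ExtraTreesRegressor(n_estimators=20, random_state=0)

# Bagging wrapper: fits 5 copies of the base learner on bootstrap samples
# of the training rows and averages their predictions.
model = BaggingRegressor(base, n_estimators=5, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions[:5])
```

The base estimator is passed positionally because the keyword name of that argument has changed between scikit-learn releases.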