Absa Customer Income Prediction Challenge

Helping South Africa
$5 000 USD
Completed (~3 years ago)
Prediction
341 joined
54 active
Start: Nov 29, 22
Close: Feb 26, 23
Reveal: Feb 26, 23
Pieterkii
Submission format and data
Data · 13 Dec 2022, 10:50 · 10

Unless I missed something, I think the competition description and sample submission are misleading: both suggest submitting raw net income, but I realised from the leaderboard errors that I should submit group codes instead (unless the entire public set just happens to consist of incomes below R20).

Considering people's scores: an RMSE of 5 isn't very good, since it means the prediction errors have a standard deviation of about 5 income groups, so I don't think anyone is submitting in the expected format.

Discussion 10 answers
skaak
Ferra Solutions

?

Look at this thread https://zindi.africa/competitions/absa-customer-income-prediction-challenge/discussions/14364

I think you need to submit net income / 1000

13 Dec 2022, 11:37
Pieterkii

Thanks, would be nice if this was stated somewhere.

skaak
Ferra Solutions

Agree!

This *needs* to be stated explicitly.

@zindi @amyflorida626 this will probably continue to cause confusion. Can you perhaps clarify somewhere visible that we have to predict income / 1000 in the subs?

Amy_Bray
Zindi

Hey hey, Absa provided us with a file called "Target". The Target file had 3 columns, as seen in the Train file [CUSTOMER_IDENTIFIER, RECORD_DATE, DECLARED_NET_INCOME]. We split the provided Target file 70/30 on CUSTOMER_IDENTIFIER (if I am not mistaken) into the Train and Test files you see on Zindi. The one exception is that, for the Test file, we kept the last column [DECLARED_NET_INCOME] on our backend for scoring.

So, to confirm: the test target/reference file was created as a direct subset of the same file that Train was created from, so everything should follow similar formatting.

If you'd like you can recreate the Zindi data prep by working just with the Train file provided and see if you get similar scores to the leaderboard.
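A minimal sketch of what recreating that prep locally might look like, assuming a 70/30 split on CUSTOMER_IDENTIFIER and RMSE as the metric. The rows are synthetic stand-ins for the Train file, and the helper names `split_by_customer` and `rmse`, the seed, and the mean baseline are illustrative assumptions, not Zindi's actual procedure.

```python
import math
import random


def split_by_customer(rows, test_fraction=0.3, seed=0):
    """Split rows so that each customer lands entirely in train or in hold-out."""
    ids = sorted({row["CUSTOMER_IDENTIFIER"] for row in rows})
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test_ids = set(ids[:n_test])
    train = [r for r in rows if r["CUSTOMER_IDENTIFIER"] not in test_ids]
    holdout = [r for r in rows if r["CUSTOMER_IDENTIFIER"] in test_ids]
    return train, holdout


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic stand-in for the Train file: 10 customers, 2 records each.
rows = [
    {
        "CUSTOMER_IDENTIFIER": f"c{i}",
        "RECORD_DATE": f"2022-0{m}",
        "DECLARED_NET_INCOME": 5000.0 + 1000.0 * i,
    }
    for i in range(10)
    for m in (1, 2)
]

train, holdout = split_by_customer(rows)

# Baseline "model": predict the mean training income for every hold-out row.
baseline = sum(r["DECLARED_NET_INCOME"] for r in train) / len(train)
score = rmse([r["DECLARED_NET_INCOME"] for r in holdout], [baseline] * len(holdout))
print(f"hold-out RMSE of the mean baseline: {score:.2f}")
```

If a local hold-out score computed this way is of a very different magnitude from the leaderboard score for the same model, that points to a units or formatting mismatch rather than a modelling problem.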

skaak
Ferra Solutions

Thanks amy!

Hmmmm - it does make a difference. I just subbed two files. The same content, but different formatting. The one looks something like this

xxxx,6.830256795307339

and the other follows Train formatting

xxxx,"6,830.256795307339"

and there is a *small* difference in score.
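To illustrate how the two formats differ once parsed, here is a small sketch using only the Python standard library. The column names follow the Train file described above; the second ID `yyyy` and the helper `parse_income` are made up for the example, and this is not a description of how Zindi's scorer actually reads submissions.

```python
import csv
import io


def parse_income(raw: str) -> float:
    """Convert a value such as '6,830.25' or '6830.25' to a float."""
    return float(raw.replace(",", "").strip())


# Two rows: one in Train-style formatting (quoted, thousands separator),
# one as a plain float already divided by 1000.
sample = (
    "CUSTOMER_IDENTIFIER,DECLARED_NET_INCOME\n"
    'xxxx,"6,830.256795307339"\n'
    "yyyy,6.830256795307339\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
parsed = [parse_income(row["DECLARED_NET_INCOME"]) for row in rows]
print(parsed)

# Calling float() directly on the Train-style string fails, which is one
# reason a scorer has to decide explicitly how to handle such values.
try:
    float(rows[0]["DECLARED_NET_INCOME"])
except ValueError as err:
    print("not directly convertible:", err)
```

The two parsed values differ by a factor of 1000, so any scorer that treats the quoted string as text, or strips it differently, can end up comparing numbers on different scales.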

Amy_Bray
Zindi

Aaah, this is frustrating! It means we (I) didn't check the type of the target which is meant to be float, not string.

We are working on a fix. Everyone's score should scale by 1000 now. I will make a discussion post when the update is done.

Thank you for being persistent with this!
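The factor of 1000 is what one would expect if the target moves from thousands of rands to rands, because RMSE is expressed in the units of the target. A quick sketch with made-up numbers (the values and the `rmse` helper are illustrative only):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical incomes and predictions, in rands.
y_true = [6830.26, 12500.0, 4100.5]
y_pred = [7000.0, 11000.0, 4500.0]

in_rands = rmse(y_true, y_pred)
in_thousands = rmse([y / 1000 for y in y_true], [p / 1000 for p in y_pred])

# The ratio is 1000 up to floating-point rounding.
print(in_rands / in_thousands)
```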

skaak
Ferra Solutions

No prob Amy. Of course, we all make mistakes ... that doesn't matter, bouncing back is what matters!

Thanks for the discussion post, I think it will really help to clarify things for everyone.

Perhaps the scoring system is also partly to blame: it should have rejected the funny format. How does it know how to score it correctly?

Anyhow, thanks amy ... look forward to the post

After reading these posts, including "Updated reference file: back to basics", I am still unsure of the format needed for DECLARED_NET_INCOME in the submission.

If my model predicts that a customer's DECLARED_NET_INCOME is R1 234,56, which of these is a correct submission?

a) 1234.56

b) 1.23456

c) other?

8 Jan 2023, 19:22
Pieterkii

The answer is a. At least I hope.

Amy_Bray
Zindi

A is the correct answer.
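For anyone writing the file by hand, a minimal sketch of producing option (a): plain floats in rands, with no thousands separator and no quoting. The two-column header is assumed from the Train file and the sample rows quoted earlier in the thread, so check it against the sample submission; `write_submission` and the IDs are hypothetical.

```python
import csv
import io


def write_submission(predictions, handle):
    """Write customer IDs and incomes as plain floats, e.g. 1234.56."""
    writer = csv.writer(handle)
    writer.writerow(["CUSTOMER_IDENTIFIER", "DECLARED_NET_INCOME"])
    for customer_id, income in predictions.items():
        # repr(float) gives a plain decimal string without separators.
        writer.writerow([customer_id, repr(float(income))])


buffer = io.StringIO()
write_submission({"xxxx": 1234.56, "yyyy": 6830.256795307339}, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `handle`.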