Zindi New User Engagement Prediction Challenge

Helping Africa
$5 000 USD
Completed (~3 years ago)
Prediction
1270 joined
222 active
Start: Oct 14, 22
Close: Feb 12, 23
Reveal: Feb 12, 23
Confirmed: Error in score calculations
Help · 13 Feb 2023, 08:50 · 11

Confirmed: the leaderboard score can be 'cheated' by simply adding duplicate rows for the same member.

All I did here was add 100,000 duplicates of one member and assign them a 1, and the submission scores 0.99!
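A minimal sketch of why duplicates can inflate the score, assuming the leaderboard scores every submitted row that matches a known ID without deduplicating first. The actual Zindi scoring code is not shown in this thread, so `naive_score`, `truth`, and the `user_*` IDs are hypothetical names used only for illustration.

```python
def f1(y_true, y_pred):
    """Binary F1 computed from raw counts: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def naive_score(truth, submission):
    """Hypothetical scorer: scores every submitted row, duplicates included."""
    rows = [(truth[uid], pred) for uid, pred in submission if uid in truth]
    return f1([t for t, _ in rows], [p for _, p in rows])


# Toy ground truth (assumption): 100 members, the first 20 are positives.
truth = {f"user_{k}": (1 if k < 20 else 0) for k in range(100)}

# A weak submission: only user_0 is predicted positive, everyone else 0.
honest = [(uid, 1 if uid == "user_0" else 0) for uid in truth]

# The same submission plus 100,000 duplicate rows of the one correct positive.
padded = honest + [("user_0", 1)] * 100_000

print(naive_score(truth, honest))  # low: 1 true positive, 19 false negatives
print(naive_score(truth, padded))  # above 0.99: duplicated true positives dominate
```

Under this assumption, each duplicate row counts as another true positive, so the 19 missed positives become negligible in both precision and recall.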

@zindi please can the leaderboard calculation be fixed? Thanks to @Koleshjr for pointing this out.

Edit:

I don't think anyone intentionally abused this, or else they could have got a 0.99 score as shown here. However, I think it would be unfair to reward people whose pipelines had errors that introduced duplicate IDs, so the leaderboard calculation should be re-run.

Discussion 11 answers

I suggest running this code before calculating F1_score, so that only the first row per ID is kept:

sub = sub.groupby("User_ID_Next_month_Activity").first().reset_index()
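The same idea can be sketched on the scoring side without pandas. This is an illustration under stated assumptions, not Zindi's evaluation code: `safe_score` is a hypothetical helper that keeps the first prediction per ID (mirroring `groupby(...).first()`) and then iterates over the ground truth, so each member is counted exactly once. Treating a missing ID as a 0 prediction is also an assumption.

```python
def f1_from_counts(tp, fp, fn):
    """Binary F1 from confusion counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def safe_score(truth, submission):
    """Hypothetical duplicate-proof scorer: one prediction per known ID."""
    # Keep only the first prediction seen for each ID.
    first_pred = {}
    for uid, pred in submission:
        first_pred.setdefault(uid, pred)

    tp = fp = fn = 0
    # Loop over the ground truth, not the submission, so extra rows cannot add counts.
    for uid, label in truth.items():
        pred = first_pred.get(uid, 0)  # assumption: missing ID counts as predicting 0
        if label == 1 and pred == 1:
            tp += 1
        elif label == 0 and pred == 1:
            fp += 1
        elif label == 1 and pred == 0:
            fn += 1
    return f1_from_counts(tp, fp, fn)


# Toy data (assumption): 100 members, the first 20 are positives.
demo_truth = {f"user_{k}": (1 if k < 20 else 0) for k in range(100)}
clean = [(uid, 1 if uid == "user_0" else 0) for uid in demo_truth]
with_dupes = clean + [("user_0", 1)] * 100_000

# Both submissions now receive the same score.
print(safe_score(demo_truth, clean) == safe_score(demo_truth, with_dupes))
```

An alternative design would be to reject any submission that contains duplicate IDs outright, which surfaces pipeline errors instead of silently correcting them.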

13 Feb 2023, 09:01
Upvotes 1

I raised this last week and got no answer.

The best solution for Zindi is to repair the eval process and postpone the competition!

13 Feb 2023, 11:03
Upvotes 2

I think repairing the eval process is enough; there is no need to postpone.

Anyone following proper DS practices (Cross validation) will not have an issue.

It is a shame it didn't come out earlier, but that is OK :D

Yeah, you are right! But since the F1 score should be the same for a submission with or without duplicates, the LB score can cause some confusion.

If duplicating correct answers improves your results, then you could reverse-engineer the true cases, and that is bad. Postponing and recalculating the LB won't help.

13 Feb 2023, 14:16
Upvotes 1

It will, because people don't have access to the private LB to do this reverse-engineering.

1) I hope Zindi has private test data, but I am not sure.

2) If you get the right answers from the public part, it will give you a great boost for training the model (assuming the private and public sets were split at random).

Siwar_NASRI

Looking at the private and public LB, I don't think they used a private test set. In the public set there were 197 positive samples vs. 1143 negatives, so someone who is not being honest could find them easily. In that case it would be clear cheating using the same data set.

Amy_Bray
Zindi

Hello, we are looking into this and will give feedback this week.

25 Feb 2023, 05:30
Upvotes 1

Thank you for the update :D !

Siwar_NASRI

It is this flexibility that makes me prefer Zindi, thanks @amyflorida626