
Zindi New User Engagement Prediction Challenge

Helping Africa

$5 000 USD

Completed (over 3 years ago)

Skills you will learn

Prediction

1270 joined

220 active


Start

Oct 14, 22

Feb 12, 23

Reveal

Feb 12, 23

Confirmed: Error in score calculations

Help · 13 Feb 2023, 08:50 · 10

Confirmed: the score can be 'cheated' by simply adding duplicate rows for the same member to a submission.

All I did here was add 100,000 duplicates of one member and assign them all a 1, and the submission scores 0.99!!!
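To illustrate why this works, here is a minimal sketch. It assumes the scorer computes F1 over every submitted row, so a repeated ID is counted once per copy; the thread does not show Zindi's actual scoring code, and `naive_score`, the toy IDs, and the labels are all made up for the example.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels, written out so the sketch has no dependencies."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def naive_score(truth, rows):
    """Hypothetical scorer: looks up the label for EVERY submitted row.

    Because it never checks for repeated IDs, each duplicate row adds
    another true positive (or another error) to the counts.
    """
    y_true = [truth[user_id] for user_id, _ in rows]
    y_pred = [pred for _, pred in rows]
    return f1_score(y_true, y_pred)


# Toy ground truth: u1 and u4 are the positive members.
truth = {"u1": 1, "u2": 0, "u3": 0, "u4": 1, "u5": 0}

# A poor submission: it misses both positives and flags two negatives.
honest = [("u1", 0), ("u2", 1), ("u3", 0), ("u4", 0), ("u5", 1)]

# The same submission padded with 100,000 copies of one known positive.
padded = honest + [("u1", 1)] * 100_000

honest_f1 = naive_score(truth, honest)
padded_f1 = naive_score(truth, padded)
print(f"honest: {honest_f1:.4f}, padded: {padded_f1:.4f}")
```

Under this assumption, the duplicated true positives swamp the handful of real errors, so precision and recall both approach 1 even though the underlying predictions are no better.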

@zindi please can the leaderboard calculation be fixed? Thanks to @Koleshjr for pointing this out.

Edit:

I don't think anyone intentionally abused this, or else they could have scored 0.99 as shown here. However, I think it would be unfair to reward people whose pipeline errors introduced duplicate IDs, so the leaderboard calculation should be re-run.

Discussion 10 answers

I suggest running this code before calculating the F1 score:

sub = sub.groupby("User_ID_Next_month_Activity").first().reset_index()
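The same idea can be sketched on the scoring side, without pandas so that it stays self-contained. This is only an illustration under stated assumptions: `dedupe_first` and `score_by_id` are hypothetical helpers, a missing ID is treated as a prediction of 0, and nothing here is Zindi's actual evaluation code.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def dedupe_first(rows):
    """Keep the first prediction seen for each ID (roughly groupby(...).first())."""
    first_seen = {}
    for user_id, pred in rows:
        first_seen.setdefault(user_id, pred)
    return first_seen


def score_by_id(truth, rows):
    """Score exactly one prediction per ground-truth ID.

    Iterating over the truth IDs (not the submitted rows) means extra
    copies of an ID cannot change the counts.
    """
    preds = dedupe_first(rows)
    ids = sorted(truth)
    y_true = [truth[i] for i in ids]
    # Assumption: an ID missing from the submission counts as a 0 prediction.
    y_pred = [preds.get(i, 0) for i in ids]
    return f1_score(y_true, y_pred)


truth = {"u1": 1, "u2": 0, "u3": 1, "u4": 0}
clean = [("u1", 1), ("u2", 1), ("u3", 0), ("u4", 0)]
padded = clean + [("u1", 1)] * 1000  # same submission with duplicated rows

clean_f1 = score_by_id(truth, clean)
padded_f1 = score_by_id(truth, padded)
print(clean_f1, padded_f1)
```

With one row per ID, the padded and clean submissions receive the same score, which is the behaviour the thread is asking for.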

13 Feb 2023, 09:01

Upvotes 1

Klai

I said that last week and got no answer.

The best solution is for Zindi to repair the eval process and postpone the competition!

13 Feb 2023, 11:03

Upvotes 2

I think repairing the eval process is enough; there is no need to postpone.

Anyone following proper DS practices (cross-validation) will not have an issue.

It is a shame it didn't come out earlier, but that is OK :D

replied to Klai · 13 Feb 2023, 11:06

Upvotes 1

Klai

Yeah, you're right! But when you know the F1 score should be the same for a submission with and without duplicates, the LB score can cause some confusion.

replied to FC · 13 Feb 2023, 11:44

Upvotes 0

Mzungu

If duplicating correct answers improves your results, then you could reverse-engineer the true cases, and that is bad. Postponing and recalculating the LB won't help.

13 Feb 2023, 14:16

Upvotes 1

It will, because people don't have access to the private LB to do this reverse-engineering.

replied to Mzungu · 13 Feb 2023, 14:24

Upvotes 0

Mzungu

1) I hope Zindi has private test data, but I'm not sure.

2) If you get the right answers from the public part, it will give you a great boost for training the model (if the private and public parts were split at random).

replied to FC · 13 Feb 2023, 14:48

Upvotes 0

Siwar_NASRI

Looking at the private and public LB, I don't think they used a private test set. The public set had 197 positive samples vs. 1143 negatives; someone who is not honest with themselves could find them easily, and in that case it would be clear cheating using the same data set.

replied to Mzungu · 21 Feb 2023, 10:51

Upvotes 1

Thank you for the update :D !

25 Feb 2023, 08:16

Upvotes 0

Siwar_NASRI

It is this flexibility that makes me prefer Zindi, thanks @amyflorida626

25 Feb 2023, 11:31

Upvotes 0
