I've confirmed that the leaderboard score can be 'cheated' simply by adding duplicates of the same member.
All I did here was add 100,000 duplicate rows for one member, assign them all a 1, and the submission scores 0.99!!!
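To see why this works, here is a minimal pure-Python sketch. It assumes (I haven't seen Zindi's scoring code) that the scorer looks up the true label for every submission row by ID and counts each row, so every duplicate adds another true positive. The IDs, labels and helper names are hypothetical.

```python
# Sketch: how duplicate IDs can inflate F1 if the scorer counts every
# submission row. All IDs and labels below are hypothetical.


def f1_from_pairs(pairs):
    """Binary F1 from (true_label, predicted_label) pairs."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def naive_score(truth, submission):
    """Assumed scorer behaviour: look up the true label for every
    submission row, so a duplicated ID is counted once per copy."""
    return f1_from_pairs([(truth[row_id], pred) for row_id, pred in submission])


# Hypothetical ground truth: 10 IDs, of which id_0 and id_1 are positive.
truth = {f"id_{k}": (1 if k < 2 else 0) for k in range(10)}

# A mediocre honest submission: one TP (id_0), one FP (id_5), one FN (id_1).
honest = [(row_id, 1 if row_id in ("id_0", "id_5") else 0) for row_id in truth]

# The same submission padded with 100,000 copies of the known-positive row.
padded = honest + [("id_0", 1)] * 100_000

print(round(naive_score(truth, honest), 3))  # → 0.5
print(round(naive_score(truth, padded), 3))  # → 1.0
```

The padding barely changes FP and FN but adds 100,000 true positives, so F1 = 2TP/(2TP + FP + FN) is driven towards 1.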
@zindi, could the leaderboard calculation please be fixed? Thanks to @Koleshjr for pointing this out.
Edit:
I don't think anyone intentionally abused this; otherwise they could have scored 0.99 as shown here. However, it would be unfair to reward people whose pipelines have errors that introduce duplicate IDs, so the leaderboard calculation should be re-run.
I suggest running this before calculating the F1 score, so that only the first row per ID is kept:
sub = sub.groupby("User_ID_Next_month_Activity").first().reset_index()
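Here is the same fix sketched in plain Python so the effect is easy to check: keep only the first prediction per ID (which is what the groupby(...).first() line does for the prediction column, assuming it has no missing values), then score. The data and helper names are hypothetical; after deduplication, the padded submission scores exactly the same as the clean one.

```python
# Sketch: drop duplicate IDs (keeping the first row) before scoring.
# All IDs, labels and helper names are hypothetical.


def f1_from_pairs(pairs):
    """Binary F1 from (true_label, predicted_label) pairs."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def dedupe_first(submission):
    """Keep only the first prediction seen for each ID."""
    first = {}
    for row_id, pred in submission:
        first.setdefault(row_id, pred)  # later duplicates are ignored
    return list(first.items())


def fixed_score(truth, submission):
    """Score after deduplication; assumes every submitted ID is in truth."""
    rows = dedupe_first(submission)
    return f1_from_pairs([(truth[row_id], pred) for row_id, pred in rows])


truth = {f"id_{k}": (1 if k < 2 else 0) for k in range(10)}
honest = [(row_id, 1 if row_id in ("id_0", "id_5") else 0) for row_id in truth]
padded = honest + [("id_0", 1)] * 100_000

print(fixed_score(truth, honest))  # → 0.5
print(fixed_score(truth, padded))  # → 0.5
```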
I raised this last week and got no answer.
The best solution for Zindi is to fix the evaluation process and postpone the competition!
I think fixing the evaluation process is enough; there's no need to postpone.
Anyone following proper DS practices (cross-validation) will not have an issue.
It is a shame it didn't come out earlier, but that is OK :D
Yeah, you're right! But since the F1 score should be the same for a submission with or without duplicates, the LB score can cause some confusion.
If duplicating correct answers improves your results, then you could reverse-engineer the true cases, which is bad. Postponing and recalculating the LB won't help.
It will, because people don't have access to the private LB to do this reverse-engineering.
1) I hope Zindi has a private test set, but I'm not sure.
2) If you get the right answers from the public part, it will give your training model a big boost (assuming the private/public split was chosen randomly).
Looking at the private and public LB, I don't think they used a separate private test set. In the public set there were 197 positive samples vs. 1,143 negatives. If someone is not honest with himself, he can find them easily, and in this case it would be clear cheating using the same dataset.
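As a rough check on the 0.99 reported earlier in this thread, here is a back-of-the-envelope bound. It assumes the 100,000 duplicated rows all fall in the public set, that the duplicated member really is positive, and that every row is scored; 1,340 = 197 + 1,143 is the public-set size. Each original row can add at most one FP or FN, and F1 rises with TP and falls with FP + FN, so:

$$
F_1 = \frac{2\,TP}{2\,TP + FP + FN} \ge \frac{2 \times 100{,}000}{2 \times 100{,}000 + 1{,}340} \approx 0.993
$$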
Hello, we are looking into this and will give feedback this week.
Thank you for the update :D !
It is this flexibility that makes me prefer Zindi, thanks @amyflorida626