
DigiCow Farmer Training Adoption Challenge

Helping Kenya
€8 250 EUR
Under code review
Data analysis
Classification
895 joined
388 active
Start: Jan 28, 26
Close: Mar 01, 26
Reveal: Mar 02, 26
Duplicates
Data · 6 Feb 2026, 16:04 · 2

I noticed that there are duplicate records in both the training and test datasets. Duplicates in the training set are straightforward to handle by removing them, but duplicates in the test set are more surprising. They show up once the ID column is excluded from the comparison. Do these duplicates exist by design, or were they unintentionally introduced during data preparation?
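A minimal pandas sketch of the check described above, assuming the identifier column is literally named `ID` (the real column name may differ) and that "duplicate" means every other column matches exactly, labels included. The toy columns `county` and `trainings` are invented for illustration and are not the competition's schema.

```python
import pandas as pd


def find_duplicates(df: pd.DataFrame, id_col: str = "ID") -> pd.DataFrame:
    # Assumption: the identifier column is named "ID"; compare on everything else
    feature_cols = [c for c in df.columns if c != id_col]
    # keep=False marks every member of a duplicate group, including the first copy
    mask = df.duplicated(subset=feature_cols, keep=False)
    return df.loc[mask]


def drop_train_duplicates(df: pd.DataFrame, id_col: str = "ID") -> pd.DataFrame:
    feature_cols = [c for c in df.columns if c != id_col]
    # Keep only the first occurrence of each group of identical non-ID rows
    return df.drop_duplicates(subset=feature_cols, keep="first")


if __name__ == "__main__":
    # Hypothetical toy data; column names other than ID are made up
    toy = pd.DataFrame({
        "ID": ["a", "b", "c"],
        "county": ["Kiambu", "Kiambu", "Nakuru"],
        "trainings": [3, 3, 1],
    })
    print(find_duplicates(toy)["ID"].tolist())  # ['a', 'b']
    print(len(drop_train_duplicates(toy)))      # 2
```

Because the comparison includes the label columns, rows with identical inputs but different labels are not treated as duplicates here and survive the drop.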


Data Conflict: Multiple records appear identical (differing only by ID) yet have contradictory values for adopted_within_07_days, adopted_within_90_days, or adopted_within_120_days.

Clarification: Does "adoption within x days" refer to adoption of the specific topics covered in training, rather than a one-time milestone triggered by the initial training session?
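One way to surface these conflicts, sketched in pandas: group rows on every column except `ID` and the three label columns, then flag groups where any label takes more than one value. Only the three label names come from the thread; the `ID` column name and the toy `county` column are assumptions.

```python
import pandas as pd

# Label names as given in the thread; "ID" is an assumed column name
LABELS = ["adopted_within_07_days", "adopted_within_90_days", "adopted_within_120_days"]


def conflicting_groups(df: pd.DataFrame, id_col: str = "ID",
                       labels: list[str] = LABELS) -> pd.DataFrame:
    # Everything except the identifier and the targets counts as the input row
    feature_cols = [c for c in df.columns if c != id_col and c not in labels]
    # Number of distinct values each label takes within each group of identical inputs
    n_unique = df.groupby(feature_cols, dropna=False)[labels].nunique()
    # Keep only groups where at least one label disagrees
    return n_unique[(n_unique > 1).any(axis=1)]


if __name__ == "__main__":
    # Hypothetical toy data: rows a and b share inputs but disagree on the 7-day label
    toy = pd.DataFrame({
        "ID": ["a", "b", "c"],
        "county": ["Kiambu", "Kiambu", "Nakuru"],
        "adopted_within_07_days": [0, 1, 0],
        "adopted_within_90_days": [1, 1, 0],
        "adopted_within_120_days": [1, 1, 1],
    })
    print(conflicting_groups(toy))
```

Any model that maps identical inputs to identical outputs must mispredict at least one row in each conflicting group, so the size of this table gives a rough sense of how much noise the duplicates add.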

6 Feb 2026, 17:22

Yes, that's a fantastic observation, and it's true. I did a time-sensitive train/validation split and found the following duplicates:
a) Train: 4054 / 7163
b) Val: 4098 / 6373
c) Test: 4322 / 5621
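A rough sketch of a time-sensitive split followed by a per-split duplicate count. The date column name `training_date`, the validation fraction, and the `ID` column name are hypothetical. Reading "4054 / 7163" as "rows in a duplicate group / total rows" is also an assumption; the poster may have counted differently (for example only the repeated copies, as `keep="first"` would).

```python
import pandas as pd


def time_split(df: pd.DataFrame, date_col: str, val_frac: float = 0.2):
    # Sort chronologically (stable sort) so validation rows are never earlier than training rows
    ordered = df.sort_values(date_col, kind="mergesort")
    n_val = int(round(len(ordered) * val_frac))
    cut = len(ordered) - n_val
    return ordered.iloc[:cut], ordered.iloc[cut:]


def duplicate_stats(df: pd.DataFrame, id_col: str = "ID") -> tuple[int, int]:
    # Assumption: the identifier column is named "ID"; compare on everything else
    feature_cols = [c for c in df.columns if c != id_col]
    # Rows belonging to any duplicate group (every copy counted), plus the total row count
    in_group = int(df.duplicated(subset=feature_cols, keep=False).sum())
    return in_group, len(df)


if __name__ == "__main__":
    # Hypothetical toy data with an invented "training_date" column
    df = pd.DataFrame({
        "ID": ["a", "b", "c", "d", "e"],
        "training_date": pd.to_datetime(
            ["2025-03-01", "2025-01-01", "2025-03-01", "2025-02-01", "2025-01-01"]),
        "county": ["Kiambu", "Nakuru", "Kiambu", "Nakuru", "Nakuru"],
    })
    train, val = time_split(df, "training_date", val_frac=0.4)
    for name, part in [("Train", train), ("Val", val)]:
        dup, total = duplicate_stats(part)
        print(f"{name}: {dup} / {total}")  # Train: 2 / 3, then Val: 2 / 2
```

Duplicates are counted within each split only, so an identical pair that straddles the cut is not counted; checking whether the same input row appears in both train and validation would be a separate merge.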