Hello Zindians,
I hope all is well with everyone.
I have manually reviewed all 400 “Prompt–Clinician” examples and noticed some inconsistencies in the data that could affect model performance. I’m sharing these findings in case they are unintentional errors rather than part of the challenge design, and I would appreciate any guidance you can provide.
Summary of Findings
Out of the 400 examples, I identified the following types of issues:
1. Context Mismatches (9 examples)
Example indices: 26, 137, 148, 267, 318, 332, 352, 363, 366
In these cases, the “Prompt” describes one clinical condition, while the “Clinician” response addresses a different condition.
2. Age Mismatches / Misalignments (13 examples)
Example indices: 4, 14, 69, 72, 81, 104, 115, 117, 205, 303, 359, 392, 397
Here, the age mentioned in the “Prompt” (e.g., “a 5‑year‑old child”) does not match the age stated or implied in the “Clinician” response.
3. Spelling Errors (10 examples)
Example indices: 185, 189, 196, 217, 234, 280, 284, 286, 289, 301
These entries contain typos or misspelled medical terms that could affect tokenization or keyword matching, e.g. “poison” written as “prison”.
4. Day Mismatch (1 example)
Example index: 295
The “Prompt” refers to symptom onset “2 days ago”, but the “Clinician” response treats it as “2 weeks ago” (or a similarly longer time frame), creating a temporal inconsistency.
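If it helps anyone triage, here is a minimal sketch for pulling the flagged rows out for manual review. Note that the file name `Train.csv` and the `Prompt`/`Clinician` column names are my assumptions about the dataset layout; adjust them to match the actual files.

```python
import pandas as pd

# Indices flagged above, grouped by issue type.
flagged = {
    "context_mismatch": [26, 137, 148, 267, 318, 332, 352, 363, 366],
    "age_mismatch": [4, 14, 69, 72, 81, 104, 115, 117, 205, 303, 359, 392, 397],
    "spelling_error": [185, 189, 196, 217, 234, 280, 284, 286, 289, 301],
    "day_mismatch": [295],
}

# NOTE: file name and column names are assumptions; adjust to the real dataset.
df = pd.read_csv("Train.csv")

for issue, idxs in flagged.items():
    print(f"\n=== {issue}: {len(idxs)} example(s) ===")
    # Assumes the flagged numbers refer to the DataFrame's default integer index.
    print(df.loc[idxs, ["Prompt", "Clinician"]])
```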
Why This Matters
1. Model Training Quality: Mismatches between “Prompt” and “Clinician” labels can confuse supervised learning: models may learn incorrect associations (e.g., treating a 5‑year‑old case as if it were a 6‑year‑old).
2. Evaluation Impact: If a model’s predictions are scored against incorrect labels, leaderboard scores may not accurately reflect true performance.
3. Challenge Intent: I’m not certain whether these inconsistencies are intentional (to test robustness) or genuine data errors. Clarification would help me and other participants interpret results correctly.
Request for Guidance
1. Data Verification: Would it be possible for the Zindi Team (or data curators) to confirm whether these are expected anomalies or genuine typos/mismatches?
2. Recommended Approach: If some of these examples are indeed erroneous, should we exclude them from training, correct them ourselves, or use them as provided? (A minimal exclusion sketch follows after this list.)
3. Official Errata or Updates: If there is an errata sheet or plan to release an updated dataset with corrections, could you please let us know where to find it and when it might be available?
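In the meantime, one defensive option is to compare validation scores with and without the suspect rows. This is only a sketch; it reuses the `flagged` dict and the assumed `df` DataFrame from the snippet above:

```python
# Union of all flagged indices across the four issue types.
bad_idx = sorted({i for idxs in flagged.values() for i in idxs})

# Train on a "clean" copy; the flagged rows can still be reviewed or
# corrected manually rather than silently discarded.
df_clean = df.drop(index=bad_idx)
print(f"Dropped {len(bad_idx)} of {len(df)} rows; {len(df_clean)} remain.")
```

If models trained on `df` and `df_clean` score noticeably differently on validation, that would suggest the mismatches really do affect learning.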
Example Illustrations
Below are illustrative cases to show precisely what I mean:
Seeking guidance. Please let me know how best to proceed.
Thanks in advance
Good day,
Amy_Bray
Seeking guidance please.
This is amazing work @Zambia_Kuchalo!
I have also noticed the context mismatches - those seem particularly concerning since we're working with a small, expert-annotated dataset - but I haven't gotten around to quantifying them. Even if I had, there is no way my analysis would have been as detailed as yours! Thanks for sharing.
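One rough way to quantify it automatically would be to embed both sides of each pair and flag unusually low Prompt–Clinician similarity. Just a sketch (assuming `sentence-transformers` is installed and the same file/column names as in the original post; the 2-sigma threshold is arbitrary):

```python
import pandas as pd
from sentence_transformers import SentenceTransformer, util

df = pd.read_csv("Train.csv")  # assumed file and column names, as above

model = SentenceTransformer("all-MiniLM-L6-v2")
prompt_emb = model.encode(df["Prompt"].tolist(), convert_to_tensor=True)
resp_emb = model.encode(df["Clinician"].tolist(), convert_to_tensor=True)

# Cosine similarity between each Prompt and its own Clinician response.
sims = util.cos_sim(prompt_emb, resp_emb).diagonal()

# Pairs far below the typical similarity are mismatch candidates.
threshold = sims.mean() - 2 * sims.std()
suspects = (sims < threshold).nonzero().flatten().tolist()
print(f"{len(suspects)} candidate mismatches at indices: {suspects}")
```

Anything this flags would still need manual review, since low similarity can also just mean a terse clinician response.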
Thank you
For 4 of the cases, I used only the Prompt field.
Do you have an inference time of less than 100 ms per vignette?
Great analysis ba Zambia
Hmm, this is interesting. Could you please provide a list of the indices along with the IDs so I can review further?
Noted with thanks. Let me do so:
Please review the data here: link
Tip: use ready-made models that meet the conditions of the competition.
Like??
Really helpful thanks!!!