Kenya Clinical Reasoning Challenge

Helping Kenya
$10 000 USD
Completed (8 months ago)
Prediction
Natural Language Processing
SLM
1664 joined
440 active
Start: Apr 03, 2025
Close: Jun 29, 2025
Reveal: Jun 30, 2025
Zambia_Kuchalo
Typaflow Software Systems Limited
Potential Data Inconsistencies in Kenya Clinical Reasoning Challenge Dataset
Data · 4 Jun 2025, 13:50 · 11

Hello Zindians,

I hope all is well with everyone.

I have manually reviewed all 400 “Prompt–Clinician” examples and noticed some inconsistencies in the data that could affect model performance. I’m sharing these findings because I can’t tell whether they are unintentional errors or part of the challenge design, and I would appreciate any guidance you can provide.

Summary of Findings

Out of the 400 examples, I identified the following types of issues:

1. Context Mismatches (9 examples)
   Example indices: 26, 137, 148, 267, 318, 332, 352, 363, 366
   In these cases, the “Prompt” appears to describe one clinical condition, while the “Clinician” response addresses a different condition.

2. Age Mismatches / Misalignments (13 examples)
   Example indices: 4, 14, 69, 72, 81, 104, 115, 117, 205, 303, 359, 392, 397
   Here, the age mentioned in the “Prompt” (e.g., “a 5‑year‑old child”) does not match the age stated or implied in the “Clinician” response.

3. Spelling Errors (10 examples)
   Example indices: 185, 189, 196, 217, 234, 280, 284, 286, 289, 301
   These entries contain typos or misspelled medical terms that could affect tokenization or keyword matching (e.g., “poison” written as “prison”).

4. Day Mismatch (1 example)
   Example index: 295
   In this case, the “Prompt” refers to symptom onset “2 days ago,” but the “Clinician” response treats it as “2 weeks ago” (or another time frame), creating a temporal inconsistency.
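The age, time-frame, and spelling checks above were done by hand. A first automated pass could flag candidates for manual review. The sketch below is a minimal illustration under stated assumptions, not the method behind the counts in this post. It assumes each example is a dict with the `Prompt` and `Clinician` text fields, that ages are written in years with digits (e.g. “5‑year‑old”), and that onsets are a number plus a unit (e.g. “2 days”). All function names and the tiny vocabulary are hypothetical. A real-word error such as “prison” is only caught if that word is absent from the (ideally medical) word list.

```python
import difflib
import re

# Heuristic patterns (assumptions, not part of the dataset spec):
# ages in years written with digits, e.g. "5-year-old", "5 year old", or with
# non-breaking hyphens; onsets written as a number plus a unit, e.g. "2 days", "3-week".
# Note: a phrase like "2 years ago" will also be read as an age.
AGE_PATTERN = re.compile(r"\b(\d{1,3})\s*[-\u2011 ]?\s*(?:year|yr)s?\b", re.IGNORECASE)
DURATION_PATTERN = re.compile(r"\b(\d+)\s*[-\u2011]?\s*(hour|day|week|month)s?\b", re.IGNORECASE)


def extract_ages(text):
    """Return the set of ages (in years) mentioned in the text."""
    return {int(m.group(1)) for m in AGE_PATTERN.finditer(text)}


def extract_durations(text):
    """Return the set of (number, unit) pairs mentioned in the text, e.g. {(2, "day")}."""
    return {(int(n), unit.lower()) for n, unit in DURATION_PATTERN.findall(text)}


def disjoint_mentions(prompt_values, clinician_values):
    """True when both texts mention at least one value but have none in common."""
    return (bool(prompt_values) and bool(clinician_values)
            and prompt_values.isdisjoint(clinician_values))


def audit(rows):
    """Map each check to the indices of rows it flags for manual review."""
    report = {"age_mismatch": [], "day_mismatch": []}
    for idx, row in enumerate(rows):
        prompt, clinician = row["Prompt"], row["Clinician"]
        if disjoint_mentions(extract_ages(prompt), extract_ages(clinician)):
            report["age_mismatch"].append(idx)
        if disjoint_mentions(extract_durations(prompt), extract_durations(clinician)):
            report["day_mismatch"].append(idx)
    return report


def suspect_spellings(text, vocabulary, cutoff=0.8):
    """Return (token, suggestion) pairs for tokens missing from a lowercase
    vocabulary that are close to a vocabulary word (difflib ratio >= cutoff)."""
    candidates = sorted(vocabulary)
    suspects = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in vocabulary:
            continue
        match = difflib.get_close_matches(token, candidates, n=1, cutoff=cutoff)
        if match:
            suspects.append((token, match[0]))
    return suspects


rows = [
    {"Prompt": "A 5-year-old child has had fever for 2 days.",
     "Clinician": "A 5 year old with fever for 2 days: test for malaria."},
    {"Prompt": "A 5\u2011year\u2011old child swallowed a household chemical 2 days ago.",
     "Clinician": "This 6-year-old ingested the chemical 2 weeks ago."},
]
print(audit(rows))  # {'age_mismatch': [1], 'day_mismatch': [1]}
print(suspect_spellings("Child swallowed prison", {"child", "swallowed", "poison"}))
# [('prison', 'poison')]
```

Each check only flags a pair when both texts mention a value, so a “Clinician” response that simply omits the age or onset is not counted as a mismatch. Anything flagged still needs a human read, as was done here.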

Why This Matters

1. Model Training Quality: Mismatches between a “Prompt” and its “Clinician” label can confuse supervised learning, because models may learn incorrect associations (e.g., treating a 5‑year‑old case as if it were a 6‑year‑old).

2. Evaluation Impact: If models are trained or scored against incorrect labels, leaderboard scores may not accurately reflect true performance.

3. Challenge Intent: I’m not certain whether these inconsistencies are intentional (to test robustness) or genuine data errors. Clarification would help me, and other participants, interpret results correctly.

Request for Guidance

1. Data Verification: Would it be possible for the Zindi Team (or data curators) to confirm whether these are expected anomalies or genuine typos/mismatches?

2. Recommended Approach: If some of these examples are indeed erroneous, should we:

  • Exclude them from training/validation?
  • Manually correct them (e.g., adjust ages, fix spellings, align contexts) and use our “cleaned” version for local experimentation?
  • Treat them as a “challenge within a challenge” and leave them as is, knowing that robustness to label noise is part of the evaluation?

3. Official Errata or Updates: If there is an errata sheet or plan to release an updated dataset with corrections, could you please let us know where to find it and when it might be available?
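If the first option (excluding the erroneous examples) is chosen, the indices from the Summary of Findings can be held out before training. This is a minimal sketch with hypothetical names (`FLAGGED`, `split_rows`). It assumes the reported indices are 0-based row positions in file order, which the post does not state (one reason row IDs are safer than indices). The 400 placeholder rows only stand in for the real training data.

```python
# Row indices reported in the post, grouped by issue type.
# Assumption: 0-based positions in file order (the convention is not stated in the post).
FLAGGED = {
    "context_mismatch": [26, 137, 148, 267, 318, 332, 352, 363, 366],
    "age_mismatch": [4, 14, 69, 72, 81, 104, 115, 117, 205, 303, 359, 392, 397],
    "spelling": [185, 189, 196, 217, 234, 280, 284, 286, 289, 301],
    "day_mismatch": [295],
}


def split_rows(rows, categories=None):
    """Split rows into (kept, held_out), holding out every row whose index is
    flagged under any of the given categories (all categories by default)."""
    if categories is None:
        categories = FLAGGED.keys()
    flagged = {i for name in categories for i in FLAGGED[name]}
    kept = [row for i, row in enumerate(rows) if i not in flagged]
    held_out = [row for i, row in enumerate(rows) if i in flagged]
    return kept, held_out


# 400 placeholder rows standing in for the real training data.
rows = [{"row": i} for i in range(400)]
kept, held_out = split_rows(rows)
print(len(kept), len(held_out))  # 367 33

# Keep the misspelled examples (e.g. to correct them by hand) but drop the rest.
kept, held_out = split_rows(rows, ["context_mismatch", "age_mismatch", "day_mismatch"])
print(len(kept), len(held_out))  # 377 23
```

Holding the rows out instead of deleting them keeps them at hand for manual correction, or for checking how a model behaves on the noisy examples if robustness to label noise turns out to be part of the evaluation.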

Example Illustrations

Below are illustrative cases to show precisely what I mean:

Seeking guidance. Please let me know how best to proceed.

Thanks in advance

Good day

Discussion 11 answers
Zambia_Kuchalo
Typaflow Software Systems Limited

@Amy_Bray

Seeking guidance, please.

4 Jun 2025, 14:01
Upvotes 0
stefan027

This is amazing work @Zambia_Kuchalo!

I have also noticed the context mismatches. Those seem particularly concerning since we're working with a small, expert-annotated dataset, but I haven't gotten around to quantifying them. Even if I had, there is no way my analysis would have been as detailed as yours! Thanks for sharing.

4 Jun 2025, 14:48
Upvotes 2
Zambia_Kuchalo
Typaflow Software Systems Limited

Thank you

For 4 places, I used only the Prompt.

4 Jun 2025, 14:53
Upvotes 1

Do you have an inference time of less than 100 ms per vignette?

nymfree

Great analysis ba Zambia

4 Jun 2025, 15:00
Upvotes 1
Amy_Bray
Zindi

Hmm, this is interesting. Could you please provide a list of the indices along with the IDs so I can review further?

5 Jun 2025, 08:20
Upvotes 0
Zambia_Kuchalo
Typaflow Software Systems Limited

Noted with thanks. Let me do so.

Please review the data here link

Tip: use ready-made models that meet the conditions of the competition.

5 Jun 2025, 19:35
Upvotes 0
mail_liw

Like??

Really helpful thanks!!!

9 Jun 2025, 14:48
Upvotes 0