1) Model choice: try models that are instruction-tuned; they are more likely to adapt to this complex task. On Hugging Face they fall into two categories: text-to-text (seq2seq) models and text-generation (causal) models.
E.g. causal models like Qwen/Qwen2.5-0.5B-Instruct (look out for models with the word "instruct" in their names) or T5 models like google/flan-t5-base (or the small/large variants).
2) Craft a precise prompt that tells your model to summarize the clinical scenario and answer the given questions. E.g. if you are using Flan-T5, you can start with this and modify it as you go:
"You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. Give the summary of the question before answering." + EOS token (important: this tells your model where the prompt ends)
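As a concrete sketch, the instruction-plus-EOS prompt can be assembled like this. The EOS string is hard-coded to T5's "</s>" to keep the sketch self-contained; in practice read it from your tokenizer's `eos_token` attribute:

```python
# Sketch: build the Flan-T5 prompt and terminate it with the EOS token.
# "</s>" is T5's EOS string; with a real tokenizer, use tokenizer.eos_token.
EOS_TOKEN = "</s>"

INSTRUCTION = (
    "You are a medical expert with advanced knowledge in clinical reasoning, "
    "diagnostics, and treatment planning. Please answer the following medical "
    "question. Give the summary of the question before answering."
)

def build_prompt(question: str) -> str:
    """Concatenate the instruction, the question, and the EOS marker."""
    return f"{INSTRUCTION}\n{question}{EOS_TOKEN}"
```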
3) Use examples: this helps teach your model the format for its generated text and also provides context, which nudges the model to stick to the answers listed in the clinician column (the ground truth).
Basic method: if you are using T5 models (small context window of 512 tokens), use a tokenizer (any one) to find the shortest row (prompt + clinician), concatenate it to your prompt column, and use that as the new prompt to teach your model the format of "medical summary of clinical scenario + answer to questions raised". Look up one-shot, few-shot, and zero-shot prompting. A basic template for Flan-T5 looks like this:
"""Your instruction........
Q: example prompt \nA: example clinician response \n\nQ: your prompt \nA:"""
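A minimal sketch of this basic method, using a plain whitespace split as a stand-in for a real tokenizer's token count (swap in the Flan-T5 tokenizer in practice):

```python
# Pick the shortest (prompt, clinician) pair as a one-shot example so the
# whole prompt fits inside T5's 512-token context window. len(text.split())
# is a crude stand-in for a real tokenizer's token count.
def token_len(text: str) -> int:
    return len(text.split())

def build_one_shot(rows, instruction: str, new_prompt: str) -> str:
    """rows: list of (prompt, clinician_response) pairs from the training set."""
    ex_q, ex_a = min(rows, key=lambda r: token_len(r[0] + " " + r[1]))
    return (f"{instruction}\n"
            f"Q: {ex_q}\nA: {ex_a}\n\n"
            f"Q: {new_prompt}\nA:")
```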
Advanced method: test out any of the strategies employed in building a Retrieval Augmented Generation (RAG) pipeline.
I used this method:
a) Embed the prompts into a vector space with an embedding model (e.g. from sentence-transformers).
b) Use the "hnswlib" or "faiss" Python libraries (set "cpu" as the device if you don't want to overload your GPU) to build an index over the prompts (L2 or cosine similarity). This gives you a custom vector store/DB over the prompts' semantic meaning, which you then use to retrieve meaningful examples.
c) Build a retrieval function that fetches the top-k most similar clinician responses for each prompt (k=3 is a good starting point).
d) Concatenate the top 3 responses to your prompt column and use them as examples in your new prompt. It would look something like this:
You are Qwen, created by Alibaba Cloud. You are a helpful assistant. You have advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Use the information contained in the examples below as a reference to answer the given question. Avoid all conversational prose.
Examples:
a 15 year old female presents with burn wound after an accident while cooking. the wound has redness and blisters covering the arm posteriorly. vitals bp 110/60, other vitals are within normal ranges management of the patient admit give iv fluids(4mls/kg/ total burn surface area) to prevent shock give analgesics do debridement of wounds - mechanical debridement using gauze give antibiotics apply topical antibiotics and topical antifungals on wound give anti-parasitics give antihistamines give hematinics deworm your patient refer to a plastic surgeon for further management differential diagnosis thermal burns drug allergy chemical burns
1 -year-old male presents with burns from hot water. skin broken with blisters. dx: second-degree burn. management: ensure airway is clear. monitor breathing and spo . administer fluid resuscitation and pain relief. irrigate burn areas with saline. dress burn areas with sterile dressing. apply silver sulfadiazine. provide tetanus prophylaxis.
9 yrs old girl was brought to the emergency department with a burn injury on the left palm. upon examination:she is withdrawn, dehydrated with 2nd degree burns on palms and other old injuries. mother discloses that child has been subjected to abuse by stepfather. problems * dehydration * suspicion of ongoing abuse * physical injuries and 2nd degree burns on the left hand. * psychological impact. management * clean and dress wounds with sterile techniques. * use topical antibiotics eg silver sulfadiazine for infection prevention * administer analgesics for pain relief * administer tetanus toxoid if wound contaminated * administer iv fluids to rehydrate the child * immediate referral of child to a psychologist to assess psychological impact of abuse from stepfather * continuous monitoring of vital signs * ensure that child and mother are kept safe and involve social worker investigations * complete blood count - check for signs of infection * burn wound culture - to identify any bacterial infection * urea, electrolytes and creatinines - to check if kidney is compromised * x-ray of limbs, chest and pelvis - to check for any hidden fracture preferred diagnosis __ child abuse - history of previous scars:
### Question: a 4 year old child presents to the emergency department with second degree burns on the forearm after accidentally touching a hot stove. the child was playing in the kitchen when they reached out to touch the stove. the burns cover about 5 % of the total body surface area. the child is alert and crying with redness blisters and swelling on the affected area. the burns appear to be superficial to moderate in severity. the child is in mild pain and there is no indication of airway or breathing distress. no other injuries are noted. 1. what is the immediate treatment protocol for second degree burns in paediatric patients. 2. should any tetanus prophylaxis be considered in this case. 3. what follow up care should be recommended for burn healing.
### Answer:
[With this you can train the model to use the provided examples as a reference from which to pull answers to the questions asked, thus sticking to the ground truth, i.e. a better ROUGE-L score.]
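Steps a)–d) above can be sketched as follows. This is a minimal brute-force version: toy vectors stand in for real sentence-transformers embeddings, and a plain cosine-similarity scan stands in for an hnswlib/faiss index (fine for small datasets; use a real index at scale):

```python
import numpy as np

def top_k_examples(query_vec, prompt_vecs, clinician_answers, k=3):
    """Return the k clinician answers whose prompts best match the query."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    P = prompt_vecs / np.linalg.norm(prompt_vecs, axis=1, keepdims=True)
    sims = P @ q
    top = np.argsort(-sims)[:k]  # indices of the k most similar prompts
    return [clinician_answers[i] for i in top]

def build_rag_prompt(instruction, examples, question):
    """Concatenate the retrieved examples into the few-shot prompt format."""
    return (f"{instruction}\nExamples:\n" + "\n".join(examples) +
            f"\n### Question: {question}\n### Answer:")
```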
Other Tips:
* Try the SFTTrainer and GRPOTrainer classes from the "trl" Python library to finetune your model
* Tweak your generation config to skip special tokens, clean up whitespace, and reduce hallucinations
* Use the PEFT, LoRA, and bitsandbytes libraries to finetune and quantize your models for efficiency
* Use the "nlpaug" Python library for data augmentation
* Use a batch size of 1 if you get an out-of-memory error on your GPU, then increase gradually
* Seed everything.
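"Seed everything" can look like the sketch below; the torch lines are commented out so the sketch stays dependency-free, uncomment them when finetuning:

```python
import os
import random
import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Fix every RNG we rely on so runs are reproducible."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # import torch
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)
```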
Sadly, ROUGE is more of a summarization metric and is poorly suited for this competition; the clinician column was likely generated by a finetuned model trained on the actual clinicians' responses. We are basically modelling another model's response as the ground truth, which is constructive for teaching purposes and can be applied to the real world, to be fair. The 2 GB RAM limit and under-100ms-per-item budget aren't really ideal for RAG techniques, though, which would suit the ground-truth setup. BERTScore would have been a better metric.
Cheers
pardon the typos
Thank you so much. Well, I was not expecting to get secret ingredients during a hackathon, but thanks a lot; you paid attention and answered me, and this will help me next time.
no problem
Nice approach @Chizurum_Olorondu. I just want to ask about the details:
About your reinforcement learning: basically, for each question and clinician answer, you collect about k other answers from the dataset using vector-embedding retrieval, right? And with (question, ground-truth clinician answer, and k other answers), you tune the LLM using GRPO?
Do you use the GPT4.0 and LLama columns as the 'k other answers'?
Before RL tuning, do you do SFT training on (question, ground-truth clinician answer)?
Yes to your first question; that was one strategy I used, though it didn't get good enough results, and SFT had a better ROUGE score. No to your second question; I only used the clinician column, since those are the answers we were judged against. For your third question, yes, I used SFT training but with examples + questions, which gave me better results. Sorry for the late response.