Nurses, doctors and researchers in Malawi all contribute to disease surveillance. Malawi follows the World Health Organization’s (WHO) Integrated Disease Surveillance and Response (IDSR) strategy. Through this strategy, governments are better equipped to respond quickly to public health problems like epidemics.
AI Lab at the Malawi University of Business and Applied Sciences, in partnership with the Public Health Institute of Malawi, is researching how to improve these systems using large language models (LLMs).
The purpose of this challenge is to build an AI assistant capable of providing knowledge contained in the Malawi Technical Guidelines for Integrated Disease Surveillance and Response (TGs for IDSR).
You will train an open-source LLM to answer context-specific questions about Malawian public health processes, case definitions and guidelines, with training done on a dataset derived from the Malawi TGs for IDSR.
The final models developed in this challenge will improve on the prototype IntelSurv app, currently being developed by AI Lab. The solution will contribute to an interactive and adaptive training resource for health professionals to enhance their skills, receive real-time guidance on data collection, and stay updated on evolving practices.
This is a complex project with a lot of relevant information. Please be sure to read the full project description and details under the ‘Additional information’ heading below.
AI Lab at the Malawi University of Business and Applied Sciences (kuyeseraai.github.io/kai)
AI Lab is based at the Malawi University of Business and Applied Sciences. The lab was established in 2020 by Dr Amelia Taylor. Its mission is to facilitate debate and discussion, create a channel for the exchange of ideas, foster innovation and bring together those exploring or actively using AI in Malawi (and beyond). We aim to test, experiment with and develop AI solutions suited to real problems in Malawi.
This competition will be evaluated in two phases:
For this competition, we have placed a number of criteria and limitations on your submissions. Please read the following requirements carefully:
You will be evaluated using a machine translation metric, which specifically tests your model's ability to deliver a concise sentence that matches a given prompt and to generate the correct entities in the answer.
The error metric for this competition is the ROUGE-1 F-measure (the F-measure computed on unigram overlap).
The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm calculates the similarity between a candidate document and a collection of reference documents. ROUGE is commonly used to evaluate the quality of machine translation and summarisation models.
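The ROUGE-1 F-measure can be sketched in a few lines of Python for local checks. This is a minimal illustration, assuming lowercase tokenisation on letters and digits, no stemming and a single reference answer; the official scorer may tokenise, stem or aggregate over multiple references differently, so treat local scores as approximate.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep runs of letters/digits, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Clipped overlap: a shared unigram counts min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of candidate words found in the reference
    recall = overlap / sum(ref.values())      # share of reference words recovered by the candidate
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Public health surveillance is the systematic collection of health data"
    candidate = "Surveillance is the systematic collection and analysis of data"
    print(round(rouge1_f(candidate, reference), 3))  # 0.737
```

In this example 7 unigrams overlap, so precision is 7/9, recall is 7/10 and the F-measure is 14/19.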
For every row in the dataset, your submission file should contain 2 columns: ID and Text.
For each answer you return in ID_x_answer, you also need to extract its verbs and nouns and list them in the order in which they appear in your answer text.
For example, the question might be:
Q1: What is public health surveillance?
Your submission file should look like this:
ID Target
Q1_Question Answer Public Health Surveillance involves systematic...
Q1_Reference Document TG Booklet 1
Q1_Paragraph(s) Number 72
Q1_Keywords Public Health Surveillance, Identification, Co...
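A minimal sketch of producing rows like these with Python's standard csv module. The ID suffixes are copied from the example rows and the Target header follows the example header (the column description above calls it Text), so confirm both against the competition's sample submission file. The keyword list is assumed to be extracted already, since picking out verbs and nouns would need a part-of-speech tagger that is not shown; the hypothetical helper order_by_appearance only keeps keywords that occur in the answer and sorts them by first appearance. All answer and document values below are made up.

```python
import csv
import io

# Assumption: ID suffixes copied from the example rows; check the sample submission file.
SUFFIXES = ("Question Answer", "Reference Document", "Paragraph(s) Number", "Keywords")


def order_by_appearance(answer: str, keywords: list[str]) -> list[str]:
    """Hypothetical helper: keep keywords found in the answer, ordered by first occurrence."""
    lowered = answer.lower()
    # Naive case-insensitive substring match; short keywords may match inside longer words.
    found = [kw for kw in keywords if kw.lower() in lowered]
    return sorted(found, key=lambda kw: lowered.index(kw.lower()))


def submission_rows(qid: str, answer: str, document: str, paragraphs: str,
                    keywords: list[str]) -> list[tuple[str, str]]:
    """Build the four (ID, value) rows for one question."""
    values = (answer, document, paragraphs, ", ".join(order_by_appearance(answer, keywords)))
    return [(f"{qid}_{suffix}", value) for suffix, value in zip(SUFFIXES, values)]


def write_submission(rows, fh, target_col: str = "Target") -> None:
    """Write the header and rows as CSV to an open text file handle."""
    writer = csv.writer(fh)
    writer.writerow(["ID", target_col])
    writer.writerows(rows)


if __name__ == "__main__":
    rows = submission_rows(
        "Q2",  # made-up question ID and values for illustration
        "Report every suspected case to the district health office",
        "TG Booklet 2",
        "10",
        ["district", "Report", "case"],
    )
    buffer = io.StringIO()
    write_submission(rows, buffer)
    print(buffer.getvalue())
```

In practice you would pass a real file opened with newline="" instead of the in-memory buffer.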
This will be a preliminary evaluation, with the top 10 models going into a second round of qualitative evaluation for the use case.
In the second scoring phase, the top 10 models will be evaluated against a scoring rubric (set out below).
The test data for Phase 2 will consist of a gold-standard dataset containing pairs of questions and answers that have been checked by human experts in question formulation, disease surveillance and public health. The golden dataset will contain new (unseen) questions and answers. Participating solutions are required to provide answers, which will in turn be assessed by the experts and fed back to the datasets, together with updated questions.
In addition, the models will be tested against new, related documents to demonstrate their ability to answer unknown questions.
When evaluating answers, the definition of precision and recall will consider partial overlaps between answers and will take into account the presence or absence of keywords that are expected to be found in the ‘golden answer’. We will also take into account the overall correctness, and the ‘fluency’ of the answer.
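The full Phase 2 rubric is not reproduced here, but one signal it names, the presence or absence of expected keywords, is easy to check locally. The sketch below is a self-check under stated assumptions, not the experts' method: the function name keyword_coverage is hypothetical, and it uses naive case-insensitive substring matching to report which expected keywords a generated answer contains.

```python
def keyword_coverage(answer: str, expected: list[str]) -> tuple[float, list[str]]:
    """Return the fraction of expected keywords present in the answer and the missing ones."""
    if not expected:
        return 1.0, []
    lowered = answer.lower()
    # Naive substring check; does not handle synonyms, inflections or partial matches.
    missing = [kw for kw in expected if kw.lower() not in lowered]
    return (len(expected) - len(missing)) / len(expected), missing


if __name__ == "__main__":
    score, missing = keyword_coverage(
        "Notify the district immediately of any suspected cholera case",
        ["cholera", "district", "laboratory confirmation"],
    )
    print(round(score, 2), missing)  # 0.67 ['laboratory confirmation']
```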
Phase 2 scoring rubric:
1st place: $1 000 USD
2nd place: $600 USD
3rd place: $400 USD
There are 5 000 Zindi points available.
The top female or majority female team will receive an additional 1 000 Zindi points.
The challenge host has a discretionary award for one participant to travel to Malawi to visit the research lab, based on their submission to this competition. This will be confirmed after winners have been announced.
Competition closes on 3 March 2024.
Final submissions must be received by 11:59 PM GMT.
We reserve the right to update the contest timeline if necessary.
This is a complex challenge, and participants will benefit from a deeper understanding of the project and the challenges faced in the Malawi public health system.
In Malawi, environmental surveillance officers, health surveillance assistants, clinicians, laboratory technicians and community volunteers are all involved in collecting data for disease surveillance. Malawi adheres to the 3rd Edition of the World Health Organization's Integrated Disease Surveillance and Response (IDSR) strategy. This strategy, in alignment with WHO guidelines, aids African nations in establishing comprehensive public health surveillance and response systems. These systems address priority diseases, conditions, and events across all levels of health systems by linking surveillance, laboratory data, and other information with targeted public health interventions.
The Malawi IDSR strategy is outlined in the "Technical Guidelines for Integrated Disease Surveillance and Response in Malawi" (TGs for IDSR), a document divided into six booklets that together contain eleven modules. Complementary guidelines and manuals, such as the Community-based Surveillance (CBS) Training Manual in Malawi, are used in conjunction with the TGs. These additional resources may include disease-specific or outbreak-specific guidelines and case definitions, for instance for COVID-19.
The dataset and this competition are part of a project entitled “An Intelligent Disease Surveillance Data Feedback System”, which won a Grand Challenges grant for Catalyzing Equitable Artificial Intelligence (AI) Use, funded by the Bill & Melinda Gates Foundation.
The motivation for the project comes from an in-depth study conducted by the project team to assess the experiences of health professionals involved in collecting data in Lilongwe and Blantyre. We found that two critical factors stopped health practitioners from using data collected during routine surveillance:
Building on the insights gained from our study and supported by funding from the Bill & Melinda Gates Foundation, we have initiated the development of a tool aimed at empowering health professionals engaged in disease surveillance in Malawi. IntelSurv, an abbreviation for Intelligent Surveillance, serves as a comprehensive solution designed to enhance the efficiency and accuracy of disease surveillance data collection. IntelSurv operates as a knowledge repository, capturing vital information related to disease surveillance in Malawi. This includes a wealth of knowledge on:
A distinctive feature of IntelSurv is its dynamic feedback mechanism. The app facilitates a structured exchange of questions and answers between health professionals actively involved in disease surveillance and national health authorities in Malawi. This interactive process ensures continuous improvement and alignment with evolving best practices.
Currently, IntelSurv is in the deployment phase, undergoing rigorous testing and validation with the active participation of health professionals.
Read more: https://www.intelsurv.com/datasets
In the development phase of IntelSurv, we developed custom datasets of questions and answers tailored to public health and disease surveillance, covering a spectrum of questions vital to the field. These datasets address the specific queries health professionals commonly encounter during disease surveillance activities, including how to use forms, clarification of abbreviations found in data collection forms, application of clinical information, clinical case definitions, best practices for gathering specific data, and correct formats for recording various types of information.
One of these datasets forms the data for this competition and serves as a foundational resource for training an intelligent agent based on a large language model (LLM) that can respond to the nuanced challenges and queries pertinent to disease surveillance in Malawi.
The purpose of this challenge is to build an AI assistant for disease surveillance in Malawi capable of providing knowledge contained in the Malawi Technical Guidelines for Integrated Disease Surveillance and Response (TG).
You will train an open-source LLM to answer context-specific questions about Malawian public health processes, case definitions and guidelines, with training done on the Malawi TGs for IDSR.
By leveraging the LLM agent and the dataset, the solution has the potential to contribute to an interactive and adaptive training resource. Health professionals can engage with the agent to enhance their skills, receive real-time guidance on data collection, and stay updated on evolving practices. This interactive training approach ensures a continuous learning process, aligning professionals with the latest standards and protocols.
Technical Guidelines for Integrated Disease Surveillance and Response in Malawi. Lilongwe, December 2020. Licence: CC BY-NC-SA 3.0 IGO. Some rights reserved. This work is available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO licence (CC BY-NC-SA 3.0 IGO; https://creativecommons.org/licenses/by-nc-sa/3.0/igo).
This challenge is open to all.
Teams and collaboration
You may participate in competitions as an individual or in a team of up to four people. When creating a team, the team must have a total submission count less than or equal to the maximum allowable submissions as of the formation date. A team will be allowed the maximum number of submissions for the competition, minus the total number of submissions among team members at team formation. Prizes are transferred only to the individual players or to the team leader.
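The submission allowance for a new team is simple arithmetic. A sketch, assuming the competition-wide cap of 300 submissions stated under "Submissions and winning" and treating the "maximum allowable submissions as of the formation date" as a number you supply, since the rules do not spell out how it is computed; team_submission_allowance is a hypothetical name.

```python
MAX_TEAM_SIZE = 4            # "a team of up to four people"
MAX_TOTAL_SUBMISSIONS = 300  # competition-wide cap


def team_submission_allowance(member_counts: list[int], allowed_at_formation: int,
                              max_total: int = MAX_TOTAL_SUBMISSIONS) -> int:
    """Submissions a new team may still make, given each member's prior submission count."""
    if len(member_counts) > MAX_TEAM_SIZE:
        raise ValueError(f"teams are limited to {MAX_TEAM_SIZE} members")
    used = sum(member_counts)
    # The team can only be formed if its combined count does not exceed the
    # maximum allowable submissions as of the formation date.
    if used > allowed_at_formation:
        raise ValueError("combined submissions exceed the allowance at formation")
    return max_total - used


if __name__ == "__main__":
    # Three members who have made 40, 25 and 10 submissions.
    print(team_submission_allowance([40, 25, 10], allowed_at_formation=150))  # 225
```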
Multiple accounts per user are not permitted, and neither is collaboration or membership across multiple teams. Individuals and their submissions originating from multiple accounts will be immediately disqualified from the platform.
Code must not be shared privately outside of a team. Any code that is shared must be made available to all competition participants through the platform (i.e. on the discussion boards).
The Zindi data scientist who sets up a team is the default Team Leader but they can transfer leadership to another data scientist on the team. The Team Leader can invite other data scientists to their team. Invited data scientists can accept or reject invitations. Until a second data scientist accepts an invitation to join a team, the data scientist who initiated a team remains an individual on the leaderboard. No additional members may be added to teams within the final 5 days of the competition or last hour of a hackathon.
The team leader can initiate a merge with another team. Only the team leader of the second team can accept the invite. The default team leader is the leader from the team who initiated the invite. Teams can only merge if the total number of members is less than or equal to the maximum team size of the competition.
A team can be disbanded if it has not yet made a submission. Once a submission is made, individual members cannot leave the team.
All members in the team receive points associated with their ranking in the competition and there is no split or division of the points between team members.
Datasets and packages
The solution must use publicly-available, open-source packages only.
You may use pretrained models as long as they are openly available to everyone.
The data for this competition is licensed under CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivs 4.0 International).
Reference: Taylor, A. (2024). MWIDSRQA1.0 Malawi Integrated Disease Surveillance and Response Questions and Answers dataset (1.0) [Data set]. Kuyesera AI Lab, Malawi University of Business and Applied Sciences. https://doi.org/10.5281/zenodo.10565937.
You must notify Zindi immediately upon learning of any unauthorised transmission of or unauthorised access to the competition data, and work with Zindi to rectify any unauthorised transmission or access.
Your solution must not infringe the rights of any third party and you must be legally entitled to assign ownership of all rights of copyright in and to the winning solution code to Zindi.
Submissions and winning
You may make a maximum of 10 submissions per day.
You may make a maximum of 300 submissions for this competition.
Before the end of the competition, you need to choose 2 submissions to be judged on the private leaderboard. If you do not make a selection, your 2 best public leaderboard submissions will be used to score on the private leaderboard.
During the competition, your best public score will be displayed regardless of the submissions you have selected. When the competition closes your best private score out of the 2 selected submissions will be displayed.
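The selection rule can be sketched as follows, assuming higher scores are better (as with the ROUGE-1 F-measure) and that each submission is represented only by its public and private scores; the Submission class and final_private_score function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    public: float    # score on the public portion of the test set
    private: float   # score on the private portion, revealed at close
    selected: bool = False


def final_private_score(submissions: list[Submission]) -> float:
    """Best private score among the (up to) 2 selected submissions."""
    chosen = [s for s in submissions if s.selected]
    if len(chosen) > 2:
        raise ValueError("at most 2 submissions can be selected")
    if not chosen:
        # Fallback: use the 2 best public leaderboard submissions.
        chosen = sorted(submissions, key=lambda s: s.public, reverse=True)[:2]
    if not chosen:
        raise ValueError("no submissions to score")
    return max(s.private for s in chosen)


if __name__ == "__main__":
    subs = [Submission(0.50, 0.48), Submission(0.55, 0.47), Submission(0.52, 0.53)]
    # No selection, so the fallback picks the submissions with public scores 0.55 and 0.52.
    print(final_private_score(subs))  # 0.53
```

The example also shows why selection matters: the best public score (0.55) is not the best private score.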
Zindi maintains a public leaderboard and a private leaderboard for each competition. The Public Leaderboard includes approximately 20% of the test dataset. While the competition is open, the Public Leaderboard will rank the submitted solutions by the accuracy score they achieve. Upon close of the competition, the Private Leaderboard, which covers the other 80% of the test dataset, will be made public and will constitute the final ranking for the competition.
Note that to count, your submission must first pass processing. If your submission fails during the processing step, it will not be counted and not receive a score; nor will it count against your daily submission limit. If you encounter problems with your submission file, your best course of action is to ask for advice on the Competition’s discussion forum.
If you are in the top 10 at the time the leaderboard closes, we will email you to request your code. On receipt of the email, you will have 48 hours to respond and submit your code following the Reproducibility of submitted code guidelines detailed below. Failure to respond will result in disqualification.
If your solution places 1st, 2nd, or 3rd on the final leaderboard, you will be required to submit your winning solution code to us for verification, and you thereby agree to assign all worldwide rights of copyright in and to such winning solution to Zindi.
If two solutions earn identical scores on the leaderboard, the tiebreaker will be the date and time at which the submission was made (the earlier solution wins).
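Combined with the score, this tiebreaker amounts to a two-key sort: score descending, then submission time ascending. A minimal sketch, assuming higher scores are better; the Entry type and rank_entries function are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    team: str
    score: float
    submitted_at: datetime


def rank_entries(entries: list[Entry]) -> list[Entry]:
    """Sort by score (highest first); equal scores are ordered by earliest submission."""
    return sorted(entries, key=lambda e: (-e.score, e.submitted_at))


if __name__ == "__main__":
    board = rank_entries([
        Entry("team_a", 0.61, datetime(2024, 3, 1, 10, 0)),
        Entry("team_b", 0.61, datetime(2024, 2, 28, 9, 30)),
        Entry("team_c", 0.58, datetime(2024, 2, 20, 8, 0)),
    ])
    print([e.team for e in board])  # ['team_b', 'team_a', 'team_c']
```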
The winners will be paid via bank transfer, PayPal (if the payment is less than or equal to $100), or another international money transfer platform. International transfer fees will be deducted from the total prize amount, unless the prize money is under $500, in which case the international transfer fees will be covered by Zindi. In all cases, the winners are responsible for any other fees applied by their own bank or other institution for receiving the prize money. All taxes imposed on prizes are the sole responsibility of the winners. The top winners or team leaders will be required to present Zindi with proof of identification, proof of residence and a letter from their bank confirming their banking details. Winners will be paid in USD or the currency of the competition. If your account cannot receive US Dollars or the currency of the competition, your bank will need to provide proof of this and Zindi will try to accommodate this.
Please note that due to the ongoing Russia-Ukraine conflict, we are not currently able to make prize payments to winners located in Russia. We apologise for any inconvenience this may cause, and will handle any issues that arise on a case-by-case basis.
Payment will be made after code review and sealing the leaderboard.
You acknowledge and agree that Zindi may, without any obligation to do so, remove or disqualify an individual, team, or account if Zindi believes that such individual, team, or account is in violation of these rules. Entry into this competition constitutes your acceptance of these official competition rules.
Zindi is committed to providing solutions of value to our clients and partners. To this end, we reserve the right to disqualify your submission on the grounds of usability or value. This includes but is not limited to the use of data leaks or any other practices that we deem to compromise the inherent value of your solution.
Zindi also reserves the right to disqualify you and/or your submissions from any competition if we believe that you violated the rules or violated the spirit of the competition or the platform in any other way. The disqualifications are irrespective of your position on the leaderboard and completely at the discretion of Zindi.
Please refer to the FAQs and Terms of Use for additional rules that may apply to this competition. We reserve the right to update these rules at any time.
A README markdown file is required
It should cover:
Your code needs to run properly; code reviewers do not have time to debug code. If your code does not run easily, you will be bumped down the leaderboard.
Consequences of breaking any rules of the competition or submission guidelines:
Monitoring of submissions