The dataset for this competition was developed as part of a project conducted by the Kuyesera AI Lab (KAI Lab) at the Malawi University of Business and Applied Sciences (MUBAS) in partnership with the Public Health Institute of Malawi (PHIM).
The training data provided for this competition is drawn from six booklets, each covering a section of the Technical Guidelines for Integrated Disease Surveillance and Response (TGs for IDSR) in Malawi. The booklets are available in .docx, .pdf, and .xlsx formats; the Excel files contain the text as numbered paragraphs. Images, figures, and charts are referenced by number in the Excel files but are stored in the main (.docx and .pdf) documents.
The dataset contains questions and answers, contextualized within the TG booklets. The questions come in various types, including what, why, who, and where questions, as well as questions that ask for comparisons between concepts. Participants are free to incorporate other openly available question-answering datasets to aid in training and fine-tuning their models. Examples include SQuAD (the Stanford Question Answering Dataset) and the ELI5 datasets available on Hugging Face. However, the use of proprietary datasets or libraries is explicitly disallowed.
Please note that all acronyms and the first letter of each word in the "Keywords" column are capitalised.
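If your model produces keywords, it may help to normalise them to the same convention before comparing them with the "Keywords" column. The following is a minimal sketch, not an official formatting routine: the function name `format_keyword` and the `ACRONYMS` set are hypothetical, and the set would need to be extended with the acronyms that actually occur in the booklets. It also assumes that letters after the first one in a word are left unchanged, which the description above does not specify.

```python
# Hypothetical acronym list; extend it with the acronyms found in the TG booklets.
ACRONYMS = frozenset({"IDSR", "PHIM", "MUBAS"})


def format_keyword(keyword, acronyms=ACRONYMS):
    """Capitalise the first letter of each word and upper-case known acronyms.

    Assumption: words are separated by whitespace, and characters after the
    first letter of a non-acronym word are kept as they are.
    """
    words = []
    for word in keyword.split():
        if word.upper() in acronyms:
            # Known acronym: write it fully in upper case.
            words.append(word.upper())
        else:
            # Ordinary word: upper-case only the first character.
            words.append(word[:1].upper() + word[1:])
    return " ".join(words)


print(format_keyword("idsr case definition"))
```

This sketch does not handle acronyms attached to punctuation (for example, inside parentheses) or hyphenated words; treat those cases according to what you observe in the data.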