The training dataset contains maternal, sexual and reproductive health (MSRH) question-and-answer pairs in five languages: four African languages (Akan, Amharic, Luganda and Swahili) and English. The data spans nine language-country configurations and comprises approximately 29,815 training records and 6,686 validation records. It is suitable for sequence-to-sequence tasks such as health question answering and text generation in low-resource African languages.
The test dataset follows the same structure as the training data, except that it contains only the input health questions, 2,618 records in total. Participants must use their trained model to generate the corresponding answers, which will be evaluated against the reference answers.
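The prediction step for the test set can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the official submission format: the field names `id` and `question`, the output column `answer`, and the helper names `write_predictions` and `echo_model` are all hypothetical, and the sample question is a made-up placeholder rather than a record from the dataset. In practice `echo_model` would be replaced by a call to the trained sequence-to-sequence model.

```python
import csv
import io
from typing import Callable, Iterable, Mapping, TextIO


def write_predictions(
    records: Iterable[Mapping[str, str]],
    generate_answer: Callable[[str], str],
    out_file: TextIO,
    id_field: str = "id",              # assumed field name, not confirmed by the dataset
    question_field: str = "question",  # assumed field name, not confirmed by the dataset
) -> int:
    """Generate one answer per test question and write the results as CSV.

    Returns the number of prediction rows written (header excluded).
    """
    writer = csv.writer(out_file)
    writer.writerow([id_field, "answer"])  # hypothetical output layout
    count = 0
    for record in records:
        answer = generate_answer(record[question_field])
        writer.writerow([record[id_field], answer])
        count += 1
    return count


def echo_model(question: str) -> str:
    """Stand-in for a trained model; it only echoes the question back."""
    return "PLACEHOLDER ANSWER: " + question


if __name__ == "__main__":
    # Made-up example record, used only to show the call pattern.
    sample = [{"id": "q1", "question": "What is antenatal care?"}]
    buffer = io.StringIO()
    written = write_predictions(sample, echo_model, buffer)
    print(f"wrote {written} prediction(s)")
    print(buffer.getvalue())
```

Passing the model as a callable keeps the file handling independent of any particular framework, so the same loop works whether answers come from a fine-tuned model, an API, or a baseline.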