This is a parallel corpus for machine translation from French to Dyula. It contains roughly 10,929 French-Dyula sentence pairs. The corpus extends the Common Voice version 4 French corpus (https://arxiv.org/abs/1912.06670) to the Dyula language. The French sentences were pre-processed, manually translated into Dyula by a linguist, and then manually aligned.
This data collection effort was supported by the International Development Research Centre (IDRC) and the Swedish International Development Cooperation Agency (SIDA), managed by the African Center for Technology Studies (ACTS) in collaboration with the Université Virtuelle de Côte d'Ivoire (UVCI) and Data354, through the Artificial Intelligence for Development (AI4D) Africa programme.
You can access the dataset here: https://huggingface.co/datasets/uvci/Koumankan_mt_dyu_fr
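A minimal sketch of loading the corpus with the Hugging Face `datasets` library and iterating over sentence pairs. The dataset identifier comes from the URL above. The record layout (a `translation` field holding `fr` and `dyu` keys) and the `train` split name are assumptions, not something this description confirms, so check the dataset card and adjust the parameters if the schema differs. The helper names `load_corpus` and `iter_pairs` are hypothetical.

```python
from typing import Iterable, Iterator, Mapping, Tuple

# Identifier taken from the Hugging Face URL above.
DATASET_ID = "uvci/Koumankan_mt_dyu_fr"


def load_corpus(dataset_id: str = DATASET_ID):
    """Download the corpus from the Hugging Face Hub (needs network access)."""
    # Imported lazily so iter_pairs() can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id)


def iter_pairs(
    rows: Iterable[Mapping],
    src: str = "fr",              # assumed key for the French sentence
    tgt: str = "dyu",             # assumed key for the Dyula sentence
    field: str = "translation",   # assumed name of the nested field
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs with surrounding whitespace removed."""
    for row in rows:
        pair = row[field]
        yield pair[src].strip(), pair[tgt].strip()


# Example usage (the "train" split name is an assumption):
# corpus = load_corpus()
# for french, dyula in iter_pairs(corpus["train"]):
#     print(french, "|", dyula)
```

Keeping the field and language keys as parameters means only the call site changes if the actual schema uses different names.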
The dataset is also available on the Highwind platform. See the documentation: https://docs.highwind.ai/zindi/experiment/