AI4D-African Language Dataset Challenge
$5,000 USD
Calling on the Zindi community to help uncover and create African Language Datasets for improved representation in the field of NLP
271 data scientists enrolled
1 November 2019—1 April 2020

There is no data for this competition.

You can download:

  • AI4D_Documentation.docx - This contains guidelines for your documentation. It includes questions on motivation, composition, collection process, recommended uses, and so on.
  • AI4D_Datasheet.txt - This is an example of a data set. Each row is a new sentence.

When you make a submission you will need to submit a datasheet or data set in the same format as AI4D_Datasheet.txt, along with the dataset's documentation. The documentation should answer the questions in AI4D_Documentation.docx.

This challenge calls on you to submit African language datasets (annotated or otherwise) that are representative, balanced, and useful for downstream NLP tasks.

For your submission to be eligible, the data must meet the following criteria:

  • The language represented in the dataset must be an African language.
  • Data should be sentence-split and not tokenized.
  • Each dataset submission must be accompanied by a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. See this paper for further details.
  • Our intention is that the datasets are kept free and open for public use under a Creative Commons license. Data already licensed under more restrictive terms will not be eligible.
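
As a rough pre-submission check on the "sentence-split and not tokenized" criterion, the sketch below counts the non-empty rows in a data set and flags rows that look tokenized. The function name `check_sentence_lines`, the sample sentences, and the rule itself (a space directly before punctuation suggests tokenized text) are assumptions made for illustration. They are not part of the challenge rules, and the rule will misfire for languages or writing conventions that legitimately put a space before punctuation.

```python
import re

# Heuristic assumption: whitespace-tokenized text often has a space
# directly before punctuation, e.g. "sokoni , sasa ."
SPACED_PUNCT = re.compile(r"\s[,.;:!?]")


def check_sentence_lines(lines):
    """Return (sentence_count, suspicious_row_numbers) for an iterable of rows.

    Each non-empty row is treated as one sentence, matching the
    "each row is a new sentence" layout of the example file.
    """
    count = 0
    suspicious = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank rows are skipped, not counted as sentences
        count += 1
        if SPACED_PUNCT.search(text):
            suspicious.append(number)  # row looks tokenized; review by hand
    return count, suspicious


# Illustrative rows only; real input would come from your own .txt file.
sample_rows = ["Habari ya asubuhi.", "", "Ninaenda sokoni , sasa ."]
sentence_count, flagged_rows = check_sentence_lines(sample_rows)
```

Flagged rows are only candidates for manual review; a clean result does not guarantee that the data meets the eligibility criteria.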

You must upload your submission to the competition one file at a time. You must include documentation for each submission that describes the submitted dataset. Note that there will be no scores on this leaderboard. If you make multiple submissions, each one will be judged independently. It is possible for an individual or a team to win multiple prizes in one month and/or throughout the duration of this challenge. You should provide two files for each submission:

  • ONE txt file with the language data (or multiple files in the case of multilingual datasets)
  • ONE pdf file with the documentation that accompanies the dataset, covering its motivation, composition, collection process, recommended uses, and so on. See this paper for further details.

Please label your files:

username_datasheet_XXX.txt
username_documentation_XXX.pdf

Where XXX is a unique ID to indicate which datasheet goes with which documentation if you make multiple submissions. Note that you can also zip the files.
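
The naming convention above can be applied with a small helper. The following is a minimal sketch, assuming a hypothetical username `alice` and ID `001`; the function names `submission_names` and `bundle_submission` are invented for illustration, and the byte strings stand in for the contents of your real files.

```python
import io
import zipfile


def submission_names(username, unique_id):
    """Build the paired file names: username_datasheet_XXX.txt and
    username_documentation_XXX.pdf, sharing the same unique ID."""
    return (
        f"{username}_datasheet_{unique_id}.txt",
        f"{username}_documentation_{unique_id}.pdf",
    )


def bundle_submission(username, unique_id, data_bytes, doc_bytes):
    """Return the bytes of a zip archive holding both correctly named files."""
    txt_name, pdf_name = submission_names(username, unique_id)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(txt_name, data_bytes)
        archive.writestr(pdf_name, doc_bytes)
    return buffer.getvalue()


# Placeholder contents; in practice, read these from your own files.
names = submission_names("alice", "001")
archive_bytes = bundle_submission("alice", "001", b"Sentence one.\n", b"%PDF-placeholder")
```

Using the same ID in both names is what ties a datasheet to its documentation when you make several submissions.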