GIZ AI4D Africa Language Challenge - Round 2
$6,000 USD
Calling on the Zindi community to help uncover and create African language datasets for improved representation in the field of NLP.
363 data scientists enrolled
Research, Collection, Unstructured, Text, NLP
1 June—2 August

There is no data for this competition.

You can download:

  • AI4D_Documentation.docx - This contains guidelines for your documentation. It includes questions on motivation, composition, collection process, recommended uses, and so on.

When you make a submission, you will need to submit a dataset in the same format as AI4D_Data.txt, along with the dataset's documentation. The documentation should answer the questions in AI4D_Documentation.docx. This challenge calls on you to submit African language datasets (annotated or otherwise) that are representative, balanced, and useful for downstream NLP tasks.

For your submission to be eligible, the data must meet the following criteria:

  • The languages must be indigenous to Uganda, Ghana, or South Africa.
  • Data should be sentence-split and not tokenized.
  • Each dataset submission must be accompanied by dataset documentation.
  • The documentation should cover the motivation, composition, collection process, recommended uses, and so on. See this paper for further details. A report template is provided.
  • While you can adapt the sections covered in the documentation, you must include the final section: an explanation of how you would expand this dataset if you won the $1,500 research grant.
  • Our intention is that the datasets are kept free and open for public use under a Creative Commons 4.0 license or similar. Data already licensed under more restrictive terms will not be eligible.
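
As a rough illustration of the "sentence split and not tokenized" criterion, the sketch below flags lines that look suspicious before you submit. It rests on assumptions that are not part of the official rules: that the data file holds one sentence per line (the exact layout of AI4D_Data.txt is not described here), and that detached punctuation such as " ," or " ." signals tokenized text. The function and pattern names are hypothetical, and the heuristics will produce false positives for abbreviations and for languages with different punctuation conventions.

```python
import re

# Assumption: one sentence per line. Adjust to match the real AI4D_Data.txt.
# Punctuation preceded by whitespace is typical of tokenized text.
DETACHED_PUNCT = re.compile(r"\s[,.;:!?](\s|$)")
# Sentence-ending punctuation followed by another word inside the same line.
INNER_BOUNDARY = re.compile(r"[.!?]\s+(\S)")


def has_inner_boundary(line):
    """True if end punctuation is followed by an uppercase character."""
    return any(m.group(1).isupper() for m in INNER_BOUNDARY.finditer(line))


def check_lines(lines):
    """Return (line_number, issue) pairs for lines that look suspicious."""
    issues = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored in this sketch
        if DETACHED_PUNCT.search(line):
            issues.append((number, "possibly tokenized"))
        if has_inner_boundary(line):
            issues.append((number, "possibly more than one sentence"))
    return issues
```

You might run this over `open(path, encoding="utf-8")` and review each flagged line by hand; the output is only a hint, not a verdict on eligibility.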

You must upload your submission to the competition one file at a time. You must include documentation for each submission that describes the submitted dataset. Note that there will be no scores on this leaderboard. If you make multiple submissions, each will be judged independently. Up to three submissions from an individual or team will be considered. It is possible for someone to win multiple prizes.

You should provide two files for each submission:

  • ONE txt file with the language data (or multiple files in the case of multilingual datasets)
  • ONE pdf file with the datasheet that documents the dataset's motivation, composition, collection process, recommended uses, and so on. See this paper for further details.

Please label your files:

username_data_XXX.txt
username_documentation_XXX.pdf

Where XXX is a unique ID to indicate which dataset goes with which documentation if you make multiple submissions. Note that you can also zip the files.
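
If you prepare several submissions, a small script can confirm that every ID has both its data and its documentation before you upload. The following is a minimal sketch, not an official tool. It assumes that the ID contains no underscores or dots, that a multilingual dataset may reuse one ID across several txt files (the naming for that case is not specified above), and that each ID needs exactly one pdf. The names `group_by_id` and `incomplete_ids` are hypothetical.

```python
import re
from collections import defaultdict

# Pattern for username_data_XXX.txt and username_documentation_XXX.pdf.
# Assumption: the ID (XXX) has no underscores or dots.
NAME = re.compile(
    r"^(?P<user>.+)_(?P<kind>data|documentation)_(?P<uid>[^_.]+)\.(?P<ext>txt|pdf)$"
)
# Data files are txt; documentation files are pdf.
EXPECTED_EXT = {"data": "txt", "documentation": "pdf"}


def group_by_id(filenames):
    """Group recognized files by ID; also return names that do not fit."""
    groups = defaultdict(lambda: {"data": [], "documentation": []})
    unrecognized = []
    for name in filenames:
        match = NAME.match(name)
        if match and EXPECTED_EXT[match["kind"]] == match["ext"]:
            groups[match["uid"]][match["kind"]].append(name)
        else:
            unrecognized.append(name)
    return dict(groups), unrecognized


def incomplete_ids(groups):
    """IDs missing a data file or lacking exactly one documentation pdf."""
    return sorted(
        uid
        for uid, files in groups.items()
        if not files["data"] or len(files["documentation"]) != 1
    )
```

For example, passing the contents of your submission folder (say, from `os.listdir`) to `group_by_id` and then calling `incomplete_ids` on the result lists any ID that still needs a matching file.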