GIZ AI4D Africa Language Challenge - Round 2
$6,000 USD
Calling on the Zindi community to help uncover and create African Language Datasets for improved representation in the field of NLP
401 data scientists enrolled
Research, Collection, Unstructured, Text, NLP
1 June—2 August
63 days

Note that this competition was updated on 1 June 2020 with a new round of prizes specifically for languages indigenous to Uganda, Ghana, and South Africa.

In recent times, pre-trained language models have led to significant improvements in various Natural Language Processing (NLP) tasks, and transfer learning is rapidly changing the field. Transfer learning is the process of training a model on a large-scale dataset and then using that pre-trained model as the starting point for learning another, downstream task (i.e. a target task such as named entity recognition).
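To make the two stages of transfer learning concrete, here is a deliberately tiny sketch in Python. It is an illustration under loud assumptions, not the method used in this challenge: word co-occurrence counts stand in for a pre-trained language model, a nearest-centroid classifier stands in for the downstream model, and the corpus, labels and all function names (`pretrain_vectors`, `embed`, `train_downstream`, `predict`) are hypothetical.

```python
def pretrain_vectors(corpus, window=2):
    """Stage 1 (toy 'pre-training'): learn one co-occurrence vector per word
    from unlabelled text. A real system would train a neural language model."""
    vocab = sorted({w for sentence in corpus for w in sentence.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[words[j]]] += 1.0
    return vectors


def embed(text, vectors):
    """Average the frozen pre-trained vectors of the words we know."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in text.split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def train_downstream(labelled, vectors):
    """Stage 2 (toy downstream task): one centroid per label, computed on top
    of the pre-trained vectors, which are not updated."""
    grouped = {}
    for text, label in labelled:
        grouped.setdefault(label, []).append(embed(text, vectors))
    return {label: [sum(col) / len(vs) for col in zip(*vs)]
            for label, vs in grouped.items()}


def predict(text, centroids, vectors):
    """Assign the label whose centroid is closest in squared distance."""
    v = embed(text, vectors)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(v, centroids[label])))


# Hypothetical data: a "large" unlabelled corpus and a tiny labelled set.
corpus = [
    "the rain fell on the farm",
    "the rain fell on the field",
    "the team won the match",
    "the team won the game",
]
labelled = [("rain farm", "weather"), ("team match", "sport")]

vectors = pretrain_vectors(corpus)
centroids = train_downstream(labelled, vectors)

# "field" and "game" never appear in the labelled set; what the model knows
# about them comes only from the pre-training corpus.
print(predict("field", centroids, vectors))  # weather
print(predict("game", centroids, vectors))   # sport
```

The point of the sketch is the dependency it exposes: the downstream step only works for words that the pre-training corpus covered, which is why languages with little available text benefit least from this approach.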

Pre-trained models in African languages are barely represented among the leading architectures for transfer learning in NLP, mainly due to a lack of data. (There are some exceptions, such as this multilingual BERT, which includes languages like Swahili and Yoruba.) While these architectures are freely available for use, most are data-hungry. The GPT-2 model, for instance, was trained on millions, possibly billions, of pieces of text. (ref)

This gap exists because little data in African languages is available on the Internet. The languages selected for BERT pre-training “were chosen because they are the top languages with the largest Wikipedias” (ref). Similarly, the 157 pre-trained language models made available by fastText were trained on Wikipedia and Common Crawl. (ref)

The objective of this challenge is therefore the creation, curation and collation of good-quality African language datasets for a specific NLP task. Each task-specific dataset will serve as a downstream task on which future language models can be evaluated.

This challenge is hosted in partnership with GIZ and its FAIR Forward initiative, and with the Artificial Intelligence for Development Africa (AI4D-Africa) Network.

About FAIR Forward and GIZ (toolkit-digitalisierung.de/en/fair-forward)

The “FAIR Forward – Artificial Intelligence for all” initiative promotes a more open, inclusive and sustainable approach to AI on an international level. It is implemented by the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) on behalf of the German Federal Ministry for Economic Cooperation and Development (BMZ). FAIR Forward seeks to improve the foundations for AI innovation and policy in five partner countries: Rwanda, Uganda, Ghana, South Africa and India. Together with our partners, we focus on three areas of action: (1) strengthen local technical know-how on AI, (2) increase access to open AI training data, (3) develop policy frameworks ready for AI.

About AI4D-Africa: Artificial Intelligence for Development-Africa Network (ai4d.ai)

AI4D-Africa is a network of excellence in AI in sub-Saharan Africa. It is aimed at strengthening and developing community, scientific and technological excellence in a range of AI-related areas. It is composed of African Artificial Intelligence researchers, practitioners and policy makers.