GIZ Kinyarwanda Text Cleaning and Augmentation Competition by GIZ

Helping Rwanda

$3 500 USD

Completed (almost 4 years ago)

Skills you will learn

Collection

Natural Language Processing

136 joined

0 active

Start

Jul 07, 22

Aug 07, 22

Reveal

Aug 07, 22

About

Approximately 55 parallel Kinyarwanda-English sentences will be provided for data cleaning. This data is meant to help you get started on the process.

You are tasked with finding additional sources of parallel Kinyarwanda-English sentences. You need to clearly document where and how you downloaded the data; however, it is preferable that your script ingests the data directly through an API.
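One way to satisfy the documentation requirement is to record provenance at the moment of download. The following is a minimal sketch, not a prescribed method: the function name fetch_with_provenance, the fetch parameter, and the fields of the record are all assumptions made for illustration. It uses only the Python standard library.

```python
import hashlib
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_with_provenance(url, fetch=None):
    """Download `url` and return (payload_bytes, provenance_record).

    `fetch` is a hypothetical hook: any callable that maps a URL to bytes.
    It defaults to a plain HTTP(S) GET via the standard library.
    """
    if fetch is None:
        def fetch(u):
            with urlopen(u) as response:
                return response.read()

    payload = fetch(url)
    record = {
        "url": url,  # where the data came from
        "retrieved_at": datetime.now(timezone.utc).isoformat(),  # when
        "sha256": hashlib.sha256(payload).hexdigest(),  # exact content fetched
        "num_bytes": len(payload),
    }
    return payload, record
```

Writing each record to a log file next to the downloaded data (for example, one JSON object per line) would give reviewers a reproducible trail of where and how each source was obtained.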

GIZ is particularly interested in domain-specific text data from fields such as health, agriculture, and tourism. We recommend that you focus on one field; you can even specialize in a specific subfield if you’d like, as both the quantity and the quality of the data contribute to a strong model.

Please do not use the JW300 datasets as this may skew the distribution of data across fields.

The objective of this challenge is to create a script that will clean Kinyarwanda-English parallel sentences.

You are encouraged to use a rules-based approach along with machine learning if you think it is applicable. Remember to consider your script’s efficiency and memory usage during execution.
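A rules-based cleaner along these lines could be sketched as follows. This is only an illustration under stated assumptions: the specific rules (Unicode normalization, HTML tag stripping, whitespace collapsing, dropping empty or untranslated pairs, a length-ratio check, and de-duplication), the function names, and the thresholds min_chars and max_len_ratio are not taken from the competition and would need tuning on real data. The cleaner is written as a generator so that pairs are processed one at a time instead of loading the whole corpus into memory.

```python
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")
_HTML_TAG = re.compile(r"<[^>]+>")


def normalize(text):
    """Apply NFC normalization, strip HTML-like tags, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = _HTML_TAG.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def clean_pairs(pairs, min_chars=2, max_len_ratio=3.0):
    """Lazily yield cleaned (kinyarwanda, english) pairs.

    The thresholds are illustrative assumptions, not competition rules.
    """
    seen = set()  # casefolded pairs already emitted, for de-duplication
    for rw, en in pairs:
        rw, en = normalize(rw), normalize(en)
        # Rule 1: drop pairs where either side is empty or too short.
        if len(rw) < min_chars or len(en) < min_chars:
            continue
        # Rule 2: drop pairs whose two sides are identical (likely untranslated).
        if rw.casefold() == en.casefold():
            continue
        # Rule 3: drop pairs with an implausible character-length ratio.
        if max(len(rw), len(en)) / min(len(rw), len(en)) > max_len_ratio:
            continue
        # Rule 4: drop duplicates, ignoring case.
        key = (rw.casefold(), en.casefold())
        if key in seen:
            continue
        seen.add(key)
        yield rw, en
```

Because clean_pairs is a generator, it can be fed directly from a file reader and its output written out pair by pair. The de-duplication set still grows with the number of unique pairs, so for very large corpora storing a hash of each pair instead of the full text may reduce memory usage. Rule 2 can also discard legitimate pairs such as proper names, so it may be worth relaxing for some domains.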

You are welcome to create a machine translation model, but it will not add to your final score.
