AI4D Swahili News Classification Challenge

Helping East Africa

$250 USD

Completed (~5 years ago)

Skills you will learn

Classification

77 joined

53 active

Start

Feb 26, 21

Feb 28, 21

Reveal

Feb 28, 21

About

The dataset comprises 31,024 news items from different sources, most of them from Tanzania. The items fall into six news categories, ranging from national news to entertainment news.

Your goal is to accurately classify each Swahili news item into one of the six categories below:

Kitaifa (National)
Kimataifa (International)
Uchumi (Business/Economy)
Afya (Health)
Michezo (Sports)
Burudani (Entertainment)
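
For modelling, the six labels above usually need a machine-readable form. The following is a minimal sketch of one way to hold them in Python. The integer encoding (`LABEL_TO_ID`, `ID_TO_LABEL`) is a hypothetical convenience, not something the challenge specifies, and the exact spelling and casing of the labels in the data files is assumed to match the list above.

```python
# Swahili category labels from the challenge description, with English glosses.
CATEGORIES = {
    "Kitaifa": "National",
    "Kimataifa": "International",
    "Uchumi": "Business/Economy",
    "Afya": "Health",
    "Michezo": "Sports",
    "Burudani": "Entertainment",
}

# Hypothetical integer encoding (not defined by the challenge): position in the list.
LABEL_TO_ID = {label: i for i, label in enumerate(CATEGORIES)}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}
```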

Variable definitions

id - The unique identifier of a news item
content - The text of the news item
category - The category of the news item, one of the six categories listed above.
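
The variable definitions can be checked when loading the data. Below is a small sketch using only the Python standard library. The helper name `read_news_rows` and the two sample rows are made up for illustration and are not challenge data. The column names come from the definitions above.

```python
import csv
import io

EXPECTED_TRAIN_COLUMNS = ["id", "content", "category"]

def read_news_rows(handle, expected_columns=EXPECTED_TRAIN_COLUMNS):
    """Read CSV rows as dicts, checking that the expected columns are present."""
    reader = csv.DictReader(handle)
    missing = [c for c in expected_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Tiny made-up sample standing in for train.csv (not real challenge data).
sample = io.StringIO(
    "id,content,category\n"
    "SW0,Timu ya taifa imeshinda mechi,Michezo\n"
    "SW1,Serikali yatangaza bajeti mpya,Kitaifa\n"
)
rows = read_news_rows(sample)
```

With the real files, the handle would typically come from `open("train.csv", newline="", encoding="utf-8")`. For a file without the target column, pass `expected_columns=["id", "content"]`.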

The files for download

train.csv is the dataset that you will use to train your model. It contains 23,268 randomly selected news items.
test.csv is the dataset to which you will apply your model to test how well it performs. Use your model to predict which of the six categories each news item belongs to. The test set contains 7,756 news items and has the same fields as train.csv except for the last column, category, which is the target.
sample_submission.csv is an example of what your submission file should look like.
StarterNotebook.ipynb is a notebook that will help you read in the data, build a simple model, and make a submission to the leaderboard.
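
To give a feel for the train-then-predict workflow these files support, here is a minimal baseline sketch. It is not the method used in StarterNotebook.ipynb, whose contents are not shown here. It swaps in a pure-Python multinomial Naive Bayes classifier with add-one smoothing, and the class name `NaiveBayesBaseline`, the whitespace tokenizer, and the toy sentences are all assumptions for illustration. In practice a library implementation would likely be a better choice.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()

class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per category
        self.word_counts = defaultdict(Counter)   # token counts per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict_one(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]

# Made-up toy examples, not challenge data.
train_texts = [
    "timu imeshinda mechi ya mpira",
    "mchezaji amefunga bao mechi",
    "hospitali yapokea dawa mpya",
    "wagonjwa wapata chanjo hospitali",
]
train_labels = ["Michezo", "Michezo", "Afya", "Afya"]

model = NaiveBayesBaseline().fit(train_texts, train_labels)
predictions = model.predict(["mechi ya mpira leo", "chanjo mpya hospitali"])
```

On the real data, `fit` would receive the content and category columns of train.csv and `predict` the content column of test.csv. The layout of the submission file should follow sample_submission.csv, whose columns are not described here.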
