Swahili News Classification
Knowledge
Can you create a classification algorithm to identify Swahili news articles by category?
237 data scientists enrolled, 32 on the leaderboard
Media, Classification, NLP, Unstructured
Tanzania
9 July 2020

The dataset contains 6,439 news articles from different sources in Tanzania. The articles fall into 5 news categories, ranging from national news to entertainment news.

Your goal is to accurately classify each Swahili news article into one of the five categories below:

  • Kitaifa (National)
  • Kimataifa (International)
  • Biashara (Business)
  • Michezo (Sports)
  • Burudani (Entertainment)
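Most classifiers expect integer class ids rather than strings, so the five categories above could be encoded as in the following minimal sketch. The lowercase keys, the names `CATEGORIES`, `LABEL_TO_ID`, `ID_TO_LABEL`, and the helper `encode_label` are illustrative assumptions; the exact spelling and casing of the labels in the data files should be checked before relying on it.

```python
# Category names as listed in the description. The exact spelling and casing
# used in the data files is an assumption and should be checked first.
CATEGORIES = {
    "kitaifa": "National",
    "kimataifa": "International",
    "biashara": "Business",
    "michezo": "Sports",
    "burudani": "Entertainment",
}

# Stable integer ids, in the order the categories are listed.
LABEL_TO_ID = {name: i for i, name in enumerate(CATEGORIES)}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}


def encode_label(raw: str) -> int:
    """Map a raw category string to its integer id, ignoring case and padding."""
    key = raw.strip().lower()
    if key not in LABEL_TO_ID:
        raise ValueError(f"unknown category: {raw!r}")
    return LABEL_TO_ID[key]
```

Raising on an unknown label, rather than silently assigning a new id, makes it easier to spot unexpected values in the category column early.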

The files for download are:

  • Train.csv is the dataset that you will use to train your model. It includes 5,151 randomly selected news articles.
  • Test.csv is the dataset to which you will apply your model to test how well it performs. Use your model to predict which of the five categories each article in this dataset belongs to. The test set contains 1,288 news articles. It includes the same fields as Train.csv except for the last column. The target variable is category.
  • SampleSubmission.csv is an example of what your submission file should look like.
  • VariableDefintions.csv provides definitions of the variables found in Test.csv and Train.csv.
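As a starting point, the files above could be fed to a simple bag-of-words baseline. The sketch below is one possible approach, not a recommended or official solution: it implements multinomial naive Bayes with add-one smoothing using only the Python standard library. The text column name `content` is an assumption (check VariableDefintions.csv for the real name); `category` is the target named above. The helpers `tokenize`, `read_rows`, `NaiveBayes`, and `run` are hypothetical names introduced for illustration.

```python
import csv
import math
import re
from collections import Counter, defaultdict

TEXT_COL = "content"     # assumed column name; check VariableDefintions.csv
TARGET_COL = "category"  # the target named in the description


def tokenize(text):
    """Lowercase the text and split it on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.doc_counts}
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def run(train_path="Train.csv", test_path="Test.csv"):
    """Train on the training file and return predicted categories for the test file."""
    train = read_rows(train_path)
    test = read_rows(test_path)
    model = NaiveBayes().fit(
        [r[TEXT_COL] for r in train], [r[TARGET_COL] for r in train]
    )
    return model.predict([r[TEXT_COL] for r in test])
```

Calling `run()` in a directory containing the downloaded files would return one predicted category per test row. The predictions still need to be written out in the layout shown in SampleSubmission.csv, which is not reproduced here.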