Google NLP Hack Series: Swahili Social Media Sentiment Analysis Challenge
Can you classify whether a tweet in Swahili is positive, negative, or neutral?
$1 000 USD
Ended 9 months ago
40 active · 63 enrolled
East Africa
Good for beginners

Zindi Africa, together with EA ambassadors, gathered Swahili content (tweets) from Twitter expressing sentiment about popular topics. For this purpose, we extracted 3 000 tweets using Tweepy and the Twitter API.

The data was preprocessed by removing links, emoji, and punctuation.
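The exact cleaning rules used for this dataset are not published, so the following is only a minimal sketch, assuming a regex-based approach: it strips URLs, characters from the main emoji Unicode blocks (not an exhaustive emoji list), and ASCII punctuation. Applying something similar to your own inputs helps keep them consistent with the provided data. The function name `clean_tweet` is hypothetical.

```python
import re
import string

# Hypothetical approximation of the cleaning steps; the organizers' exact rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Main emoji blocks only; this is not an exhaustive emoji definition.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flag pairs)
    "\u200D\uFE0F"           # zero-width joiner and emoji variation selector
    "]+"
)

# string.punctuation covers ASCII punctuation only (curly quotes are kept).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Remove links, emoji, and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_tweet("Habari za leo! 😀 https://t.co/abc"))
```

One design caveat: removing all ASCII punctuation also drops apostrophes, so a Swahili word such as "ng'ombe" becomes "ngombe". If that matters for your model, exclude `'` from the translation table.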

The collected tweets were manually annotated with an overall polarity: positive (1), negative (-1), or neutral (0).

Variable definitions

  • ID - The unique identifier of a Swahili tweet.
  • Sentence - The text of the tweet.
  • Label - The sentiment of the tweet: positive (1), negative (-1), or neutral (0).
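As a starting point, the variables above can be read with Python's standard `csv` module. This is a sketch under the assumption that the training file's header is exactly `ID,Sentence,Label`. It rejects any label outside -1, 0, and 1, and reports how many tweets fall into each polarity class. The function names and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO, Tuple

# Polarity codes as defined for the Label column.
LABEL_NAMES = {1: "positive", -1: "negative", 0: "neutral"}


def read_labeled_rows(f: TextIO) -> List[Tuple[str, str, int]]:
    """Parse Train.csv-style data; assumes the header is ID,Sentence,Label."""
    rows = []
    for row in csv.DictReader(f):
        # Parse as float first so labels stored as "1.0" are accepted,
        # but reject anything that is not exactly -1, 0, or 1.
        value = float(row["Label"])
        if value not in LABEL_NAMES:
            raise ValueError(f"unexpected label {row['Label']!r} for ID {row['ID']}")
        rows.append((row["ID"], row["Sentence"], int(value)))
    return rows


def label_counts(rows: List[Tuple[str, str, int]]) -> Counter:
    """Count tweets per polarity name, which helps spot class imbalance."""
    return Counter(LABEL_NAMES[label] for _, _, label in rows)


if __name__ == "__main__":
    # Hypothetical rows; real IDs and sentences come from Train.csv.
    sample = "ID,Sentence,Label\nA1,Habari njema,1\nA2,Sipendi hii,-1\nA3,Leo ni Jumatatu,0\n"
    rows = read_labeled_rows(io.StringIO(sample))
    print(label_counts(rows))
```

For the real file, pass `open("Train.csv", newline="", encoding="utf-8")` in place of the `StringIO` object.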

Files available for download:

  • Train.csv - contains the label (target). This is the dataset you will use to train your model.
  • Test.csv - resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
  • SampleSubmission.csv - shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘Label’ column containing your predictions. The order of the rows does not matter, but the ‘ID’ values must be correct. Values in the ‘Label’ column should be -1, 0, or 1.
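The submission rules above (an `ID` and a `Label` column, any row order, IDs matching Test.csv, labels in {-1, 0, 1}) can be checked locally before uploading. The sketch below uses only the standard library. The function names and the IDs `T1` to `T3` are hypothetical placeholders, and the `predictions` dictionary stands in for whatever your model outputs.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

VALID_LABELS = {-1, 0, 1}


def write_submission(predictions: Dict[str, int], f: TextIO) -> None:
    """Write ID,Label rows in the SampleSubmission.csv layout (row order is free)."""
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["ID", "Label"])
    for tweet_id, label in predictions.items():
        if label not in VALID_LABELS:
            raise ValueError(f"label {label!r} for ID {tweet_id} is not -1, 0 or 1")
        writer.writerow([tweet_id, label])


def check_submission(test_ids: Iterable[str], f: TextIO) -> None:
    """Raise ValueError if IDs differ from Test.csv's or any label is invalid."""
    rows = list(csv.DictReader(f))
    submitted = [r["ID"] for r in rows]
    if len(submitted) != len(set(submitted)):
        raise ValueError("duplicate IDs in submission")
    expected = set(test_ids)
    if set(submitted) != expected:
        missing = sorted(expected - set(submitted))
        extra = sorted(set(submitted) - expected)
        raise ValueError(f"missing IDs: {missing}; unexpected IDs: {extra}")
    for r in rows:
        if int(r["Label"]) not in VALID_LABELS:
            raise ValueError(f"invalid label {r['Label']!r} for ID {r['ID']}")


if __name__ == "__main__":
    buf = io.StringIO()
    write_submission({"T1": 1, "T2": -1, "T3": 0}, buf)
    buf.seek(0)
    # Passes even though the expected IDs are listed in a different order.
    check_submission(["T3", "T1", "T2"], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.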