Google NLP Hack Series: Swahili Social Media Sentiment Analysis Challenge

Helping East Africa
$1 000 USD
Challenge completed almost 4 years ago
Natural Language Processing
Classification
Sentiment Analysis
63 joined
40 active
Start
Feb 18, 22
Close
Feb 20, 22
Reveal
Feb 20, 22
About

Zindi Africa, together with EA ambassadors, gathered Swahili tweets from Twitter that express sentiment about popular topics. For this purpose, we extracted 3 000 tweets using Tweepy and the Twitter APIs.

The data was preprocessed by removing links, emoji, and punctuation.
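The organisers' exact preprocessing code is not published here. As a rough sketch of what removing links, emoji, and punctuation could look like, the Python snippet below uses only the standard library. The function name `clean_tweet` and the regular expressions are assumptions made for illustration; in particular, the emoji pattern covers only the most common Unicode emoji blocks.

```python
import re
import string

# Assumed patterns, not the organisers' actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Approximate emoji coverage: pictographs, dingbats, regional-indicator flags,
# plus the variation selector and zero-width joiner used in emoji sequences.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]"
)
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Strip links, emoji, and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)      # links first, while "://" is still intact
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_tweet("Habari njema! 😀 https://t.co/abc #furaha"))
```

Links are removed before punctuation so that the URL pattern can still match on `://`. The same function can be applied to any new text before it is passed to a model trained on the competition data.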

The collected tweets were manually annotated with an overall polarity: positive (1), negative (-1), or neutral (0).

Variable definitions

  • ID - The unique ID of a Swahili tweet.
  • Sentence - The content of the tweet.
  • Label - The sentiment of the tweet: positive (1), negative (-1), or neutral (0).
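To make the three variables concrete, here is a minimal sketch that reads rows with the `ID`, `Sentence`, and `Label` columns and checks that every label is -1, 0, or 1. The helper name `read_labelled_rows` and the three sample rows are invented for illustration; they are not taken from the competition data.

```python
import csv
import io
from collections import Counter

VALID_LABELS = {-1, 0, 1}  # negative, neutral, positive


def read_labelled_rows(handle):
    """Return (ID, Sentence, Label) tuples, rejecting labels outside -1/0/1."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["Label"])
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label {label!r} for ID {record['ID']}")
        rows.append((record["ID"], record["Sentence"], label))
    return rows


# Invented sample rows in the documented column layout.
sample = io.StringIO(
    "ID,Sentence,Label\n"
    "tweet_1,napenda kazi hii,1\n"
    "tweet_2,huduma mbaya sana,-1\n"
    "tweet_3,leo ni jumatatu,0\n"
)
rows = read_labelled_rows(sample)
label_counts = Counter(label for _, _, label in rows)

if __name__ == "__main__":
    print(label_counts)
```

With the real training file, the in-memory sample would be replaced by `open("Train.csv", newline="", encoding="utf-8")`. Counting labels this way is a quick check on class balance before training.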

Files available for download:

  • Train.csv - contains the label (target). This is the dataset that you will use to train your model.
  • Test.csv - resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
  • SampleSubmission.csv - shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘Label’ column containing your predictions. The order of the rows does not matter, but the values in the ‘ID’ column must be correct. Values in the ‘Label’ column should be -1, 0 or 1.
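The submission format described above can be sketched as a small writer that emits an `ID` column and a `Label` column and refuses any prediction outside -1, 0, or 1. The function name `write_submission` and the example IDs are hypothetical; the header names follow the description of SampleSubmission.csv, so check them against the actual file before submitting.

```python
import csv
import io

ALLOWED_LABELS = (-1, 0, 1)


def write_submission(ids, predictions, handle):
    """Write ID,Label rows; raise ValueError on bad lengths or label values."""
    ids = list(ids)
    predictions = list(predictions)
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", "Label"])
    for tweet_id, label in zip(ids, predictions):
        if label not in ALLOWED_LABELS:
            raise ValueError(f"label for {tweet_id} must be -1, 0 or 1, got {label!r}")
        writer.writerow([tweet_id, label])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Hypothetical IDs; real ones come from the ID column of Test.csv.
    write_submission(["tweet_10", "tweet_11"], [1, -1], buffer)
    print(buffer.getvalue())
```

To produce a file on disk, pass `open("submission.csv", "w", newline="", encoding="utf-8")` as the handle. Row order is not significant, so the IDs can be written in whatever order the predictions were generated.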