
Social Media Prediction Challenge

Helping Africa
$1 000 USD
Challenge completed almost 7 years ago
Natural Language Processing
Prediction
309 joined
28 active
Start: Sep 04, 18
Close: Nov 25, 18
Reveal: Nov 26, 18
About

The data has been split into a test and training set.

train.json (zipped) is the dataset that you will use to train your model. This dataset includes about 2,400 consecutive tweets from each of the companies listed below, for a total of 96,562 tweets.
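A minimal sketch of reading the zipped training file and checking how many tweets each account contributes. It assumes the archive holds a single JSON file containing a list of tweet objects, and that `user.screen_name` identifies the company account; neither is confirmed by this page. The helper names `load_zipped_json` and `tweets_per_account` are hypothetical, and a tiny in-memory archive stands in for the real download.

```python
import io
import json
import zipfile
from collections import Counter


def load_zipped_json(zip_source):
    """Parse the first .json member of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(zip_source) as archive:
        json_names = [n for n in archive.namelist() if n.endswith(".json")]
        with archive.open(json_names[0]) as handle:
            return json.load(handle)


def tweets_per_account(tweets):
    """Count tweets per account, keyed by user.screen_name (an assumption)."""
    return Counter(t["user"]["screen_name"] for t in tweets)


# Tiny in-memory archive standing in for the real train.json download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("train.json", json.dumps([
        {"id": 1, "user": {"screen_name": "bank_a"}, "retweet_count": 2},
        {"id": 2, "user": {"screen_name": "bank_a"}, "retweet_count": 0},
        {"id": 3, "user": {"screen_name": "telco_b"}, "retweet_count": 7},
    ]))
buffer.seek(0)

tweets = load_zipped_json(buffer)
counts = tweets_per_account(tweets)
```

With the real file, passing the path to the downloaded archive instead of `buffer` should work the same way, provided the layout assumption holds.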

test_questions.json (zipped) is the dataset to which you will apply your model to test how well it performs. Use your model and this dataset to predict the number of retweets a tweet will receive. The test set consists of the consecutive tweets that followed the training tweets for each company, with a maximum of 800 tweets per company. This dataset includes the same fields as train.json except for the retweet_count and favorite_count variables.
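Since the test file lacks retweet_count and favorite_count, one way to keep training inputs comparable is to strip both fields from each training tweet and keep retweet_count as the target. The sketch below assumes each tweet is a flat dictionary at the top level; `split_features_and_target` is a hypothetical helper and the sample tweets are made up for illustration.

```python
# Fields present in train.json but absent from test_questions.json.
TARGET_FIELDS = ("retweet_count", "favorite_count")


def split_features_and_target(tweet):
    """Return (features, target): the tweet without target fields, and its retweet count."""
    features = {k: v for k, v in tweet.items() if k not in TARGET_FIELDS}
    return features, tweet["retweet_count"]


# Made-up examples in the shape of a training tweet and a test tweet.
train_tweet = {
    "id": 101,
    "text": "Our branches are open this Saturday.",
    "user": {"followers_count": 1200},
    "retweet_count": 4,
    "favorite_count": 9,
}
test_tweet = {
    "id": 102,
    "text": "New data bundles available now.",
    "user": {"followers_count": 1200},
}

features, target = split_features_and_target(train_tweet)
# After stripping, the training features expose the same top-level fields as a test tweet.
same_fields = set(features) == set(test_tweet)
```

Dropping favorite_count from the inputs matters because it is unavailable at prediction time, so a model that relied on it could not be applied to the test set.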

sample_submission.csv is a table to provide an example of what your submission file should look like.

Notes on the data: This data was downloaded from Twitter on 23 August 2018, so it represents the retweets and favorites at that point in time.
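Because the counts are a snapshot, older tweets had longer to accumulate retweets than recent ones. The sketch below computes a tweet's age at the snapshot. It assumes the `created_at` string format described in the Tweet Object documentation and assumes midnight UTC on 23 August 2018 as the download time, since only the date is given here. `age_in_days` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assumed snapshot time: only the date (23 August 2018) is stated, not the hour.
SNAPSHOT = datetime(2018, 8, 23, tzinfo=timezone.utc)

# Format of Twitter's created_at field, e.g. "Wed Aug 22 12:00:00 +0000 2018".
# %a and %b depend on the locale; this assumes English day and month names.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def age_in_days(created_at, snapshot=SNAPSHOT):
    """Days between a tweet's creation and the snapshot, clamped at zero."""
    created = datetime.strptime(created_at, CREATED_AT_FORMAT)
    seconds = (snapshot - created).total_seconds()
    return max(seconds, 0.0) / 86400.0


age = age_in_days("Wed Aug 22 12:00:00 +0000 2018")
```

Whether tweet age is a useful input is something to check against the training data rather than assume.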

Variables in train.json and test_questions.json are as described in the Twitter documentation:

Tweet Object - https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object

User Object - https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object

Entities Object - https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/entities-object

Geo Objects - https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/geo-objects

Companies included in this dataset:

Nigeria

  • Zenith Bank
  • First Bank Nigeria
  • Guaranty Trust Bank
  • Access Bank
  • Diamond Bank
  • Ecobank
  • MTN
  • Airtel
  • GloMobile

Ghana

  • Barclays
  • Fidelity
  • Ecobank
  • Access
  • Ghana Commercial Bank
  • MTN
  • Vodafone
  • Airtel Tigo

South Africa

  • Standard Bank
  • ABSA-Barclays
  • FNB
  • Nedbank
  • Capitec
  • Vodacom
  • MTN
  • Cell C
  • Telkom

Kenya

  • Equity
  • Kenya Commercial Bank
  • Co-operative Bank
  • Standard Chartered
  • Safaricom
  • Airtel
  • Telkom

Uganda

  • Stanbic
  • DFCU
  • Standard Chartered
  • MTN
  • Airtel
Files
train.json contains the target. This is the dataset that you will use to train your model.
sample_submission.csv is an example of what your submission file should look like. The order of the rows does not matter, but the "ID" values must be correct.
test_questions.json resembles train.json but without the target-related fields. This is the dataset to which you will apply your model.
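A minimal sketch of writing predictions to a submission file. The column headers used here ("ID" and "retweet_count") are assumptions; the real headers and ID format should be copied from sample_submission.csv. `write_submission` is a hypothetical helper and the predictions are made up.

```python
import csv
import io


def write_submission(predictions, out_file, id_column="ID", target_column="retweet_count"):
    """Write one row per tweet ID; row order is not significant."""
    writer = csv.writer(out_file)
    writer.writerow([id_column, target_column])
    for tweet_id, predicted in predictions.items():
        writer.writerow([tweet_id, predicted])


# Made-up predictions keyed by tweet ID.
predictions = {"1032569": 3, "1032570": 0}

out = io.StringIO()
write_submission(predictions, out)
lines = out.getvalue().splitlines()
```

To write to disk instead, open the target file with `newline=""` and pass the handle as `out_file`, as the csv module documentation recommends.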