COVID-19 Tweet Classification Challenge
Can you identify tweets about coronavirus without using keywords?
Prize: Knowledge
Time: Active
Participants: 64 active · 401 enrolled
Helping: Africa
Classification · Media
About

The objective of this challenge is to develop a machine learning model that assesses whether a Twitter post is about covid-19. The data for this challenge was collected by the Zindi team via the Twitter API from tweets posted over the past year. There are ~7,000 tweets in the train set and ~3,000 in the test set.

Tweets have been classified as covid-19-related (1) or not covid-19-related (0). All tweets have had the following keywords removed:

  • corona
  • coronavirus
  • covid
  • covid19
  • covid-19
  • sarscov2
  • 19

The tweets have also had usernames and web addresses removed to ensure anonymity.
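The cleaning described above can be sketched in a few lines of Python. This is only an illustration: the exact procedure used to prepare the data is not stated, so the regular expressions for usernames and web addresses, the case-insensitive substring matching, and the function name `scrub` are all assumptions.

```python
import re

# Keywords listed in the challenge description. Longer ones come first so that
# "covid19" is matched as a whole before "covid" or "19".
KEYWORDS = ["coronavirus", "covid-19", "covid19", "sarscov2", "corona", "covid", "19"]

# Assumption: keywords are matched case-insensitively, anywhere in the text.
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)
USERNAME_RE = re.compile(r"@\w+")               # assumed pattern for Twitter handles
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # assumed pattern for web addresses


def scrub(tweet: str) -> str:
    """Remove URLs, usernames, and the listed keywords, then tidy whitespace."""
    text = URL_RE.sub(" ", tweet)
    text = USERNAME_RE.sub(" ", text)
    text = KEYWORD_RE.sub(" ", text)
    return " ".join(text.split())


print(scrub("@who says COVID-19 cases rise https://t.co/abc"))
```

Because the keywords are already gone from both Train and Test, a model has to rely on surrounding context (for example, words about lockdowns or symptoms) rather than on the terms themselves.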

Submit your predictions as probabilities between 0 and 1; do not round them to 0 or 1.
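A quick sanity check before submitting can catch predictions that fall outside the allowed range or that have been rounded by accident. The helper below is a minimal sketch; `check_probabilities` is a hypothetical name, and the "looks rounded" rule is a heuristic, not a rule of the competition.

```python
def check_probabilities(preds):
    """Return a list of problems found in a sequence of predicted probabilities."""
    problems = []
    # Every prediction must lie in the closed interval [0, 1].
    if any(not (0.0 <= p <= 1.0) for p in preds):
        problems.append("values outside [0, 1]")
    # Heuristic: if every value is exactly 0 or 1, the output was probably rounded.
    if preds and all(p in (0.0, 1.0) for p in preds):
        problems.append("all values are exactly 0 or 1 (looks rounded)")
    return problems


print(check_probabilities([0.91, 0.07, 0.55]))  # no problems expected
print(check_probabilities([1.0, 0.0, 1.0]))     # flagged as rounded
```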

How to use Colab on Zindi

How to mount a drive on Colab

Files

  • Train.csv: contains the target. This is the dataset you will use to train your model.
  • Sample submission file: shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘target’ column containing your predictions. The order of the rows does not matter, but the IDs must be correct.
  • Test.csv: resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
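
The submission format described above (an ‘ID’ column and a ‘target’ column of probabilities) can be produced with Python's standard `csv` module. This is a sketch under stated assumptions: the IDs and predictions shown are made up for illustration, and `write_submission` is a hypothetical helper, not part of any Zindi tooling.

```python
import csv
import io


def write_submission(ids, probabilities, out):
    """Write one row per test tweet: its ID and the predicted probability."""
    if len(ids) != len(probabilities):
        raise ValueError("ids and probabilities must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "target"])  # column names given in the description
    for tweet_id, prob in zip(ids, probabilities):
        writer.writerow([tweet_id, prob])


# Hypothetical IDs and predictions, for illustration only.
buffer = io.StringIO()
write_submission(["test_1", "test_2"], [0.91, 0.07], buffer)
print(buffer.getvalue())
```

To write a real file, pass an open file handle instead of the in-memory buffer, for example `open("submission.csv", "w", newline="")`, where the file name is again only a placeholder.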