UmojaHack Morocco: AIOX Sentiment Analysis Challenge by UmojaHack Africa

Helping Morocco
37 000 MAD
Challenge completed ~5 years ago
Natural Language Processing
Classification
Sentiment Analysis
88 joined
40 active
Start: Oct 24, 20
Close: Oct 24, 20
Reveal: Oct 24, 20
About

Between 2012 and 2019, the internet penetration rate in Arab countries increased from 30.3% to 51.6%. As internet access among Arab populations expands, their contributions to online content are growing rapidly.

A Python starter notebook is provided to help you make your first submission in this hackathon.

Arabic dialect has long been limited to oral conversation in everyday life, but it is now appearing in written form for the first time thanks to modern content-sharing platforms. Arabic dialect differs significantly from Modern Standard Arabic (MSA), which until now has been the only structured written form of Arabic and which serves as the official language of writing and communication in all Arab countries. As a result, monitoring opinion and sentiment across online platforms remains difficult because of the complexity of dialectal Arabic when written down.

Automated sentiment analysis systems will be helpful in this regard. The goal of this challenge therefore is to alleviate this bottleneck in the context of fine-grained Arabic dialect sentiment analysis.

The data consists of phrases from an Arabic dialectal dataset. Each phrase has a unique ID and an associated label, either 0 (negative) or 1 (positive). Your task is to predict whether a given text in the test set is positive (1) or negative (0).
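To make the binary task concrete, here is a minimal baseline sketch using only the Python standard library. It is not the method from the starter notebook: the choice of character bigrams, the `NaiveBayes` class, and the toy English phrases are all assumptions made for illustration. Character n-grams are used because they need no tokenizer and apply to any script, including dialectal Arabic.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams after collapsing whitespace."""
    text = " ".join(text.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over character n-grams."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def predict(self, text):
        n_docs = sum(self.docs.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        scores = {}
        for c in (0, 1):
            # Smoothed log prior, then one smoothed log likelihood per n-gram.
            score = math.log((self.docs[c] + 1) / (n_docs + 2))
            for gram in char_ngrams(text):
                score += math.log((self.counts[c][gram] + 1) / (self.totals[c] + v))
            scores[c] = score
        return 1 if scores[1] > scores[0] else 0


# Toy placeholder data (made up, not from the competition files).
model = NaiveBayes().fit(["good good", "bad bad"], [1, 0])
toy_predictions = [model.predict("good"), model.predict("bad")]
```

On the real data, the `text` and `label` columns of Train.csv would replace the toy lists, and a stronger model would almost certainly be needed to be competitive.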

  • Train.csv - Contains the ‘ID’, ‘text’ and ‘label’ columns for the training set.
  • Test.csv - Resembles Train.csv but without labels. (Note that the labels have been set to 0 by default; this value is a placeholder and carries no meaning.)
  • SampleSubmission.csv - Shows the submission format for this competition, with the ID column mirroring that of Test.csv and the label column containing your predictions. The order of the rows does not matter, but the IDs must match and the column headings must not be changed.
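The submission rules above can be checked before uploading. The following is a minimal sketch using Python's `csv` module. The helper names `write_submission` and `check_submission` are hypothetical, the header is assumed to be exactly `ID,label` (check SampleSubmission.csv for the real headings), the IDs are invented, and hard 0/1 labels are assumed to be what the platform expects.

```python
import csv
import io


def write_submission(ids, predictions, out):
    """Write one ID,label row per prediction to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["ID", "label"])  # assumed headings; must match the sample file
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, int(pred)])


def check_submission(submission_file, test_ids):
    """Return a list of problems; an empty list means the checks passed."""
    reader = csv.DictReader(submission_file)
    if reader.fieldnames != ["ID", "label"]:
        return [f"unexpected header: {reader.fieldnames}"]
    rows = list(reader)
    problems = []
    # Row order does not matter, so compare the IDs after sorting.
    if sorted(row["ID"] for row in rows) != sorted(test_ids):
        problems.append("IDs do not match the test set")
    if any(row["label"] not in {"0", "1"} for row in rows):
        problems.append("labels must be 0 or 1")
    return problems


# In-memory demo with invented IDs; a real run would open files on disk instead.
buffer = io.StringIO()
write_submission(["id_1", "id_2"], [1, 0], buffer)
buffer.seek(0)
problems = check_submission(buffer, ["id_2", "id_1"])
```

Returning a list of problems rather than raising on the first one lets you see every formatting issue in a single pass.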

The source of this dataset is: Salima Medhaffar, Fethi Bougares, Yannick Estève and Lamia Hadrich-Belguith. Sentiment analysis of Tunisian dialects: Linguistic Resources and Experiments. WANLP 2017. EACL 2017.