Between 2012 and 2019, the internet penetration rate in Arab countries increased from 30.3% to 51.6%. As internet access spreads across Arab populations, their contributions to online content are growing remarkably.
A Python starter notebook is provided to help you make your first submission in this hackathon.
Arabic dialects have long been limited to oral conversation in everyday life, but they are now appearing in written form for the first time thanks to modern content-sharing platforms. Arabic dialects differ significantly from Modern Standard Arabic (MSA), which until now has been the only structured written form of Arabic and serves as the official language of writing and communication in all Arab countries. As a result, monitoring opinion and sentiment across online platforms remains difficult because of the complexity of our dialects when written down.
Automated sentiment analysis systems can help in this regard. The goal of this challenge is therefore to alleviate this bottleneck in the context of fine-grained Arabic dialect sentiment analysis.
The data consists of phrases from an Arabic dialectal dataset. Each phrase has a unique ID and an associated label, either 0 (negative) or 1 (positive). Your task is to predict whether a given text in the test set is positive (1) or negative (0).
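As a rough illustration of the task (binary labels predicted per ID), here is a minimal stdlib-only baseline sketch. It is not the method used in the provided starter notebook. The technique is swapped in: a multinomial Naive Bayes classifier over character n-grams, which is one common choice when spelling is not standardized. The column names "ID" and "label", the IDs, and the toy training phrases are hypothetical placeholders and are not drawn from the dataset. Check the actual file layout and submission format against the competition files.

```python
import csv
import io
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return overlapping character n-grams (lengths n_min..n_max) of a space-padded, lowercased text."""
    padded = f" {text.lower()} "
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over character n-grams with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.gram_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.gram_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, text, c):
        # log P(c) + sum of log P(gram | c), skipping n-grams never seen in training
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for g in char_ngrams(text):
            if g in self.vocab:
                score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
        return score

    def predict(self, texts):
        return [max(self.classes, key=lambda c: self._log_posterior(t, c)) for t in texts]


def write_submission(ids, preds, fh):
    """Write one row per test ID; the header names are hypothetical placeholders."""
    writer = csv.writer(fh)
    writer.writerow(["ID", "label"])
    writer.writerows(zip(ids, preds))


# Toy placeholder data (made up for illustration; 1 = positive, 0 = negative)
train_texts = ["very good", "good and nice", "bad", "very bad and ugly"]
train_labels = [1, 1, 0, 0]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
preds = model.predict(["good", "bad"])
print(preds)  # [1, 0]

buf = io.StringIO()
write_submission(["id_1", "id_2"], preds, buf)
```

In practice you would read the real training and test files, fit on the training phrases, and write predictions for every test ID. Swapping in a stronger model only changes the `fit`/`predict` step.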
The source of this dataset is: Salima Medhaffar, Fethi Bougares, Yannick Estève and Lamia Hadrich-Belguith. Sentiment analysis of Tunisian dialects: Linguistic Resources and Experiments. WANLP 2017, co-located with EACL 2017.