Movie Review Sentiment Classification Challenge 🎥


Knowledge · Completed

Skills you will learn: Sentiment Analysis

23 joined · 3 active


Start: Feb 16, 2023

Close: Jun 06, 2024

Reveal: Jun 06, 2024

About

The core dataset contains 50,000 reviews, split evenly into a 25,000-review training set and a 25,000-review test set. The labels are balanced overall (25,000 positive and 25,000 negative). An additional 50,000 unlabeled reviews are included for unsupervised learning.

In the entire collection, no more than 30 reviews are included for any given movie, because reviews of the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so memorizing movie-specific terms and their association with the observed labels yields no significant performance gain. In the labeled train and test sets, a negative review has a score of 4 or less out of 10, and a positive review has a score of 7 or more out of 10, so reviews with more neutral ratings are excluded from the train and test sets. The unsupervised set includes reviews of any rating, with equal numbers of reviews scored above 5 and at or below 5.
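The labeling rule above can be written as a small function. This is a minimal sketch, assuming integer scores from 1 to 10; the function names `rating_to_label` and `unsup_half` and the "pos"/"neg" strings are illustrative only and are not part of the challenge data.

```python
from typing import Optional


def rating_to_label(score: int) -> Optional[str]:
    """Map an integer 1-10 review score to a label for the labeled sets.

    Scores <= 4 are negative and scores >= 7 are positive; the neutral
    scores 5 and 6 return None because they are excluded from train/test.
    (Assumes integer scores from 1 to 10.)
    """
    if not 1 <= score <= 10:
        raise ValueError(f"expected a score from 1 to 10, got {score}")
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None  # neutral review: not part of the labeled sets


def unsup_half(score: int) -> str:
    """Which half of the unsupervised set a score falls into (> 5 or <= 5)."""
    return "above_5" if score > 5 else "at_most_5"


print([rating_to_label(s) for s in (1, 4, 5, 6, 7, 10)])
# ['neg', 'neg', None, None, 'pos', 'pos']
```

Returning `None` for scores 5 and 6, rather than forcing them into a class, mirrors the fact that the labeled sets simply leave those reviews out.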

Data citation

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

Files


Train.csv contains the target. This is the dataset you will use to train your model.

The sample submission file shows what your submission should look like. The order of the rows does not matter, but the IDs must be named correctly.

Test resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
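The workflow these files imply (fit on Train, predict on Test, write rows in the sample submission format) can be sketched end to end with the Python standard library. This is an illustrative baseline under stated assumptions, not the challenge's prescribed method: it swaps in a multinomial Naive Bayes classifier with add-one smoothing, and `tokenize`, `NaiveBayesSentiment`, `write_submission`, the column names `ID` and `target`, the "pos"/"neg" labels, and the toy reviews and IDs are all hypothetical. Copy the real header and label encoding from the sample submission file.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, TextIO

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)  # documents per class
        self.word_counts = defaultdict(Counter)  # class -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_score(self, tokens: list[str], c: str) -> float:
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)  # log prior
        denom = self.total_words[c] + len(self.vocab)
        for tok in tokens:
            score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]  # drop unseen words
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


def write_submission(out: TextIO, ids: Iterable[str], preds: Iterable[str],
                     id_col: str = "ID", target_col: str = "target") -> None:
    """Write one CSV row per test ID.

    id_col and target_col are hypothetical placeholders: use the exact
    header from the sample submission file.
    """
    ids, preds = list(ids), list(preds)
    if len(ids) != len(preds):
        raise ValueError("need exactly one prediction per ID")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate IDs in submission")
    writer = csv.writer(out)
    writer.writerow([id_col, target_col])
    writer.writerows(zip(ids, preds))


# Toy data standing in for Train.csv and Test (hypothetical).
train_texts = ["a wonderful, moving film", "great acting and a great story",
               "dull and boring", "a terrible, boring mess"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)

test_ids = ["review_1", "review_2"]  # hypothetical IDs
test_texts = ["a great film", "dull and boring film"]
buf = io.StringIO()
write_submission(buf, test_ids, [model.predict(t) for t in test_texts])
print(buf.getvalue())  # header row, then review_1,pos and review_2,neg
```

For a real submission, load texts and IDs using the column names Train.csv and Test actually contain, and open the output file with `open("submission.csv", "w", newline="")` so the `csv` module controls line endings. Naive Bayes is used here only because it fits in a few lines without third-party libraries; since row order does not matter, the writer checks only that every ID is unique and has exactly one prediction.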
