I have tried all the popular machine learning algorithms: XGBoost, CatBoost, LightGBM, SVM, logistic regression, and Naive Bayes. They all score above 0.99 under 5-fold cross-validation on the training set, and above 0.70 on the test set. The incredible thing is that every individual fold on the training set scores above 0.99. Keep in mind I haven't done any hyperparameter tuning on any of those algorithms; the right feature engineering alone was sufficient. But also remember the data is imbalanced, which is why accuracy looks that high.
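To illustrate the imbalance point (this is a minimal sketch with made-up numbers, not the poster's actual data or models): on a heavily skewed label set, a "model" that always predicts the majority class already gets very high accuracy without learning anything.

```python
# Hypothetical label distribution: 99% negatives, 1% positives (assumed for illustration).
labels = [0] * 990 + [1] * 10

# A trivial baseline that always predicts the majority class.
predictions = [0] * len(labels)

# Plain accuracy: fraction of predictions matching the true label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.99 with zero actual skill
```

This is why metrics like F1 or AUC are usually more informative than raw accuracy on imbalanced data.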
What were their corresponding scores on the leaderboard?
It said 0.70 in his post.
Maybe because the test set doesn't follow the training set's class distribution. Not sure.
If you look at the rules of this hackathon, the leaderboard is 20% public and 80% private. So you never know whether those of us at the top are overfitting.
I'm talking about the class distribution in the test set versus the training set, not the public/private leaderboard percentages. That's a different thing.
And even then, if we split a dataset randomly, we can lose the class distribution.
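The usual fix is a stratified split: partition each class separately so train and test keep the same class ratio. A minimal pure-Python sketch (the dataset and helper below are hypothetical, for illustration only):

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical imbalanced dataset: 95 rows of class 0, 5 rows of class 1.
data = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]

def stratified_split(rows, test_frac=0.2):
    """Split each class separately so the test set keeps the class ratio."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    train, test = [], []
    for members in by_class.values():
        random.shuffle(members)          # shuffle within each class
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

train, test = stratified_split(data)
print(Counter(y for _, y in train))  # Counter({0: 76, 1: 4})
print(Counter(y for _, y in test))   # Counter({0: 19, 1: 1})
```

Both splits keep roughly the 95:5 ratio; a plain random 20% sample could easily leave zero or several positives in the test set. In practice `sklearn.model_selection.train_test_split(..., stratify=y)` does the same thing.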
That's what I'm saying: if the distributions differ and you manage to overfit your way to a higher score on the public 20%, you might end up at the bottom of the leaderboard after the competition ends.