
Gender-Based Violence Tweet Classification by #ZindiWeekendz

Helping Global
$300 USD
Completed (over 4 years ago)
Natural Language Processing
Classification
166 joined
110 active
Start
Aug 06, 21
Close
Aug 08, 21
Reveal
Aug 08, 21
Josplay enterprise
A Hint for you guys
Data · 7 Aug 2021, 05:48 · edited 20 minutes later · 6

I have tried all the popular machine learning algorithms, such as XGB, CAT, LGB, SVM, LOGIT and NB. They all score above 0.99 in 5-fold cross-validation on the training set and above 0.70 on the test set. And the incredible thing is that each individual fold in the training set scores 0.99. Remember, I haven't done any hyperparameter tuning on any of those algorithms; just the right feature engineering is sufficient. But also remember that the data is imbalanced, and that's why we are getting that level of accuracy.
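To illustrate why imbalance alone can produce scores like that, here is a minimal Python sketch. The post doesn't say which metric "performance" refers to, so this assumes plain accuracy, and the labels are made up. A "model" that ignores its input and always predicts the majority class reaches 0.99 accuracy on a 99:1 dataset, while balanced accuracy (the mean of per-class recall, so every class counts equally) falls to 0.5.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: each class weighs the same regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced labels: 990 of class "A", 10 of class "B".
y_true = ["A"] * 990 + ["B"] * 10

# A "model" that always predicts the most frequent class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

Comparing any real model against this kind of majority-class baseline is a quick check on whether a 0.99 cross-validation score reflects learning or just the class ratio.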

Discussion 6 answers
Federal university of technology minna

What were their corresponding scores on the leaderboard?

7 Aug 2021, 06:22
University of lagos

It said 0.70 in his post.

Kamenialexnea
Ecole nationale superieure polytechnique yaounde

Maybe because the test set doesn't follow the class distribution of the training set. Not sure.

7 Aug 2021, 09:30
Josplay enterprise

If you look at the rules of this hackathon, the public leaderboard uses 20% of the test data and the private leaderboard 80%. Therefore, you never know whether those of us at the top are overfitting the model.

Kamenialexnea
Ecole nationale superieure polytechnique yaounde

I'm talking about the class distributions in the test set and the training set, not the public/private leaderboard percentages. Those are different things.

And even in that case, if we split a dataset randomly, we can lose the class distribution.
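A minimal Python sketch of that point: a plain random split versus a stratified split that shuffles and cuts each class separately, so every class keeps (up to rounding) the same share in both parts. The labels, the 20% test fraction and the function names are hypothetical; this is not the competition's actual split procedure.

```python
import random
from collections import Counter

def random_split(labels, test_frac, seed=0):
    """Shuffle all indices together; class shares in each part can drift."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * test_frac)
    return idx[cut:], idx[:cut]  # (train, test)

def stratified_split(labels, test_frac, seed=0):
    """Shuffle and cut each class separately so class shares are preserved."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test.extend(idx[:cut])
        train.extend(idx[cut:])
    return train, test

def class_share(labels, idx):
    """Fraction of each class among the given indices."""
    counts = Counter(labels[i] for i in idx)
    return {c: counts[c] / len(idx) for c in sorted(set(labels))}

# Hypothetical imbalanced labels: 90% A, 8% B, 2% C.
labels = ["A"] * 900 + ["B"] * 80 + ["C"] * 20

for name, split in [("random", random_split), ("stratified", stratified_split)]:
    train, test = split(labels, test_frac=0.2, seed=42)
    print(name, "train:", class_share(labels, train), "test:", class_share(labels, test))
```

With a 2% class, a purely random 200-row test part can end up with noticeably more or fewer than 4 examples of "C", while the stratified version always puts exactly 4 there. For cross-validation, scikit-learn's `StratifiedKFold` (and `train_test_split` with its `stratify` argument) applies the same idea.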

Josplay enterprise

That is what I am saying: if the distribution is different and you manage to overfit your way to a higher score on the 20%, you might end up at the bottom of the leaderboard after the competition ends.
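One way to quantify how noisy the 20% public score is: accuracy measured on n rows behaves roughly like a binomial proportion, with standard error sqrt(p(1 - p) / n). The Python sketch below assumes a hypothetical test set of 1000 rows (the thread doesn't give the real size) and an accuracy of 0.70, the figure mentioned above.

```python
import math

def accuracy_std_error(acc, n):
    """Approximate standard error of an accuracy measured on n independent rows."""
    return math.sqrt(acc * (1 - acc) / n)

# Hypothetical test set of 1000 rows, split 20% public / 80% private.
n_test = 1000
n_public = int(n_test * 0.2)   # 200 rows
n_private = n_test - n_public  # 800 rows

acc = 0.70
se_public = accuracy_std_error(acc, n_public)
se_private = accuracy_std_error(acc, n_private)

# Roughly 95% intervals (+/- 1.96 standard errors).
print(f"public:  {acc:.2f} +/- {1.96 * se_public:.3f}")
print(f"private: {acc:.2f} +/- {1.96 * se_private:.3f}")
```

Under these assumptions the public score is only good to about ±0.06 and the private one to about ±0.03, so gaps between teams smaller than that on the public leaderboard may not survive the reveal. Picking submissions by their public score also tends to favor the ones that got lucky on those particular rows.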