AI4D Malawi News Classification Challenge

Helping Malawi
$2 000 USD
Completed (almost 5 years ago)
Classification
830 joined
322 active
Start: Jan 22, 21
Close: May 09, 21
Reveal: May 09, 21
Learnings and sharing
Help · 10 May 2021, 12:29 · 5

What I learned from this competition:

1. My first model was a basic baseline where I used POLITICS as the target for every row of the test dataset. That gave me an accuracy of around 19.01, and so far this was my best one.

2. I tried to build simple machine learning models using CountVectorizer and TF-IDF features, and trained a bunch of models: Decision Tree, Random Forest, and XGBoost. These models scored around 12 to 13 on the test dataset, but on my validation dataset they scored between 30 and 45. I am not clear why the results differ between the validation and test datasets, and I hope someone can help me with this.

3. Naive Bayes gave my best score, around 19.34. I also tried to train an LSTM but failed to do so. I used data cleaning such as removing multiple spaces, punctuation, and stop words, but nothing improved my accuracy, not by a single percentage point. I want to know how you all approached this. I know only the top 3 get the prize, but at least the rest of us can learn, because sharing is learning.
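For anyone who wants to reproduce the comparison described above, here is a minimal sketch, assuming scikit-learn is available. It scores an "always predict the most frequent class" baseline and a TF-IDF plus Naive Bayes pipeline under the same stratified cross-validation. The toy English sentences and the SPORTS and FARMING labels are hypothetical stand-ins (the real data is Chichewa news text), and the sketch does not by itself explain the validation versus test gap. It only shows one way to keep the vectorizer inside the pipeline so that each fold builds its vocabulary from its own training part.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; replace with the competition's text and label columns.
texts = [
    "parliament passes new election law",
    "president meets opposition party leaders",
    "minister announces election date",
    "party leaders debate in parliament",
    "president signs law after vote",
    "opposition party wins election vote",
    "team wins football match in final",
    "striker scores goal in football league",
    "coach names squad for league match",
    "football team loses cup final",
    "farmers harvest maize after rains",
    "maize prices rise for farmers",
    "fertilizer subsidy helps maize farmers",
    "rains delay tobacco harvest",
]
labels = ["POLITICS"] * 6 + ["SPORTS"] * 4 + ["FARMING"] * 4

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

# Baseline: always predict the most frequent training class (POLITICS here).
baseline = DummyClassifier(strategy="most_frequent")

# The vectorizer sits inside the pipeline, so it is refit on each training fold.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

baseline_acc = cross_val_score(baseline, texts, labels, cv=cv, scoring="accuracy").mean()
model_acc = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy").mean()

print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
print(f"TF-IDF + Naive Bayes accuracy:    {model_acc:.3f}")
```

If a model cannot beat the baseline under this kind of validation, that is a hint to look at the features or the labels before trying a bigger model.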

Thanks for any help

Anish Jain

@itsanishjain - Twitter

Discussion 5 answers

I don't know if it is worth sharing a model ranked 137 on the LB, but here is my approach:

1. Train fastText embeddings on the concatenation of the train and test data

2. Train and validate a CatBoost classifier on the fastText embeddings using 5 folds, predicting on the test data in each fold

3. Take the average of the prediction probabilities from the 5 folds and submit
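The fold-averaging part of these steps can be sketched roughly as below. This is not the actual code behind the submission: the random 16-dimensional vectors are a hypothetical stand-in for the fastText document embeddings, and scikit-learn's LogisticRegression is swapped in for the CatBoost classifier so the example runs with only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

# Hypothetical stand-ins for document embeddings: 3 classes, 16 dimensions.
n_classes, dim = 3, 16
y_train = np.repeat(np.arange(n_classes), 20)
centers = rng.normal(size=(n_classes, dim))
X_train = centers[y_train] + rng.normal(scale=0.5, size=(len(y_train), dim))
test_classes = rng.integers(0, n_classes, size=10)
X_test = centers[test_classes] + rng.normal(scale=0.5, size=(10, dim))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
test_proba = np.zeros((len(X_test), n_classes))
val_scores = []

for train_idx, val_idx in skf.split(X_train, y_train):
    # Stand-in model; the post above used a CatBoost classifier here.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[train_idx], y_train[train_idx])

    # Validate on the held-out fold.
    val_scores.append(clf.score(X_train[val_idx], y_train[val_idx]))

    # Accumulate this fold's share of the averaged test probabilities.
    test_proba += clf.predict_proba(X_test) / skf.get_n_splits()

# Final prediction: the class with the highest averaged probability.
test_pred = test_proba.argmax(axis=1)
print("mean validation accuracy:", np.mean(val_scores))
```

Averaging probabilities instead of hard labels keeps each fold's confidence in the final vote, which is the design choice the steps above describe.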

10 May 2021, 12:35
Upvotes 0

You know what, I got around 19 and I am still sharing, so yours is definitely worth sharing. Thanks. I also tried to train fastText but failed, so if it is possible for you to share your code, we can all learn from it.

https://www.kaggle.com/aninda/word2vec-malawi?scriptVersionId=59483860

My score was 0.61 using RandomForestClassifier. The classes were imbalanced. You can use the SMOTE method to rebalance the classes in the pipeline. https://github.com/Linafe313/Mini-projects/blob/main/Zindi_Chichewa_News_Classification_Challenge.ipynb
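To illustrate what SMOTE does, here is a minimal NumPy sketch of the core idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The function name `smote_oversample` and the toy 2-D data are hypothetical and written only for this illustration. This is not the code from the linked notebook, and in practice a maintained implementation such as the one in imbalanced-learn would normally be used.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    if len(X_min) <= k:
        raise ValueError("need more minority samples than neighbours (k)")
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base point and one of its k nearest neighbours.
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point somewhere on the segment between the two.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Hypothetical imbalanced toy data: 20 majority rows, 5 minority rows.
rng = np.random.default_rng(1)
X_major = rng.normal(loc=0.0, size=(20, 2))
X_minor = rng.normal(loc=3.0, size=(5, 2))

X_synth = smote_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_minor_balanced = np.vstack([X_minor, X_synth])
print(X_major.shape, X_minor_balanced.shape)
```

Oversampling should be applied to the training folds only, so that validation and test scores are still measured on the original class distribution.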

12 May 2021, 14:26
Upvotes 0