What I learned from this competition:
1. My first model was a trivial baseline that predicted POLITICS for every row of the test dataset. That gave me an accuracy of around 19.01, and for a while that was my best score.
2. I tried building simple machine learning models using CountVectorizer and TF-IDF features, and trained a bunch of models: Decision Tree, Random Forest, XGBoost. These models score around 12-13 on the leaderboard, but on my validation set they score around 30-45, so I'm not sure why the validation and test results differ so much. I hope someone can help me with this.
3. Naive Bayes gave me my best score, around 19.34. I also tried to train an LSTM but failed to get it working. I applied data cleaning such as removing multiple spaces, punctuation, and stop words, but nothing improved my accuracy by even a single percentage point. I'd like to know how you all approached this. I know only the top 3 get the prize, but at least we can all learn, because sharing is learning.
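The cleaning + TF-IDF + Naive Bayes combination described above can be sketched roughly like this. This is a minimal illustration on a tiny made-up corpus (not the competition data), and the cleaning function only covers the steps mentioned here (lowercasing, punctuation, multiple spaces); stop-word removal would be an extra step:

```python
import re

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def clean_text(text):
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # strip punctuation
    text = re.sub(r"\s+", " ", text)      # collapse multiple spaces
    return text.strip()

# tiny illustrative corpus, NOT the actual competition data
texts = [
    "The president spoke to parliament today!",
    "The striker scored two goals in the match.",
    "Parliament passed the new election bill.",
    "The team won the football league final.",
]
labels = ["POLITICS", "SPORTS", "POLITICS", "SPORTS"]

# TF-IDF features feeding a multinomial Naive Bayes classifier
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
model.fit([clean_text(t) for t in texts], labels)

preds = model.predict([clean_text("Parliament debated the bill.")])
```

On real data you would fit on the training set and call `predict` on the test set, then check whether the validation split is representative of the test distribution.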
Thanks for any help
Anish Jain
@itsanishjain - Twitter
I don't know if it's worth sharing a model from rank 137 on the LB, but here is my approach:
1. Train fastText embeddings on the concatenation of the train & test data
2. Train and validate a CatBoost classifier on the fastText embeddings using 5 folds, predicting on the test data with each fold's model
3. Average the prediction probabilities of the 5 folds and submit
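The 5-fold averaging in steps 2-3 can be sketched as below. To keep the example self-contained I use synthetic numeric features from `make_classification` in place of the fastText embeddings, and scikit-learn's `LogisticRegression` as a stand-in for the CatBoost classifier; the fold-and-average pattern is the same either way:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# synthetic stand-in for fastText document embeddings (3 classes)
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
test_probs = np.zeros((len(X_test), 3))

for tr_idx, val_idx in skf.split(X_train, y_train):
    # stand-in model; the original approach used CatBoostClassifier here
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[tr_idx], y_train[tr_idx])
    # accumulate the average of the 5 folds' test probabilities
    test_probs += model.predict_proba(X_test) / skf.get_n_splits()

preds = test_probs.argmax(axis=1)  # final class per test row
```

Averaging probabilities across folds usually gives a more stable submission than any single fold's model.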
It's definitely worth sharing. I got around 19 and I'm still sharing, so yours is worth sharing too. Thanks! I also tried to train fastText but failed, so if you could share your code, we could all learn from it.
https://www.kaggle.com/aninda/word2vec-malawi?scriptVersionId=59483860
Thanks
My score was 0.61 using RandomForestClassifier. The classes were imbalanced, so you can use the SMOTE method to rebalance them in the pipeline: https://github.com/Linafe313/Mini-projects/blob/main/Zindi_Chichewa_News_Classification_Challenge.ipynb