AI4D Malawi News Classification Challenge

Helping Malawi

$2 000 USD

Completed (~5 years ago)

833 joined

322 active

Start

Jan 22, 21

May 09, 21

Reveal

May 09, 21

Costia

Simple model 0.654838709677419 private score Github

13 May 2021, 11:24

Congratulations to the winners and all participants! It was an interesting competition with highly imbalanced classes. What did not help me: statistical features and model stacking. My stacked models scored better on the public LB and worse on the private LB, while the solo model showed the opposite pattern. So this is a simple logistic regression model fitted on TF-IDF character-level and word-level features. Because the number of texts is very small, even the rarest n-grams were kept as features. The TF-IDF features were combined with fastText word vectors of length 60, averaged over each text. Cross-validation showed a rather good score, but it was unstable, so the public LB results were poor.
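For readers who want to see the shape of such a pipeline, here is a minimal sketch, not the code from the repository. It assumes scikit-learn, SciPy and NumPy. The toy texts, the labels, the n-gram ranges, `class_weight="balanced"` and the helper names `average_word_vectors` and `build_features` are all illustrative assumptions, and a random lookup table stands in for trained fastText vectors so that the example runs on its own.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def average_word_vectors(texts, word_vectors, dim):
    """Average the vectors of the known words in each text (zeros if none are known)."""
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
        if vecs:
            out[i] = np.mean(vecs, axis=0)
    return out


def build_features(texts, word_tfidf, char_tfidf, word_vectors, dim, fit=False):
    """Stack word-level TF-IDF, character-level TF-IDF and averaged word vectors."""
    if fit:
        word_part = word_tfidf.fit_transform(texts)
        char_part = char_tfidf.fit_transform(texts)
    else:
        word_part = word_tfidf.transform(texts)
        char_part = char_tfidf.transform(texts)
    dense_part = csr_matrix(average_word_vectors(texts, word_vectors, dim))
    return hstack([word_part, char_part, dense_part]).tocsr()


# Toy data invented for illustration only (not from the competition).
train_texts = [
    "parliament passed the new budget bill",
    "the minister announced an election date",
    "the hospital opened a new clinic",
    "doctors warn about a malaria outbreak",
    "the team won the football final",
    "the striker scored two goals",
]
train_labels = ["POLITICS", "POLITICS", "HEALTH", "HEALTH", "SPORTS", "SPORTS"]

DIM = 60  # vector length mentioned in the post
rng = np.random.default_rng(0)
vocab = {w for t in train_texts for w in t.lower().split()}
# Placeholder for fastText: random vectors, one per training word.
word_vectors = {w: rng.normal(size=DIM) for w in vocab}

# min_df=1 keeps even n-grams that occur in a single text.
word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), min_df=1)
char_tfidf = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=1)

X_train = build_features(train_texts, word_tfidf, char_tfidf, word_vectors, DIM, fit=True)

# class_weight="balanced" is an assumption here, one common way to handle imbalanced classes.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, train_labels)

X_new = build_features(["the clinic treated malaria patients"],
                       word_tfidf, char_tfidf, word_vectors, DIM)
predictions = model.predict(X_new)
```

With real data, the random lookup table would be replaced by vectors from a fastText model, and the vectorizer settings would need tuning against cross-validation.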

Github link:

https://github.com/CostiaB/AI4D-Malawi-News-Classification-Challenge

Discussion 1 answer

flamethrower

Yes, I observed the same thing. My private LB score is 0.664 with just TF-IDF and a simple model, with a very good CV score of 0.649 and a good public score. The surprising part is the ensemble: 0.6384 on the private LB with a CV score of 0.6558 and the same public LB score as the single model.

I guess that with a small dataset, some unpredictable results are to be expected.

Nice work all the same.

13 May 2021, 11:42
