
AI4D Malawi News Classification Challenge

Helping Malawi
$2 000 USD
Completed (almost 5 years ago)
Classification
830 joined
322 active
Start: Jan 22, 21
Close: May 09, 21
Reveal: May 09, 21
Simple model 0.654838709677419 private score Github
Connect · 13 May 2021, 11:24 · 1

Congratulations to the winners and participants! It was an interesting competition with highly imbalanced classes. What did not help me: statistical features and model stacking. My stacked models scored better on the public LB and worse on the private LB, while the solo model showed the opposite. So this is a simple logistic regression model fitted on character-level and word-level TF-IDF features. Because the number of texts is very small, even the rarest n-grams were kept. The TF-IDF features were combined with fastText word vectors of length 60, averaged over each text. Cross-validation showed a rather good score, but it was unstable, so its public LB results were poor.

Github link:

https://github.com/CostiaB/AI4D-Malawi-News-Classification-Challenge
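
For readers who want a feel for the approach described in the post, here is a minimal sketch of that kind of pipeline. It is not the code from the repository. The toy corpus, the n-gram ranges, the `char_wb` analyzer and the helper names (`toy_word_vector`, `average_vectors`, `build_features`) are all assumptions made for illustration, and `toy_word_vector` is only a hypothetical stand-in for a real fastText lookup. Setting `min_df=1` is one way to keep even the rarest n-grams.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus standing in for the competition texts (hypothetical data).
texts = [
    "government announces new budget for schools",
    "parliament debates the national budget",
    "football team wins the regional cup",
    "striker scores twice in cup final",
    "farmers expect a good maize harvest",
    "maize prices fall after the harvest",
]
labels = ["politics", "politics", "sports", "sports", "farming", "farming"]

EMB_DIM = 60  # same length as the averaged vectors mentioned in the post


def toy_word_vector(word):
    """Hypothetical stand-in for a fastText lookup: a fixed pseudo-random vector per word."""
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(word))
    return np.random.default_rng(seed).normal(size=EMB_DIM)


def average_vectors(docs):
    """Average the word vectors of each text into one dense row (zeros for empty texts)."""
    rows = []
    for doc in docs:
        words = doc.split()
        if words:
            rows.append(np.mean([toy_word_vector(w) for w in words], axis=0))
        else:
            rows.append(np.zeros(EMB_DIM))
    return np.vstack(rows)


# min_df=1 keeps n-grams that occur in a single text (assumed setting).
word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), min_df=1)
char_tfidf = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=1)


def build_features(docs, fit=False):
    """Stack word TF-IDF, character TF-IDF and averaged word vectors side by side."""
    if fit:
        word_part = word_tfidf.fit_transform(docs)
        char_part = char_tfidf.fit_transform(docs)
    else:
        word_part = word_tfidf.transform(docs)
        char_part = char_tfidf.transform(docs)
    dense_part = csr_matrix(average_vectors(docs))
    return hstack([word_part, char_part, dense_part]).tocsr()


X_train = build_features(texts, fit=True)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, labels)

predictions = clf.predict(build_features(["the team scores in the final"]))
print(predictions)
```

With real data, the toy lookup would be replaced by vectors from a trained fastText model, and the vectorizer settings would need tuning against cross-validation.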

Discussion 1 answer
flamethrower

Yes, I made the same observation. My private LB score is 0.664 with just TF-IDF and a simple model, with a very good CV score of 0.649 and a good public score. The surprising part is the ensemble: 0.6384 on the private LB with a CV score of 0.6558 and the same public LB score as the single model.

I guess that with a small dataset, some unpredictable results are to be expected.

Nice work all the same.

13 May 2021, 11:42
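
One way to see how much a CV score can move on a small, imbalanced dataset is to repeat stratified k-fold with different shuffles and look at the spread. The sketch below uses synthetic data from `make_classification`, not the competition data, and assumes accuracy as the metric. The sample size, class weights and fold counts are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

N_SPLITS = 5
N_REPEATS = 10

# Small, imbalanced synthetic dataset (a stand-in, not the competition data).
X, y = make_classification(
    n_samples=200,
    n_features=20,
    n_informative=5,
    n_classes=3,
    weights=[0.7, 0.2, 0.1],
    random_state=0,
)

# Repeat stratified k-fold with a different shuffle on every repeat.
cv = RepeatedStratifiedKFold(n_splits=N_SPLITS, n_repeats=N_REPEATS, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

# Splits come out repeat by repeat, so each row below is one full k-fold run.
per_repeat = scores.reshape(N_REPEATS, N_SPLITS).mean(axis=1)

print(f"mean accuracy over all folds: {scores.mean():.4f}")
print(f"std over individual folds:    {scores.std():.4f}")
print(f"range of per-repeat means:    {per_repeat.min():.4f} to {per_repeat.max():.4f}")
```

If the per-repeat means differ by more than the gap between two candidate models, the CV score alone is probably not enough to choose between them.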