📚 This Week on Zindi: Public Score

Lacuna Masakhane Parts of Speech Classification Challenge

Helping Africa

$7 000 USD

Completed (almost 3 years ago)

Skills you will learn

Classification

Natural Language Processing

470 joined

100 active


Start

Jun 08, 23

Sep 17, 23

Reveal

Sep 17, 23

jpandeinge

University of Manchester

Public Score

Help · 22 Aug 2023, 12:28 · 10

My public score isn't improving even though my local score is. I checked that I'm not overfitting my model, so I find this a bit odd: I can't figure out why the public score won't improve. My local accuracy is around 0.68, but my public score can't get beyond 0.43. Any idea why this might be the case?

Discussion 10 answers

HungryLearner

Many factors affect the correlation between local and public scores.

For this particular challenge, you are not given a training dataset for the languages you will be tested on, so we can't be sure how the local score relates to the public one.

The local score depends on what you validate your model with. That may be the dev split of the language(s) you trained on, or a separate set of languages held out for validation while you train only on the remaining languages.

Both approaches have merits and drawbacks for an out-of-domain challenge like this, but neither is guaranteed to correlate with the public score, because the validation languages are not the test languages.
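The second setup (holding out whole languages) can be sketched as leave-one-language-out validation. This is a minimal sketch, not anyone's actual pipeline from this thread: the sentence format (dicts with hypothetical `lang`, `tokens` and `tags` keys), the toy language names and the majority-tag placeholder model are all assumptions, and a real tagger would replace `majority_tag_baseline`.

```python
from collections import defaultdict

def leave_one_language_out(sentences):
    """Yield (held_out_lang, train, valid): each language is validated
    on while the model only sees the other languages."""
    by_lang = defaultdict(list)
    for s in sentences:
        by_lang[s["lang"]].append(s)
    if len(by_lang) < 2:
        raise ValueError("need at least two languages to hold one out")
    for held_out in sorted(by_lang):
        train = [s for lang, group in by_lang.items() if lang != held_out for s in group]
        yield held_out, train, by_lang[held_out]

def token_accuracy(gold_seqs, pred_seqs):
    """Token-level accuracy over lists of tag sequences."""
    correct = total = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        for g, p in zip(gold, pred):
            correct += g == p
            total += 1
    return correct / total if total else 0.0

def majority_tag_baseline(train):
    """Hypothetical placeholder model: always predict the most frequent training tag."""
    counts = defaultdict(int)
    for s in train:
        for tag in s["tags"]:
            counts[tag] += 1
    best = max(counts, key=counts.get)
    return lambda tokens: [best] * len(tokens)

def cross_lingual_cv(sentences, fit=majority_tag_baseline):
    """Score on each held-out language; returns {language: accuracy}."""
    scores = {}
    for lang, train, valid in leave_one_language_out(sentences):
        predict = fit(train)
        preds = [predict(s["tokens"]) for s in valid]
        scores[lang] = token_accuracy([s["tags"] for s in valid], preds)
    return scores

# Toy data with hypothetical languages "lang_a" and "lang_b".
toy = [
    {"lang": "lang_a", "tokens": ["x", "y"], "tags": ["NOUN", "VERB"]},
    {"lang": "lang_a", "tokens": ["z"], "tags": ["NOUN"]},
    {"lang": "lang_b", "tokens": ["w", "v"], "tags": ["NOUN", "NOUN"]},
]
print(cross_lingual_cv(toy))
```

Because no test language appears in training here either, the per-language scores are closer in spirit to the challenge setup than a same-language dev split, though they still come from different languages than the test set.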

It comes down to rethinking your experimental setup and deciding carefully how to perform your local validation. With luck, you'll find a setup that correlates with the expected final (private) score.

Also note that a good public score does not necessarily mean a good model, since it is computed on only a portion of the test set. Shake-ups often happen when the private scores are revealed after the challenge deadline, leading to a lot of reshuffling of participants on the leaderboard.
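To see why the public score can drift from the final one, here is a small simulation that randomly splits per-item correctness into a public part and a private part and scores each. The 30% public fraction, the item-level random split and the 60%-accurate toy model are assumptions for illustration; the challenge's actual split is not stated in this thread.

```python
import random

def mean(xs):
    """Average of a list; NaN for an empty list."""
    return sum(xs) / len(xs) if xs else float("nan")

def public_private_scores(correct, public_fraction=0.3, seed=0):
    """Randomly split per-item correctness (1 = right, 0 = wrong) into a
    hypothetical public subset and a private remainder, and score each."""
    indices = list(range(len(correct)))
    random.Random(seed).shuffle(indices)
    n_public = int(len(indices) * public_fraction)
    public = [correct[i] for i in indices[:n_public]]
    private = [correct[i] for i in indices[n_public:]]
    return mean(public), mean(private)

# A toy model that is right on exactly 60 of 100 hypothetical test items.
correct = [1] * 60 + [0] * 40
for seed in range(5):
    pub, priv = public_private_scores(correct, seed=seed)
    print(f"split {seed}: public={pub:.3f}  private={priv:.3f}")
```

Since the two subsets partition the test set, the overall accuracy is always their size-weighted average, yet either score on its own can land above or below it depending on which items happen to fall into the public part.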

Happy coding!!!

22 Aug 2023, 13:08


jpandeinge

University of Manchester

That's what I was thinking, but I was worried because the final evaluation might be based on the score reflected on the public leaderboard, and some of my models with higher local accuracy still didn't beat my public score.

Thanks for the clarity, cheers!

replied to HungryLearner · 22 Aug 2023, 13:27


JEANMPIA

Hello, you need to set up proper CV.

Think about what you will be evaluated on, and check whether your validation score reflects that. You need to trust your CV, but only if it is set up properly, and a gap this large is nowhere near a good sign. I don't know whether you built your own baseline or used one of the ones shared so far, but if the latter, I strongly advise you to rethink it, as the techniques shared so far will not get you the shake-up you're hoping for.
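One way to check whether a validation score "reflects" the evaluation is to compare local CV scores against public scores across several submissions and measure their rank correlation. This is a hypothetical sketch in plain Python (Spearman correlation computed as the Pearson correlation of average ranks, with ties sharing a rank); the score lists below are placeholders, not results from this challenge.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return math.nan  # undefined when one side is constant
    return cov / math.sqrt(vx * vy)

# Placeholder scores for three hypothetical submissions (not real results).
local_cv = [0.62, 0.65, 0.68]
public_lb = [0.43, 0.41, 0.42]
print(spearman(local_cv, public_lb))  # -0.5
```

A correlation near 1 across submissions suggests the CV ranks models the way the leaderboard does; a low or negative value, as in this placeholder case, is a hint that the validation setup is not measuring what the test set measures.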

22 Aug 2023, 14:28


Reacher

Comment deleted since it makes no sense, sorry!

28 Aug 2023, 11:22


JEANMPIA

There are duplicates, but 80%?

I don't know if that was an exaggeration to make a more striking point or if that's what you actually recorded from the competition data, but that's not what my teammate and I found. The techniques discussed in the paper are geared more toward having the same language in train and validation, as you said, but I would like to see how you could get 92% on this competition with their approach 🤔.

I also wonder how you realised that the labels are noisy. I don't personally speak any of the labelled languages, so I wasn't able to assess that, but maybe you do. What I did find from the labelled French data, though, is that the sentences don't make much sense; I don't know if that's the case in other languages...

PS: Congrats on your top-20 finish in the Contrails competition :)

replied to Reacher · 28 Aug 2023, 13:52


Reacher

Nonsense again!

replied to JEANMPIA · 28 Aug 2023, 14:06


JEANMPIA

Hey, yes, you are indeed missing something, but that's all I can say without my teammate getting angry at me for sharing too much :)

As I said before, you shouldn't expect a shake-up to 0.70-ish with your current LB score.

Obviously you don't have to take my word for it, but I'm sure my LB neighbours would tend to agree with what I'm telling you.

replied to Reacher · 28 Aug 2023, 14:42