
Turtle Recall: Conservation Challenge

Helping Kenya · $10 000 USD · Completed
Classification · Computer Vision
753 joined · 247 active
Start: 19 Nov 2021 · Close: 21 Apr 2022 · Reveal: 21 Apr 2022
Cross validation scores vs leaderboard
Help · 28 Feb 2022, 18:30 · 4

Most of you have expressed concerns about the cross-validation scores being way better than the leaderboard scores; thank you for raising the concerns.

1. Most of the people I have talked to are using stratified folds, k-fold, etc., to validate their scores locally, and there is a huge difference in the scores.

2. How many people are validating with a plain train/test split (i.e., completely setting aside 30% of the train data, training the model on the rest, and scoring on the held-out part)? Are you still getting the same huge difference between your score and that on the leaderboard?

I have set aside 490 rows from the train data as my validation set, then trained my model (using the starter notebook) on the remaining 1656 rows - the score I get on the 490 held-out rows is 0.054965986394557825, which is not very different from the leaderboard. This is just my observation. My request: could you faithfully do the same and share the scores you get (no folds, please) :)
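For anyone who wants to reproduce this check, here is a minimal sketch of the hold-out validation described above. It assumes a MAP@5-style metric, where each query image gets a ranked list of up to five candidate turtle IDs; the column names and the random dummy predictor are illustrative stand-ins for the real data and model.

```python
import numpy as np
import pandas as pd

def map_at_5(y_true, y_pred_lists):
    """Mean average precision @ 5: each prediction is a ranked
    list of up to 5 candidate turtle IDs per query image."""
    scores = []
    for truth, preds in zip(y_true, y_pred_lists):
        score = 0.0
        for rank, p in enumerate(preds[:5]):
            if p == truth:
                score = 1.0 / (rank + 1)  # credit decays with rank
                break
        scores.append(score)
    return float(np.mean(scores))

# Illustrative hold-out split: keep 490 rows aside, train on the rest.
rng = np.random.default_rng(0)
turtle_ids = [f"t_{j}" for j in range(100)]
train = pd.DataFrame({
    "image_id": [f"img_{i}" for i in range(2146)],
    "turtle_id": rng.choice(turtle_ids, size=2146),
})
holdout = train.sample(n=490, random_state=42)
fit_set = train.drop(holdout.index)  # 1656 rows, as in the post

# ... train your model on fit_set here, then predict on holdout ...
# Dummy predictor for the sketch: 5 random candidate IDs per row.
preds = [list(rng.choice(turtle_ids, size=5, replace=False))
         for _ in range(len(holdout))]

print(round(map_at_5(holdout["turtle_id"].tolist(), preds), 4))
```

With a random predictor over 100 IDs the score stays near chance level; the point is only that the held-out score and the leaderboard score are computed the same way and should roughly agree.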

Let's work together to get the best solutions to conserve the turtles.

Discussion 4 answers

"Are you still getting the same huge difference between your score and that on the leaderboard?"

The answer is yes. Please read the discussion below carefully, where people checked this by labelling part of the test set themselves - the metric should be much higher. That is the first issue. The second is very strange behaviour of the Zindi backend: I changed a huge part of my submission and my LB score didn't change at all! That can only mean the public score is estimated on very few samples, probably with a different number in the denominator. And finally, if you just look at the predictions on the test set, it turns out they are very good (the turtles are the same) - the model gets the right answer in the majority of samples.

Setting aside ~430 rows, training on the rest, and scoring on the rows held back during training gives 0.54 locally - drastically different from the leaderboard. Notebook: https://colab.research.google.com/drive/1Ijn9CpBaekJIZ6rYQ_Mw0jbq-PyrbXbq?usp=sharing

Manually reviewing some random predictions from a submission showed that close to half of them seemed right: https://colab.research.google.com/drive/1HJsQbj7pCvokgP9yvbGSKUnGZxiUwD9k?usp=sharing
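A quick way to run that kind of spot-check yourself. The column names and the inline DataFrame are hypothetical stand-ins for a loaded submission file; adapt them to the actual submission format.

```python
import pandas as pd

# Stand-in for a loaded submission (hypothetical columns).
sub = pd.DataFrame({
    "image_id": [f"img_{i}" for i in range(100)],
    "turtle_id": [f"t_{i % 7}" for i in range(100)],
})

# Draw a fixed random sample so the review is reproducible.
sample = sub.sample(n=20, random_state=0)
for _, row in sample.iterrows():
    # For each sampled row, open the query image and the reference
    # photos of the predicted turtle side by side and judge the match.
    print(row["image_id"], "->", row["turtle_id"])
```

Twenty eyeballed pairs is already enough to tell ~5% accuracy apart from ~50%.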

I suggest picking a submission from someone high on the leaderboard and reviewing the predictions yourself using the notebook above - it should quickly become obvious whether they're right ~5% of the time or ~50% of the time; if it's the latter, there is an issue with the scoring ;) Are you sure there isn't an errant .head(30) or something in the scoring code that's only scoring the first N rows rather than the whole submission?
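To illustrate the suspicion: this is not Zindi's actual scoring code (nobody outside Zindi has seen it), just a toy sketch of the mechanism, with plain accuracy standing in for the real metric. If the scorer truncated the submission before computing the metric, the score would be deflated silently.

```python
import pandas as pd

# Toy ground truth and a submission that is mostly correct,
# except the first 30 rows are deliberately wrong.
truth = pd.DataFrame({
    "image_id": range(1000),
    "turtle_id": ["t1" if i % 2 == 0 else "t2" for i in range(1000)],
})
sub = truth.copy()
sub.loc[:29, "turtle_id"] = "wrong"

def score(submission, ground_truth):
    merged = submission.merge(ground_truth, on="image_id",
                              suffixes=("_pred", "_true"))
    return (merged["turtle_id_pred"] == merged["turtle_id_true"]).mean()

full = score(sub, truth)            # scores the whole submission
buggy = score(sub.head(30), truth)  # an errant .head(30): first 30 rows only

print(full, buggy)  # 0.97 0.0
```

A bug of this shape would also explain the observation above that changing a huge part of a submission leaves the LB score unchanged: edits outside the scored slice are invisible to the metric.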

1 Mar 2022, 12:23
Upvotes 0

I got a response from Zindi support saying they'll review the situation this week. Hope they can debug their backend :)

Thanks, @kiryusha. My local validation score does not correlate with the LB score at all. I strongly believe the LB score calculation process is not correct.