
data.org Financial Health Prediction Challenge

Helping Eswatini, Lesotho, Zimbabwe, and Malawi

$1,500 USD

Completed (3 months ago)

Skills you will learn

Prediction

Machine Learning

1774 joined

894 active


Start: Dec 12, 2025 · End: Mar 15, 2026 · Reveal: Mar 16, 2026

okonp07

A Case for Evaluating All Submissions Against Private Data

19 Mar 2026, 14:35 · 5

Since the final outcome depends on unseen private data, limiting evaluation to only two submissions per participant does not seem like the fairest approach. The public leaderboard was clearly not a perfect reflection of private leaderboard performance: many of the eventual top performers had public F1 scores below 0.9. This means that participants with stronger public scores may also have had alternative submissions that would have performed better on the private data, but those submissions were excluded simply because only each participant's two highest public-scoring entries were chosen.

As a result, some competitors may have been unfairly disadvantaged, not because they lacked better models, but because their potentially stronger private-leaderboard submissions were never evaluated. A more transparent and equitable approach would have been to run all submissions against the private dataset, just as was done for the public evaluation, and to rank each participant by their best submission as scored on the private data. That would have produced a more accurate final ranking and a leaderboard that better reflected true model performance.
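To make the difference between the two rules concrete, here is a minimal Python sketch. It assumes the current rule works as the post describes it (each participant's two highest public-F1 entries are kept, and the better of their two private scores counts), and compares it with the proposed rule (best private F1 over every submission). The participants, scores, and function names (`score_top_k_public`, `score_all_private`, `rank`) are hypothetical and for illustration only; they are not Zindi's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    participant: str
    public_f1: float   # score on the public split (visible during the competition)
    private_f1: float  # score on the private split (revealed at the end)


def score_top_k_public(subs, k=2):
    """Rule as described in the post: keep each participant's k highest
    public-F1 submissions, then take the best private F1 among them."""
    by_participant = {}
    for s in subs:
        by_participant.setdefault(s.participant, []).append(s)
    scores = {}
    for name, items in by_participant.items():
        chosen = sorted(items, key=lambda s: s.public_f1, reverse=True)[:k]
        scores[name] = max(s.private_f1 for s in chosen)
    return scores


def score_all_private(subs):
    """Proposed rule: score every submission on the private split and keep
    each participant's best private F1."""
    scores = {}
    for s in subs:
        scores[s.participant] = max(scores.get(s.participant, 0.0), s.private_f1)
    return scores


def rank(scores):
    """Participant names ordered from best to worst final score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical participants and scores, for illustration only.
SUBMISSIONS = [
    Submission("alice", public_f1=0.92, private_f1=0.85),
    Submission("alice", public_f1=0.91, private_f1=0.86),
    Submission("alice", public_f1=0.88, private_f1=0.89),  # never kept under the top-2-public rule
    Submission("bob", public_f1=0.90, private_f1=0.87),
    Submission("bob", public_f1=0.89, private_f1=0.86),
]

if __name__ == "__main__":
    print(rank(score_top_k_public(SUBMISSIONS)))  # ['bob', 'alice']
    print(rank(score_all_private(SUBMISSIONS)))   # ['alice', 'bob']
```

In this toy example the ranking flips: alice's strongest private submission had only the third-best public score, so the top-2-public rule never counts it.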

Discussion 5 answers

Moujoudix

Your suggestion sounds logical; however, almost all competition platforms use the same model: you choose two submissions that you want to lock for final evaluation. I don't know why that is, but I'd guess it's mainly about computing resources, especially on Kaggle, where there are thousands of competitors each time. For example, this competition had a total of 801 competitors, which makes at most 1,602 submissions to test. Under your proposal, that number would quickly jump to 10k or even 100k, which may not be practical.

Still, it is a good idea to consider for low-computation tasks like this tabular one, while avoiding it for heavier workflows (computer vision, NLP, etc.).
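The scaling argument can be sketched in a few lines of Python. Under a two-submission cap the number of private-set scoring runs is bounded by twice the number of participants, while without a cap it grows with every submission made. The 801 competitors come from the comment above; the figure of 20 submissions per participant and the function name `evaluations_needed` are hypothetical, chosen only for illustration.

```python
def evaluations_needed(submission_counts, cap=None):
    """Number of private-set scoring runs. Each participant contributes
    min(count, cap) runs, or every submission when cap is None."""
    if cap is None:
        return sum(submission_counts)
    return sum(min(c, cap) for c in submission_counts)


if __name__ == "__main__":
    # 801 competitors (from the thread); 20 submissions each is a hypothetical figure.
    counts = [20] * 801
    print(evaluations_needed(counts, cap=2))  # 1602
    print(evaluations_needed(counts))         # 16020
```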

N.B.: I placed 141st on the final LB; my best submission, however, would have placed 10th 😅

19 Mar 2026, 14:45

Upvotes 0

abdelrhman012018

Alexandria University

N.B.: I placed 225th on the final LB; my best submission, however, would have placed 1st 😪😪😪🙂🙂🙃🙃

replied to Moujoudix · 19 Mar 2026, 19:28

Upvotes 0

Moujoudix

Hard luck :/ Choose wisely next time.

replied to abdelrhman012018 · 19 Mar 2026, 20:15

Upvotes 1

obvioussnort

Your idea makes sense, yet the majority of competition platforms follow the same format: you select two submissions to lock for final review.

3 Apr 2026, 07:38

Upvotes 0

berbis29

I hadn't considered this angle before, but it makes sense. If the goal is to identify the strongest model overall, evaluating a broader set of submissions against the private data could potentially lead to different outcomes.

2 Jun 2026, 07:23

Upvotes 0
