data.org Financial Health Prediction Challenge

Helping Eswatini, Lesotho
and 2 other countries
  • Eswatini
  • Lesotho
  • Zimbabwe
  • Malawi
$1 500 USD
Under code review
Prediction
Machine Learning
1686 joined
898 active
Start: Dec 12, 25
Close: Mar 15, 26
Reveal: Mar 16, 26
okonp07
A Case for Evaluating All Submissions Against Private Data
19 Mar 2026, 14:35 · 3

Since the final outcome depends on unseen private data, limiting evaluation to only two submissions per participant does not seem like the fairest approach. The public leaderboard was clearly not a perfect reflection of private leaderboard performance, as many of the eventual top performers had public F1 scores below 0.9. This means that participants with stronger public scores may also have had alternative submissions that would have performed better on the private data, but those submissions were excluded simply because only the two highest public-scoring entries were chosen per participant.

As a result, some competitors may have been unfairly disadvantaged, not because they lacked better models, but because their potentially stronger private-board submissions were never evaluated. A more transparent and equitable approach would have been to run all submissions against the private dataset, just as was done for the public evaluation, choosing the best submission made by the participant as evaluated against the private data. That would have produced a more accurate final ranking and a leaderboard that better reflected true model performance.
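To make the difference between the two rules concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Submission` record, the participant names, and the scores are made up, and the "top two public" rule is modelled as crediting the better private score among the two retained entries, which may not match the platform's exact scoring.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of one submission and its two F1 scores."""
    participant: str
    public_f1: float
    private_f1: float


def group_by_participant(subs):
    """Collect each participant's submissions into a list."""
    grouped = defaultdict(list)
    for sub in subs:
        grouped[sub.participant].append(sub)
    return grouped


def score_top_k_public(subs, k=2):
    """Assumed current rule: keep the k best public-scoring submissions
    per participant, then credit the best private score among those k."""
    result = {}
    for name, items in group_by_participant(subs).items():
        kept = sorted(items, key=lambda s: s.public_f1, reverse=True)[:k]
        result[name] = max(s.private_f1 for s in kept)
    return result


def score_best_private(subs):
    """Proposed rule: credit the best private score over all submissions."""
    return {
        name: max(s.private_f1 for s in items)
        for name, items in group_by_participant(subs).items()
    }


def ranking(scores):
    """Participant names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up scores, for illustration only.
SUBMISSIONS = [
    Submission("alice", 0.93, 0.86),
    Submission("alice", 0.92, 0.85),
    Submission("alice", 0.89, 0.90),  # weaker public score, best private score
    Submission("bob", 0.88, 0.88),
    Submission("bob", 0.87, 0.87),
]

if __name__ == "__main__":
    print(ranking(score_top_k_public(SUBMISSIONS)))
    print(ranking(score_best_private(SUBMISSIONS)))
```

With these made-up numbers the two rules order the two participants differently, because the third "alice" entry never reaches private evaluation under the first rule.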

Discussion 3 answers
Moujoudix

Your suggestion sounds logical; however, almost all competition platforms use the same model: you choose 2 submissions that you want to lock for final evaluation. I don't know why that is, but I'd guess it is mainly for computing resources, especially on Kaggle, where you have thousands of competitors each time. For example, in this competition there were 801 competitors in total, which makes at most 1602 submissions to test. If we followed your suggestion, that number would quickly jump to 10k or even 100k, which may not be convenient.

But it is still a good idea to consider for low-computation tasks like this tabular one, while avoiding it for heavier workflows (computer vision, NLP, etc.).
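As a rough check on those numbers, the short Python sketch below works out how many scoring runs the two-submission rule implies and how many submissions per competitor it would take to reach the 10k and 100k figures. The 801 and the limit of 2 come from this reply; the 10k and 100k totals are the guesses above, not measured counts, and the function names are made up for illustration.

```python
COMPETITORS = 801            # figure quoted in this reply
SELECTED_PER_COMPETITOR = 2  # submissions locked for final evaluation


def selected_runs(competitors, selected=SELECTED_PER_COMPETITOR):
    """Upper bound on private-set scoring runs under the two-submission rule."""
    return competitors * selected


def average_submissions_for(total_runs, competitors):
    """Average submissions per competitor needed to reach total_runs."""
    return total_runs / competitors


if __name__ == "__main__":
    print(selected_runs(COMPETITORS))
    for total in (10_000, 100_000):  # guessed totals, not measured
        print(total, round(average_submissions_for(total, COMPETITORS), 1))
```

So 10k runs would correspond to roughly a dozen submissions per competitor on average, and 100k to a bit over a hundred.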

N.B.: I ranked 141st on the final LB; my best submission, however, would have ranked 10th 😅

19 Mar 2026, 14:45
Upvotes 0
abdelrhman012018
Alexandria University

N.B.: I ranked 225th on the final LB; my best submission, however, would have ranked 1st 😪😪😪🙂🙂🙃🙃

Moujoudix

Hard luck :/ Choose wisely next time.