Since the final outcome depends on unseen private data, limiting evaluation to only two submissions per participant does not seem like the fairest approach. The public leaderboard clearly was not a perfect reflection of private leaderboard performance, as many of the eventual top performers had public F1 scores below 0.9. This means that participants with stronger public scores may also have had alternative submissions that would have performed better on the private data, but those submissions were excluded simply because only the two highest public-scoring entries were chosen per participant.
As a result, some competitors may have been unfairly disadvantaged, not because they lacked better models, but because their potentially stronger private-board submissions were never evaluated. A more transparent and equitable approach would have been to run all submissions against the private dataset, just as was done for the public evaluation, and to rank each participant by their best submission as scored on the private data. That would have produced a more accurate final ranking and a leaderboard that better reflected true model performance.
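A small simulation makes the selection effect concrete. This is only a sketch with made-up numbers (participant count aside, none of the constants come from this competition): when the public score is a noisy proxy for the private score, locking the top-2 public submissions frequently misses a participant's best private submission.

```python
import random

random.seed(0)

N_PARTICIPANTS = 801   # competitor count mentioned in this thread
N_SUBS = 20            # hypothetical number of submissions per participant
NOISE = 0.05           # hypothetical public/private score disagreement

missed_better = 0
for _ in range(N_PARTICIPANTS):
    # Each submission: (public_f1, private_f1); public is a noisy proxy.
    subs = []
    for _ in range(N_SUBS):
        private = random.uniform(0.80, 0.95)
        public = private + random.gauss(0, NOISE)
        subs.append((public, private))

    # Current rule: lock the 2 highest public scores,
    # final result = best private score among those two.
    locked = sorted(subs, key=lambda s: s[0], reverse=True)[:2]
    final = max(p for _, p in locked)

    # Counterfactual: best private score across ALL submissions.
    best_possible = max(p for _, p in subs)
    if best_possible > final:
        missed_better += 1

print(f"{missed_better}/{N_PARTICIPANTS} participants had a better "
      f"unselected submission")
```

Under these assumed noise levels, a large fraction of simulated participants end up with a stronger submission left on the table, which is exactly the "my best submission would have ranked 10th" situation described below in this thread.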
Your suggestion sounds logical; however, almost all competition platforms use this same model: you choose 2 submissions that you want to lock for final evaluation. I don't know why that is, but I'd guess it's mainly for computing resources, especially on Kaggle where you have thousands of competitors each time. For example, in this competition there was a total of 801 competitors, which makes at most 1602 submissions to test. Under your proposal, that number would quickly jump to 10k or even 100k, which may not be convenient.
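The back-of-envelope arithmetic above is easy to check; note the average-submissions figure is a hypothetical assumption, not a number from the competition:

```python
participants = 801        # competitor count from this competition
locked_per_person = 2     # standard "lock 2 submissions" rule
avg_total_subs = 50       # hypothetical average submissions per competitor

# Private evaluations needed under the current rule.
print(participants * locked_per_person)   # 1602

# Private evaluations needed if every submission were rescored.
print(participants * avg_total_subs)      # 40050
```

So even a modest average of 50 submissions per competitor multiplies the private-evaluation workload by 25x, which supports the computing-resources guess.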
But it is still a good idea to consider for low-computation tasks like this tabular one, while avoiding it for heavier workloads (computer vision, NLP, etc.).
N.B. : I placed 141st on the final LB; my best submission, however, would have placed 10th 😅
N.B. : I placed 225th on the final LB; my best submission, however, would have placed 1st 😪😪😪🙂🙂🙃🙃
Hard luck :/ , choose wisely next time.