Phase 2 data is now available, and as a result the leaderboard will be reset.
Good luck - we’re excited to see how your models perform on the new data!
Important reminder about submissions (please read!)
Please only submit answers for the track(s) you are actually competing in.
Do not copy the same answers into all three track columns unless you genuinely built models for all three.
Why this matters:
- Submissions require one file with columns for all three models (e.g. Qwen3-32B, Qwen2.5-7B-Instruct, Qwen2.5-1.5B-Instruct).
- If you’re not competing in a track, that column must stay exactly as the placeholder text in the sample submission.
- Submitting the same answers across all tracks when they weren’t generated by those models is not valid, and such submissions will be disqualified. This keeps the leaderboard fair and ensures it reflects real model performance.
Best practice:
- Put real outputs only in the column(s) for your chosen track(s).
- Leave all other columns unchanged (placeholders only).
- Please don’t reuse the largest model’s output across tracks.
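If you'd like to sanity-check a file against these rules before uploading, something like the sketch below may help. It is only an illustration under assumptions: it assumes the submission is a CSV, and both the column names and the `PLACEHOLDER` value are hypothetical stand-ins, so take the real ones from the sample submission. The helpers `check_submission` and `load_rows` are made-up names and not part of any official tooling.

```python
import csv
import io

# Hypothetical column names and placeholder text (assumptions for this sketch);
# copy the real values from the sample submission file.
TRACK_COLUMNS = ["Qwen3-32B", "Qwen2.5-7B-Instruct", "Qwen2.5-1.5B-Instruct"]
PLACEHOLDER = "placeholder"


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def check_submission(rows, competing_in):
    """Return a list of human-readable problems found in the submission rows."""
    problems = []
    for col in TRACK_COLUMNS:
        only_placeholders = all(row[col] == PLACEHOLDER for row in rows)
        if col in competing_in and only_placeholders:
            problems.append(f"{col}: competing, but column holds only placeholders")
        if col not in competing_in and not only_placeholders:
            problems.append(f"{col}: not competing, but placeholders were changed")

    # Flag pairs of filled-in columns whose answers are identical on every row.
    filled = [c for c in TRACK_COLUMNS
              if any(row[c] != PLACEHOLDER for row in rows)]
    for i, first in enumerate(filled):
        for second in filled[i + 1:]:
            if all(row[first] == row[second] for row in rows):
                problems.append(f"{first} and {second}: identical answers on every row")
    return problems
```

To use it, read your file's text, pass it through `load_rows`, and call `check_submission(rows, {"Qwen2.5-7B-Instruct"})` with the set of tracks you are actually competing in. An empty list means none of these particular checks fired. It does not guarantee the submission is valid.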
Thanks for helping keep things fair - and happy troubleshooting!
Hello @meganomaly, could you kindly introduce a check to eliminate people who copy the same results into all tracks? Even if you tell people not to do it, they still will, corrupting the LB as has already happened. We could start by removing the currently corrupted subs. Thanks
I mean, the above is from just after this announcement was made. I don't know if it's intentional or not, but people won't follow those rules. A check should be enforced at submission time; otherwise the LB will remain corrupted. @Ajoel
Hi @Koleshjr, thanks for the feedback. We will be disqualifying any submissions with copied results in all tracks.
Very sorry, we just reused our previous code submissions to get a quick, preliminary look at the accuracy on Phase 2. We didn't do this intentionally, and our future submissions will follow the rules.
Thank you @meganomaly