Hello everyone,
We have now finalised the evaluation process for the challenge. Thank you all for your hard work and participation throughout this journey.
In line with the official evaluation rules, we assessed solutions based on predefined technical criteria, including reproducibility and generalisation.
To be eligible for final evaluation, teams were required to submit their model, code, instructions, and report by the stated deadline. These materials were necessary to complete the evaluation; submissions missing any of them were not evaluated.
Congratulations to the winners, and thank you to everyone who took part. We look forward to seeing you in the next challenge.
The final evaluation results are as follows:
Track 1 Qwen3-32B:
Winner: @Greenpark
Runner-up: Team Netis
Track 2 Qwen2.5-7B-Instruct:
Winner: Team islab_snu
Runner-up: Team TeleLLM-team
Track 3 Qwen2.5-1.5B-Instruct:
Winner: Team gopher
Runner-up: Team TD
Can you show the scores for each team in the final phase?
May I ask what the evaluation criteria for the final ranking were?
Where are the final phase scores? Are they the private leaderboard scores, or, as Antonio said, based on new data?!
The winning 1.5B team has no record on the 1.5B track but got the gold score.
Yes, if they submitted via another track's submissions, it means their score on the 1.5B track is below 80.41. Besides, what is even more unbelievable is that they are not in the top ten of any track, so theoretically they shouldn't even have entered the review process?
Congratulations to the winners! I would love to see their solutions and approaches. This was a very interesting competition overall.
Please open source the winning solutions so that we can learn from them.
Dear all,
I understand your disappointment following the final evaluation. Reaching the final phase required a significant amount of work, and we sincerely appreciate the effort and dedication you have all shown. Our goal throughout the process has been to remain fully transparent in how the evaluation was conducted.
Given the limited time available, we did our best during the evaluation. Below, I provide additional details regarding the evaluation scores of the winner and the runner-up. That said, if you submitted a model to any of the tracks and would like more information about your own final score, please feel free to contact me at any time.
First, I would like to recall that we fully evaluated only those participants who provided a complete submission, namely: model, code, and report. Unfortunately, several top-ranked participants did not meet all of these requirements.
Second, we attempted to reproduce the scores achieved on the public leaderboard on Zindi. For some submissions, we were unable to obtain the same results.
Finally, as suggested by many participants, we evaluated the models on a private dataset that was not used on Zindi. The final evaluation was based on both the reproduced public leaderboard score and the performance on this private dataset.
Thank you again for your hard work and engagement throughout the challenge.
Track 1 Qwen3-32B: Winner: Greenpark (https://github.com/greenpark12345/Qwen3-32BAI-Telco) Reproduced score: 0.908; private dataset: 47.6%
Runner-up: Netis (https://gist.github.com/vaderyang/06550f3c7d7929e81d5999763d1764c7) Reproduced score: 0.891; private dataset: 31.2%
Track 2 Qwen2.5-7B-Instruct: Winner: islab_snu (https://huggingface.co/Seokhyun1/islab_snu_7B) Reproduced score: 0.905; private dataset: 53.6%
Runner-up: Team Tele-LLM (https://www.modelscope.cn/models/telellm2026/qwen2.5_7b_rlsft_iter0666_v4.05) Reproduced score: 0.924; private dataset: 43.8%
Track 3 Qwen2.5-1.5B-Instruct: Winner: gopher (https://huggingface.co/abrar008/gopher_submission) Reproduced score: 0.792; private dataset: 20.5%
Runner-up: TD (https://huggingface.co/franklin0203/1.5-4bit-lora-adapter-checkpoint-34000) Reproduced score: 0.896; private dataset: 12%
Finally, I want to clarify that the Gopher submission was indeed for Track 3. Their model is a Qwen2.5-1.5B-Instruct; there was simply a mistake in their last submission on Zindi. We did not receive or evaluate any other models from their side.
Congratulations to all the winners. I am trying to understand how team gopher won when all they did was use XGBoost to get the correct diagnosis and then use Qwen2.5-1.5B-Instruct just for the explanation, which was not actually needed. So in essence, even if they had used XGBoost alone, they would still have won? Wasn't the whole idea to use the Qwen models for the diagnosis?
Am I missing something?
Why do I get a 404 on all the links you provided, Antonio?
Remove the last bracket from the links.
Koleshjr, indeed we expected participants to use Qwen models for the full diagnosis in all the tracks. However, for all the other submissions in this track that followed different approaches, either we were not able to reproduce the same (high) scores, or the models were not able to generalise to other test sets (see the runner-up score). We knew in advance that the generalisation capabilities of small models are limited, but we were hoping participants would achieve some breakthrough in this sense. The winning model proposes a relatively simple solution that achieves a fair score across multiple tested datasets.
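For readers trying to picture the setup being discussed, here is a minimal sketch of such a hybrid pipeline, assuming tabular KPI features and a fixed fault-label set. Every feature, label, and prompt below is a hypothetical placeholder, not gopher's actual code:

```python
# Purely illustrative sketch of an "XGBoost for the diagnosis, LLM for the
# explanation" pipeline. Features, labels, and prompt are hypothetical
# placeholders, NOT the winning team's actual implementation.
import numpy as np
import xgboost as xgb
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- Stage 1: tabular fault diagnosis with XGBoost ---
# Placeholder data: 8 hypothetical network KPIs, 4 hypothetical fault classes.
rng = np.random.default_rng(0)
X_train = rng.random((200, 8))
y_train = rng.integers(0, 4, 200)
label_names = ["normal", "interference", "congestion", "antenna misalignment"]

clf = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
clf.fit(X_train, y_train)
diagnosis = label_names[int(clf.predict(rng.random((1, 8)))[0])]

# --- Stage 2: Qwen2.5-1.5B-Instruct only writes the explanation ---
model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a telecom fault analyst."},
    {"role": "user",
     "content": f"The diagnosed fault is: {diagnosis}. "
                "Briefly explain the likely root cause."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that in this design the language model never influences the predicted class, which is exactly the concern raised above: the diagnosis comes entirely from the classifier, and the LLM only verbalises it.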
Hi Antonio, may I know what our score is? We competed only in the Qwen2.5-7B track.
We have sent an email to inquire about the final score, but have not yet received a response. Could you kindly advise on the current status?