I had a quick question to better understand how fairness is being ensured for all participants. For top submissions, do you re-run the training using the submitted code and declared data? I’m asking because if Phase 2 test data were used during training, this would be hard to detect just by running inference on already-trained models—and re-training models for verification also seems quite time-consuming and GPU-intensive.
If a top-ranked submission (for example, within the top 5) is found to have used Phase 2 data during training, does the review then move on to the next submission further down the leaderboard? And what happens if all of the top 10 used Phase 2 test data during training?
Of course you can use it to check the percentage of correct answers and improve the question types where the model behaves badly. But a score below 0.6 is basically not on the right track.
Good point. Using test data, either directly or by generating synthetic samples similar to it, will cause overfitting, and despite better benchmark results the model will be inferior due to poor generalization. So how this is tracked is definitely important.
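One simple way to operationalize this kind of tracking, sketched with purely hypothetical names and numbers (the threshold and the leaderboard entries are illustrative, not from the competition): compare each submission's score on Phase 2 with its score on genuinely unseen data. A model that memorized Phase 2 shows a much larger drop than one that generalized.

```python
def generalization_gap(score_phase2: float, score_unseen: float) -> float:
    """Difference between the public Phase 2 score and the unseen-data score.

    A model that memorized Phase 2 typically shows a large positive gap;
    a model that generalized shows a small one.
    """
    return score_phase2 - score_unseen

SUSPICIOUS_GAP = 0.15  # hypothetical threshold, not from the rules

# Hypothetical leaderboard entries: (team, Phase 2 score, unseen score)
submissions = [
    ("team_a", 0.92, 0.88),  # small gap: consistent with genuine generalization
    ("team_b", 0.99, 0.61),  # large gap: consistent with test-data leakage
]

flagged = [team for team, p2, unseen in submissions
           if generalization_gap(p2, unseen) > SUSPICIOUS_GAP]
print(flagged)  # → ['team_b']
```

This only flags candidates for closer review, of course; a gap can also come from a distribution shift in the unseen data itself.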
Where in the rules does it say not to use test data to train a model?
Yeah, but then one could get a near-100% score by using the test data in a clever way. That is why using test data in training is a grey area, and it needs some clarification from the Host on exactly what use is allowed, if any.
I doubt the near-100% point, to be honest. But it would be good if the host could clarify. Typical approaches like pseudo-labeling are considered legitimate on other platforms like Kaggle.
Hi, in the review phase we plan to use unseen data. Obviously, we cannot test all of the participants' models; we would not have the time to meet our deadline.
Re-running with unseen data seems like a fair solution for everyone. It also tests whether the work can actually be useful in production.
Is this new test set similar to the training set and the Phase 1/2 test sets, or is it an entirely new format? This has implications for many people's pipelines (mine at least).
Thanks for the clarification. Using unseen data helps. One small concern, though: if this new data follows the same structure, tables, and distribution as Phase 2, a model that overfitted on Phase 2 may still perform well. To really flag this, it would help if the unseen data included new table formats and different distributions, so that generalization rather than memorization is being tested. That would make the fairness check much stronger.
This is very tricky. It depends on how 'new' and how 'different' the data is. If we train a model and test it on a test set, we should expect the test set to have a similar distribution to the training data. Generalization holds only for certain patterns, not for everything.
The main goal of this competition has always been generalization, which is why the Phase 2 set follows a different data distribution than the training set. Based on that, I would expect the same differences observed on Phase 2 to also appear on truly unseen data that was not part of either the training set or Phase 2.
I agree it's about generalization, but there should be a boundary to it. As an extreme example, if all the unseen data were general questions, it would just be testing the underlying Qwen model's generalization. Of course this won't happen, but the more difference we want, the closer we get to that extreme. Too much difference may deviate from the original scope, which is fine-tuning for detecting certain network failures.