☎️ Let's Talk About: Data Augmentation restrictions...

The AI Telco Troubleshooting Challenge by ITU

€35 000 EUR

Completed (5 months ago)

Skills you will learn

Root Cause Analysis

Fault Detection

Edge AI

Anomaly Detection

Large Language Models

1314 joined

251 active


Start

Nov 28, 25

Feb 01, 26

Reveal

Feb 02, 26

Garvin

Data Augmentation restrictions

Help · 29 Jan 2026, 07:48 · 3

Hi, it is slightly unclear from the previous discussions whether it is allowed to use test samples to generate more training data. It would be helpful if the host could clarify this.

☎️ Join the Buzz: Data augmentation for phase 2 - 312 Views

It is clear that we should only use the provided list of 3 Qwen models to generate the synthetic dataset. So should the examples provided to generate the synthetic data come only from train samples, or can they come from test samples too?

Discussion 3 answers

DeDQ

You should not use test samples to generate training data. The core principle is that the test set is meant to evaluate your model's ability to generalize to unseen data. Using test data during training or data augmentation violates this principle and can lead to overfitting.

While the specific rules for the "Buzz: Data augmentation for phase 2" challenge should always be the final authority, Zindi's standard rules emphasize:

Submissions must run on the original, provided datasets.
Using data in ways that create leaks (like using the test set for training) is against the spirit of fair competition and can lead to disqualification, as it produces solutions that are not valuable to the client.
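As a rough illustration of the principle above, here is a minimal sketch of picking seed examples for synthetic data generation from the train set only. The function name `pick_seed_examples` and the `id` and `text` fields are hypothetical and not taken from the competition data; adapt them to the actual files.

```python
import random


def pick_seed_examples(train_rows, test_rows, k, seed=0):
    """Sample k seed examples from train rows only.

    Any row whose ID also appears in the test set is excluded, so test
    samples can never end up in a data-generation prompt.
    """
    test_ids = {row["id"] for row in test_rows}
    eligible = [row for row in train_rows if row["id"] not in test_ids]
    if k > len(eligible):
        raise ValueError(f"requested {k} seeds but only {len(eligible)} train rows are eligible")
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return rng.sample(eligible, k)


# Toy rows standing in for the real train and test files (hypothetical schema).
train_rows = [{"id": f"train_{i}", "text": f"fault log {i}"} for i in range(5)]
test_rows = [{"id": f"test_{i}", "text": f"unseen log {i}"} for i in range(3)]

seeds = pick_seed_examples(train_rows, test_rows, k=3)
```

Only the rows in `seeds` would then be passed as examples to the generation model, which keeps the test set untouched until evaluation.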

29 Jan 2026, 09:26

Upvotes 0

Garvin

I agree with your points. That is why I am asking the host to state this clearly, because judging by some old discussions, some users may have misunderstood that using test samples to generate similar samples is allowed.

replied to DeDQ · 29 Jan 2026, 10:08

Upvotes 1

ahuvam

Agreed. I just wanted to know whether publicly available datasets can be used, or whether the data should strictly be the train dataset provided for the competition.

replied to Garvin · 29 Jan 2026, 13:56

Upvotes 0
