
The AI Telco Troubleshooting Challenge

€35 000
Completed (~1 month ago)
Root Cause Analysis
Fault Detection
Edge AI
Anomaly Detection
Large Language Models
1254 joined
253 active
Start: Nov 28, 2025
Close: Feb 01, 2026
Reveal: Feb 02, 2026
Data Augmentation restrictions
Help · 29 Jan 2026, 07:48 · 3

Hi, it is slightly unclear from the previous discussions whether it is allowed to use test samples to generate more data for training. It would be helpful if the host could clarify this.

☎️ Join the Buzz: Data augmentation for phase 2 - 312 Views

It is clear that we should only use the provided list of three Qwen models to generate the synthetic dataset. But should the seed examples used to generate the synthetic data come only from the train samples, or can they come from the test samples too?

Discussion (3 answers)

You should not use test samples to generate training data. The test set exists to measure how well your model generalizes to unseen data. Using test samples during training or data augmentation violates this principle: the model effectively overfits to the test set, so its scores overstate how well it would perform on genuinely new data.

While the host's specific guidance for this competition (including the "Join the Buzz: Data augmentation for phase 2" thread) should always be the final authority, Zindi's standard rules emphasize:

  • Submissions must run on the original, provided datasets.
  • Using data in ways that create leaks (like using the test set for training) is against the spirit of fair competition and can lead to disqualification, as it produces solutions that are not valuable to the client.
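A minimal sketch of the train-only principle, assuming each sample is a dict with a hypothetical `"text"` field: seed examples for synthetic generation are drawn only from the train split, and any train sample whose normalized text also appears in the test split is excluded. The sketch stops at prompt construction; the call to one of the three permitted Qwen models is deliberately left out, since the serving setup is not described in this thread. The sample strings in the usage block are placeholders, not competition data.

```python
import random


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


def select_seeds(train, test, k, seed=0):
    """Pick k seed examples for synthetic generation from the train split only.

    `train` and `test` are lists of dicts with a hypothetical "text" field.
    Train samples whose normalized text also appears in the test split are
    excluded, so no test content can leak into the generation prompts.
    """
    test_texts = {normalize(s["text"]) for s in test}
    pool = [s for s in train if normalize(s["text"]) not in test_texts]
    if k > len(pool):
        raise ValueError(f"requested {k} seeds but only {len(pool)} are leak-free")
    rng = random.Random(seed)  # fixed seed for reproducible seed selection
    return rng.sample(pool, k)


def build_prompt(seeds):
    """Assemble a few-shot prompt; sending it to an allowed model is out of scope."""
    examples = "\n\n".join(
        f"Example {i + 1}:\n{s['text']}" for i, s in enumerate(seeds)
    )
    return f"{examples}\n\nWrite one new example in the same style:"


if __name__ == "__main__":
    # Placeholder data: "Sample C" duplicates a test sample and must be excluded.
    train = [{"text": "sample A"}, {"text": "sample B"}, {"text": "Sample C"}]
    test = [{"text": "sample  c"}]
    seeds = select_seeds(train, test, k=2)
    print(build_prompt(seeds))
```

Exact matching after normalization only catches near-verbatim overlap; paraphrased test content would need a fuzzier similarity check, which this sketch does not attempt.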
29 Jan 2026, 09:26

I agree with your points, which is why I am asking the host to state this clearly: judging from some older discussions, some users may have understood that using test samples to generate similar samples is allowed.

Agreed. I just wanted to know whether publicly available datasets can be used, or whether the data must come strictly from the train dataset provided for the competition.