Hi, it is slightly unclear from previous discussions whether we are allowed to use test samples to generate more training data. It would be helpful if the host could clarify this.
☎️ Join the Buzz: Data augmentation for phase 2
It is clear that we should only use the provided list of three Qwen models to generate the synthetic dataset. So should the examples we feed in to generate the synthetic data come only from the train samples, or can they come from the test samples too?
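You should not use test samples to generate training data. The core principle is that the test set is meant to evaluate your model's ability to generalize to unseen data. Using test data during training or data augmentation violates this principle: it leaks information about the test set into your training data, which leads to overfitting and to scores that overstate how well the model will perform on truly unseen data.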
While the specific rules for the "Buzz: Data augmentation for phase 2" challenge should always be the final authority, Zindi's standard rules emphasize:
I agree with your points, which is why I am asking the host to state this clearly. Looking at some older discussions, some users may have understood that using the test set to generate similar samples is allowed.
Agreed. I just wanted to know whether publicly available datasets can be used, or whether we should strictly use the training dataset provided for the competition.