Each row in the dataset represents a single sample collected from one of four body sites: "Nasal", "Mouth", "Stool", or "Skin".
The MPEG-G files contain compressed FASTQ files with 16S rRNA sequence profile features.
The data is split into 2 901 training and 1 068 test samples.
The data is provided to you in this format. You will need to create four folders, one per body site, for your federated learning solution.
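One possible way to build the four body-site folders is sketched below. It is only an illustration under assumptions: the function name `partition_by_body_site`, the `(filename, body_site)` record layout, and the directory names are hypothetical and are not part of the provided data. Adapt them to whatever metadata file and file names you actually receive.

```python
import shutil
from pathlib import Path

# The four body sites named in the dataset description.
BODY_SITES = ("Nasal", "Mouth", "Stool", "Skin")


def partition_by_body_site(records, source_dir, output_dir):
    """Copy each sample file into output_dir/<body site>/.

    records: iterable of (filename, body_site) pairs. This layout is an
    assumption; build it from your own metadata table.
    Returns a dict mapping each body site to the list of copied paths.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)

    # Create all four folders up front, even if one ends up empty.
    for site in BODY_SITES:
        (output_dir / site).mkdir(parents=True, exist_ok=True)

    copied = {site: [] for site in BODY_SITES}
    for filename, site in records:
        if site not in copied:
            raise ValueError(f"Unexpected body site: {site!r}")
        src = source_dir / filename
        if not src.exists():
            # Skip records whose file is not present on disk.
            continue
        dst = output_dir / site / filename
        shutil.copy2(src, dst)
        copied[site].append(dst)
    return copied
```

Each resulting folder can then be treated as the local data of one federated client. Copying rather than moving keeps the original download intact, at the cost of extra disk space.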
The split is stratified by participant and insulin sensitivity.
Please note that not all samples have cytokine information provided.