In the data description, you said: "Each blood sample is scanned 60 times, we actually have 60 measurements for each. We have shuffled the measurements, so you won’t know which measurements are from the same sample." Is it possible that measurements of the same blood sample appear in both the train and test datasets? That is, did you pool all the measurements and then split them randomly, or was the split made by blood sample?
My second question: what is the need for this operation in real life, please? Is it possible that, given one blood sample and n measurements, we obtain different outputs?
So a blood sample contains various chemical compounds. For example, when you go to the hospital, they will take a blood sample from you and can use it to determine the amount of cholesterol in your blood. Based on the value, the doctor can tell you whether it is high or low enough to affect your health. Now suppose we have a device that can just scan your hand and give you a spectrum. If we have a machine learning model that can predict the level of cholesterol from this spectrum, then people can do blood analysis without having to collect blood with needles. You could even do your analysis from home and just send your results via email to your doctor for interpretation.
Thank you @Ulrich
The objective is clearer now.
Hi Alex,
We split the data before rolling, so it is not possible to have measurements from the same blood sample in both the training and test data sets.
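For anyone who wants to build a local validation set with the same property, here is a minimal sketch of a split made by blood sample rather than by measurement. It is not the organizers' actual code: the function name `split_by_sample`, the `sample_ids` list, and the 10 samples with a 20% test fraction are all hypothetical, and it assumes you have some identifier for the sample each measurement came from (which the competition data does not give you directly).

```python
import random

def split_by_sample(sample_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no sample ID appears on both sides.

    sample_ids: one (hypothetical) blood-sample ID per measurement row.
    """
    rng = random.Random(seed)

    # Choose which *samples* go to the test side, not which rows.
    unique_ids = sorted(set(sample_ids))
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])

    # Every measurement follows its sample into train or test.
    train_idx = [i for i, s in enumerate(sample_ids) if s not in test_ids]
    test_idx = [i for i, s in enumerate(sample_ids) if s in test_ids]

    # Shuffle rows only after the split, so row order does not reveal
    # which measurements belong to the same sample.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx

# Made-up example: 10 blood samples, each scanned 60 times.
sample_ids = [s for s in range(10) for _ in range(60)]
train_idx, test_idx = split_by_sample(sample_ids)

train_samples = {sample_ids[i] for i in train_idx}
test_samples = {sample_ids[i] for i in test_idx}
print(train_samples.isdisjoint(test_samples))
```

Splitting first and shuffling second is what prevents leakage: if the rows were shuffled and then split at random, scans of the same blood would almost certainly land on both sides.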