
bloods.ai Blood Spectroscopy Classification Challenge
Prize: $7,500 USD
Status: Completed
Type: Classification
1103 joined · 265 active
Start: Oct 19, 2021 · Close: Feb 13, 2022 · Reveal: Feb 13, 2022
CV out of fold error discussion
Connect · 19 Dec 2021, 17:12 · 14

Hi,

I think it would be worthwhile to discuss our cross-validation out of fold errors (log-loss/cross-entropy). I don't think the leaderboard is entirely representative of the efficacy of competitors' models as it is easy to probe the test set through multiple submissions - perhaps the reason behind the many accounts at 0.891891891891892 accuracy?

In addition, I think accuracy is a poor metric for evaluating our models' predictions: a model that learns no relationship and simply predicts the mode ('ok') will still score 70% accuracy. Metrics like AUROC or F1 would be better measures of model performance. A question for the organisers: are type 1 and type 2 errors equally bad, and should the model favour high sensitivity or high specificity?
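To make the mode-baseline point concrete, here is a minimal sketch (the 70/15/15 split is illustrative, not the competition's exact distribution):

```python
import numpy as np
from collections import Counter

# Illustrative label distribution with a ~70% 'ok' majority,
# as described above (the exact competition distribution may differ).
y_true = np.array(['ok'] * 70 + ['low'] * 15 + ['high'] * 15)

# A "model" that learns no relationship and always predicts the mode.
mode = Counter(y_true.tolist()).most_common(1)[0][0]
y_pred = np.full(y_true.shape, mode)

accuracy = (y_true == y_pred).mean()
print(f"mode-baseline accuracy: {accuracy:.2f}")  # 0.70 with zero skill
```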

I am running 5-fold stratified cross-validation with three separate models (one for HDL, one for LDL, one for HGB) using "Update_train.csv" and no additional rows. My out of fold log loss errors are:

HDL CHOLESTEROL: 1.0029778860962180
LDL CHOLESTEROL: 0.8348846073765116
HGB HEMOGLOBIN: 0.5316733033387239

I have not submitted my test predictions using these three models yet. Please share your validation scores if you are comfortable doing so!
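For anyone wanting to reproduce this kind of out-of-fold evaluation, a minimal sketch on synthetic stand-in data (the real features come from "Update_train.csv", one such loop runs per target, and LogisticRegression here is just a placeholder model, not what I actually use):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the spectral features and one target's labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.choice(["low", "ok", "high"], size=300, p=[0.15, 0.7, 0.15])

# Out-of-fold class probabilities, filled fold by fold.
oof = np.zeros((len(y), 3))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    oof[val_idx] = model.predict_proba(X[val_idx])

print("out-of-fold log loss:", log_loss(y, oof))
```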

P.S. If anyone would like to discuss the competition and consider teaming up, please contact me

Discussion · 14 answers
tomy4reel
Nexford University

how do u get 1.00 on hdl

19 Dec 2021, 18:22

What logloss do your validation predictions get? I improved mine through feature engineering and hyperparameter tuning, but HDL is very difficult.

tomy4reel
Nexford University

btw 0.55-0.59 accuracy score on both hdl & ldl

That's interesting, your accuracies are better than mine (0.53 HDL), but I presume your loss is higher based on your other comment? Are you using 5-fold CV? I also did not tune the seed when splitting the data to maximise accuracy.

tomy4reel
Nexford University

yes, 10-fold

With some further feature engineering and selection, I've been able to reduce the errors with the same 5-Fold CV split to:

LDL CHOLESTEROL
LOGLOSS: 0.7923376835779603
AUROC: 0.6153790946473873
ACCURACY: 0.602880658436214

HGB HEMOGLOBIN
LOGLOSS: 0.48158596143734356
AUROC: 0.6856893643125527
ACCURACY: 0.8518518518518519

HDL CHOLESTEROL
LOGLOSS: 0.9424358134874014
AUROC: 0.6837253777012813
ACCURACY: 0.5617283950617284

Do you think it's actually possible to achieve 90%+ accuracy across all three targets, let alone just for the HGB labels? I think the public leaderboard is extremely misleading, but I'd happily be proven wrong! My next steps are further feature engineering, hyperparameter tuning, and perhaps some blending with other models.

Final note: I think the AUROC score for the HGB hemoglobin task demonstrates how ineffective accuracy is here. All of my predicted labels are 'ok', yet I still get high accuracy (85%) purely because of the class imbalance. I think it was a bad idea to use accuracy as an evaluation metric when there is such a strong class imbalance...
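A quick illustration of that point on synthetic numbers (the 85/15 imbalance is illustrative): always predicting the majority class scores high accuracy, while AUROC exposes that uninformative scores have no discriminative power.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.15).astype(int)  # 1 = low/high, ~15% of samples
scores = rng.random(1000)                       # uninformative "probabilities"

y_pred = np.zeros(1000, dtype=int)              # always predict 'ok' (class 0)
print("accuracy:", accuracy_score(y_true, y_pred))  # high, ~0.85
print("AUROC:   ", roc_auc_score(y_true, scores))   # ~0.5, i.e. no skill
```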

23 Dec 2021, 02:18
tomy4reel
Nexford University

i agree with you,

yes, i think its possible with more data for the low/high classes & maybe other meta-data: donor age, sex, etc

Thanks for your response, but I think that more low/high samples could actually make 90% accuracy even harder to reach. The more evenly the classes are distributed (e.g. 33/33/34), the stronger a model has to be to classify at 90% accuracy, compared with a 90/5/5 distribution.

I do agree that having more data in the underrepresented classes would help us develop stronger models; currently my models learn to classify everything as 'ok' because that yields a low loss. There are a few methods I'm considering to tackle this issue, involving modifying the loss function or using subsets of the data...
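As a sketch of the loss-modification option, on synthetic data and with LogisticRegression standing in for whatever model is actually used: reweighting classes makes mistakes on the rare low/high classes cost more than mistakes on 'ok', so the model stops predicting 'ok' for everything.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with the ~70% 'ok' majority discussed above.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.choice(["low", "ok", "high"], size=300, p=[0.15, 0.7, 0.15])

plain = LogisticRegression(max_iter=1000).fit(X, y)
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# The class-weighted model is far less prone to predicting 'ok' everywhere.
print("plain    'ok' rate:", (plain.predict(X) == "ok").mean())
print("weighted 'ok' rate:", (weighted.predict(X) == "ok").mean())
```

In practice the weights would be tuned rather than just set to "balanced", since the metric being optimised matters.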

Any additional metadata would be very welcome, as it could only match or improve our models' current performance.

flamethrower

Hello,

You really should consider using GroupKFold in your evaluation; otherwise your model will use information from one of a user's samples to predict the other 60 samples from the same user in the validation set, which isn't the case at test time. Biased evaluation.

23 Dec 2021, 13:17

Hi, I am already using StratifiedGroupKFold, but thanks for the advice. If I weren't using a form of GroupKFold, I think my errors would be significantly smaller for the reason you described.

What CV errors do you get with 5-fold CV?

flamethrower

Yes, otherwise the evaluation would show inflated accuracies. I haven't done any modelling yet; I'm still learning about the domain problem.

Good luck when you begin modelling. Learning about the domain sounds like a good idea, there are many interesting papers tackling similar problems with methods that can be transferred to this problem.

flamethrower

Thank you, good luck too. Yes I will check that out too. This challenge is really an intriguing one.

pmwaniki
Kemri wellcome trust research programme

Thanks for the info. I have just learned that GroupKFold is implemented in scikit-learn.