@AJoel and fellow scientists,
I’d like to clarify a point from the data section, which states:
“Any use of manually selected threshold (e.g., setting a fixed cut-off on probabilities) is strictly forbidden.”
My question is: If I use a standard classification model (e.g., LightGBM, XGBoost, Logistic Regression), these models naturally output probabilities, and the default decision rule is simply:
pred = (pred_proba >= 0.5).astype(int)
So if I predict probabilities and then apply the default 0.5 threshold, is this also considered manually selected, and therefore forbidden?
Or is the intention only to forbid intentionally tuning thresholds (e.g., 0.42, 0.67) as part of feature engineering or validation?
Hi Juliuss, I believe the thresholding example applies to the binary classification aspect of the challenge.
If that is the case, I don't think that rule really restricts us, because AUC calculation requires raw probability scores rather than the label predictions that call for the kind of thresholding you referenced.
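To illustrate the point above, here is a minimal sketch (using scikit-learn's `roc_auc_score`; the labels and scores are made-up toy values, not from any actual submission) showing that AUC is computed from the raw probability ranking, and that thresholding to hard labels can only lose information:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth and raw model probability scores
y_true = np.array([0, 0, 1, 1])
proba = np.array([0.1, 0.3, 0.4, 0.9])

# AUC from raw probabilities: it only cares about the ranking of scores
auc_raw = roc_auc_score(y_true, proba)  # every positive outranks every negative -> 1.0

# Thresholding at 0.5 collapses 0.4 down to label 0, degrading the ranking
labels = (proba >= 0.5).astype(int)
auc_thresholded = roc_auc_score(y_true, labels)  # lower than auc_raw

print(auc_raw, auc_thresholded)
```

This is why submitting raw probabilities is the safe choice for an AUC-scored challenge: the metric never needs a cut-off, so there is nothing to tune.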
Insightful. Thanks @isaacOluwafemiOg
@isaacOluwafemiOg Yeah, that's very true "it only requires raw probability scores as opposed to label predictions".
So rounding, clipping, etc. won't be accepted, right @MICADEE? @isaacOluwafemiOg?
I wish I could give a definitive answer. I would rather rely on a response from @AJoel
Sincerely, that's exactly what I did, I mean "raw probability scores", and I am also looking for clarification from @AJoel on this issue as well.
Hello @MICADEE, the outputs are raw probabilities. So the thresholding comments can be ignored since you are not returning predicted labels.
Well, currently my model outputs negative quantities..lol...so I clip them to 0. Is that bad @AJoel? Quantities can go up to 2 or 3 decimal places, so I round to the nearest one decimal place; is that bad too?
Obviously negative values don't make sense, so clipping is fine. I wouldn't worry about values being rounded to the nearest one decimal place.
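For anyone wanting to do the same, a small sketch of the clip-then-round post-processing described above (the `preds` array is a made-up example, not real model output):

```python
import numpy as np

# Hypothetical raw model outputs; negative quantities don't make sense here
preds = np.array([-0.12, 0.057, 1.342, 0.568])

# Clip negatives up to 0, leaving the upper end unbounded
preds = np.clip(preds, 0, None)

# Round to one decimal place, as discussed above
preds = np.round(preds, 1)

print(preds)  # [0.  0.1 1.3 0.6]
```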
Thanks so much, this helps a lot @AJoel🙏