@AJoel and fellow scientists,
I’d like to clarify a point from the data section, which states:
“Any use of manually selected threshold (e.g., setting a fixed cut-off on probabilities) is strictly forbidden.”
My question is: if I use a standard classification model (e.g., LightGBM, XGBoost, Logistic Regression), the model naturally outputs probabilities, and the default decision rule is simply:
pred = (pred_proba >= 0.5).astype(int)
So if I predict probabilities and then apply the default 0.5 threshold, is this also considered a manually selected threshold, and therefore forbidden?
Or is the intention only to forbid intentionally tuning thresholds (e.g., 0.42, 0.67) as part of feature engineering or validation?
Hi Juliuss, I believe the thresholding example applies to the binary classification aspect of the challenge.
If that is the case, I don't think the rule really restricts us, because AUC is calculated from raw probability scores rather than from label predictions, and only label predictions call for the kind of thresholding you referenced.
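To make that concrete, here is a minimal sketch, assuming the metric is ROC AUC as I guessed above (the organizers have not confirmed this in the thread). The helper `pairwise_auc` and the toy data are hypothetical and only for illustration. It computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly, so no cut-off is ever chosen.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper for illustration, not the competition's scorer.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (made up for this example).
y_true = [0, 0, 1, 1, 0, 1]
pred_proba = [0.10, 0.40, 0.35, 0.80, 0.55, 0.60]

# AUC straight from the raw probabilities: no threshold involved.
auc_from_proba = pairwise_auc(y_true, pred_proba)

# AUC after collapsing to hard labels with the default 0.5 cut-off.
hard_labels = [int(p >= 0.5) for p in pred_proba]
auc_from_labels = pairwise_auc(y_true, hard_labels)

# Any order-preserving transform of the scores leaves AUC unchanged.
auc_from_squared = pairwise_auc(y_true, [p ** 2 for p in pred_proba])

print(auc_from_proba, auc_from_labels, auc_from_squared)
```

On this toy data the raw probabilities give an AUC of 7/9, while the 0.5-thresholded labels give 6/9, because thresholding turns many pairs into ties. So if the score really is AUC, submitting probabilities is both threshold-free and the better choice.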
Insightful. Thanks @isaacOluwafemiOg
@isaacOluwafemiOg Yeah, that's very true: AUC only requires raw probability scores, as opposed to label predictions.