
Farm to Feed Shopping Basket Recommendation Challenge

Helping Kenya
€8 250 EUR
Completed (~1 month ago)
Machine Learning
Prediction
Feature Engineering
740 joined
266 active
Start: Dec 02, 25
Close: Jan 19, 26
Reveal: Jan 20, 26
Juliuss
Freelance
Rules
Help · 10 Dec 2025, 00:35 · 10

@AJoel and fellow scientists,

I’d like to clarify a point from the data section, which states:

“Any use of manually selected threshold (e.g., setting a fixed cut-off on probabilities) is strictly forbidden.”

My question is: If I use a standard classification model (e.g., LightGBM, XGBoost, Logistic Regression), these models naturally output probabilities, and the default decision rule is simply:

pred = (pred_proba >= 0.5).astype(int)

So if I predict probabilities and then apply the default 0.5 threshold, is that also considered a manually selected threshold, and therefore forbidden?

Or is the intention only to forbid intentionally tuning thresholds (e.g., 0.42, 0.67) as part of feature engineering or validation?

Discussion 10 answers
isaacOluwafemiOg
Kwame Nkrumah University of Science and Technology

Hi Juliuss, I believe the thresholding example applies to the binary classification aspect of the challenge.

If that is the case, I don't think the rule really restricts us, because AUC is calculated from raw probability scores rather than from label predictions, and only label predictions call for the kind of thresholding you referenced.
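To make the AUC point concrete, here is a minimal sketch. It assumes the pairwise (Mann-Whitney) definition of AUC and uses made-up labels and scores. `pairwise_auc` is a hypothetical helper written for illustration, not the competition's scoring code.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up data, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.55, 0.60, 0.90, 0.30, 0.52]

# Scoring the raw probabilities keeps the full ranking.
auc_raw = pairwise_auc(y_true, y_prob)

# Thresholding at 0.5 collapses the scores to 0/1 and creates ties.
y_label = [int(p >= 0.5) for p in y_prob]
auc_thresholded = pairwise_auc(y_true, y_label)

print(round(auc_raw, 3), round(auc_thresholded, 3))  # 0.889 0.833
```

In this toy example the thresholded labels score lower because the ranking among the rows predicted as 1 is lost. Thresholding does not always lower AUC, but it can never add ranking information.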

10 Dec 2025, 01:08
Upvotes 2
Juliuss
Freelance

Insightful. Thanks @isaacOluwafemiOg

MICADEE
LAHASCOM

@isaacOluwafemiOg Yeah, that's very true: "it only requires raw probability scores as opposed to label predictions".

Juliuss
Freelance

So rounding, clipping, etc. won't be accepted, right @MICADEE? @isaacOluwafemiOg?

isaacOluwafemiOg
Kwame Nkrumah University of Science and Technology

I wish I could give a definitive answer. I would rather rely on a response from @AJoel

MICADEE
LAHASCOM

Sincerely, that's exactly what I did: I used "raw probability scores". I am also looking for clarification from @AJoel on this issue.

AJoel
Zindi

Hello @MICADEE, the outputs are raw probabilities. So the thresholding comments can be ignored since you are not returning predicted labels.

Juliuss
Freelance

Well, currently my model outputs negative quantities, lol, so I clip them to 0. Is that bad, @AJoel? Quantities can go up to 2 or 3 decimal places, so I round to one decimal place. Is that bad too?

AJoel
Zindi

Obviously, negative values don't make sense, so clipping is fine. I wouldn't worry about values being rounded to one decimal place.
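For anyone reading later, the post-processing discussed here could look roughly like the sketch below. The function name `postprocess_quantities` and the sample predictions are made up for illustration. Only the two steps, clipping negatives to 0 and rounding to one decimal place, come from this thread.

```python
def postprocess_quantities(preds, decimals=1):
    """Clip negative quantity predictions to 0, then round.

    `decimals=1` mirrors the one-decimal rounding mentioned in this thread.
    """
    return [round(max(0.0, p), decimals) for p in preds]


# Made-up raw regression outputs, for illustration only.
raw_preds = [-0.37, 0.04, 1.26, 2.349, 5.0]
clean_preds = postprocess_quantities(raw_preds)

print(clean_preds)  # [0.0, 0.0, 1.3, 2.3, 5.0]
```

With NumPy arrays, `np.clip(preds, 0, None).round(1)` should give the same result. This applies to the quantity predictions only; the probability outputs stay as raw scores.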

Juliuss
Freelance

Thanks so much, this helps a lot @AJoel🙏