@AJoel and fellow scientists,
I’d like to clarify a point from the data section, which states:
“Any use of manually selected threshold (e.g., setting a fixed cut-off on probabilities) is strictly forbidden.”
My question is: If I use a standard classification model (e.g., LightGBM, XGBoost, Logistic Regression), these models naturally output probabilities, and the default decision rule is simply:
pred = (pred_proba >= 0.5).astype(int)
So if I predict probabilities and then apply the default 0.5 threshold, is this also considered manually selected, and therefore forbidden?
Or is the intention only to forbid intentionally tuning thresholds (e.g., 0.42, 0.67) as part of feature engineering or validation?
Hi Juliuss, I believe the thresholding example applies to the binary classification aspect of the challenge.
If that is the case, I don't think that rule really restricts us, because AUC calculation requires raw probability scores rather than the label predictions that call for the kind of thresholding you referenced.
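To illustrate the point above, here is a minimal sketch (using scikit-learn's `roc_auc_score`; the labels and scores are made-up toy values, not from any actual submission) showing that AUC is computed from the raw probability ranking, and that thresholding to hard labels can only lose information:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth and raw model probability scores
y_true = np.array([0, 0, 1, 1])
proba = np.array([0.1, 0.3, 0.4, 0.9])

# AUC from raw probabilities: it only cares about the ranking of scores
auc_raw = roc_auc_score(y_true, proba)  # every positive outranks every negative -> 1.0

# Thresholding at 0.5 collapses 0.4 down to label 0, degrading the ranking
labels = (proba >= 0.5).astype(int)
auc_thresholded = roc_auc_score(y_true, labels)  # lower than auc_raw

print(auc_raw, auc_thresholded)
```

This is why submitting raw probabilities is the safe choice for an AUC-scored challenge: the metric never needs a cut-off, so there is nothing to tune.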
Insightful. Thanks @isaacOluwafemiOg
@isaacOluwafemiOg Yeah, that's very true "it only requires raw probability scores as opposed to label predictions".
So rounding, clipping, etc. won't be accepted, right @MICADEE? @isaacOluwafemiOg?
I wish I could give a definitive answer. I would rather rely on a response from @AJoel
Sincerely, that's exactly what I did, I mean "raw probability scores", and I am also looking for clarification from @AJoel on this issue as well.
Hello @MICADEE, the outputs are raw probabilities. So the thresholding comments can be ignored since you are not returning predicted labels.
Well, currently my model outputs negative quantities..lol...so I clip them to 0. Is that bad @AJoel? Quantities can go up to 2 or 3 decimal places, so I round to the nearest one decimal place; is that bad too?
Obviously negative values don't make sense, so clipping is fine. I wouldn't worry about values being rounded to the nearest one decimal place.
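For anyone wanting to do the same, a small sketch of the clip-then-round post-processing described above (the `preds` array is a made-up example, not real model output):

```python
import numpy as np

# Hypothetical raw model outputs; negative quantities don't make sense here
preds = np.array([-0.12, 0.057, 1.342, 0.568])

# Clip negatives up to 0, leaving the upper end unbounded
preds = np.clip(preds, 0, None)

# Round to one decimal place, as discussed above
preds = np.round(preds, 1)

print(preds)  # [0.  0.1 1.3 0.6]
```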
Thanks so much, this helps a lot @AJoel🙏