Hello @AntonioDeDomenico ,
I have two clarification questions regarding the evaluation and track rules:
1. If a model produces the correct final answer but the accompanying reasoning is partially or completely incorrect, is the evaluation based solely on the final answer (as it currently is), or is there any assessment of reasoning quality during judging or post-evaluation?
2. For a given track (e.g., Qwen-1.5B), is it acceptable to use a larger model (e.g., Qwen3-32B) in intermediate steps such as preprocessing, while using the track's model only for answer generation? Or must all stages exclusively use the track-specified model?
Hi, 1) good point, but I do not think we will have time to run this (LLM-as-a-judge) evaluation. 2) Yes, you can do it.
Hi @AntonioDeDomenico, Thanks for the clarification on (2)! Just to make sure I understand the intended scope of preprocessing: if larger models are allowed in preprocessing, are there any constraints on using them for semantic reasoning steps (e.g., decomposition, chain-of-thought generation, or intermediate solution drafting), as opposed to purely mechanical steps like retrieval, filtering, or formatting?
I'm asking because, depending on how preprocessing is defined, allowing larger models at inference time for preprocessing would let the problem be split into two stages where reasoning is heavily offloaded to the larger model (e.g., generating CoT), with the track model used only for final summarization. That could blur the distinction between compute-constrained tracks.
Thank you for your prompt replies and clarifications!
Hi @ahuvam, @neuron_x, indeed this was too blurry, and I should have been more precise. Larger models should not be used during inference time. I think this is clear enough and leaves no room for misunderstanding.
Yes, thanks a lot!