tricky situation. According to them, """How implementable is your code in a real application? Have you taken into account that the solution will be deployed on an edge device? - 25%""" and """Training should take no longer than 24 hours on a GPU similar to an NVIDIA T4, while inference should be on an NVIDIA Jetson Nano or equivalent.""" I think on the deployment side, inference should be under 100 ms per prompt. That may be achievable if you convert the model into a quantized format such as AWQ, where quality drops slightly but inference is faster, a trade-off. @Amy_Bray, could you please assist?
That's not possible so far for a single prompt.
So this rule is impossible to meet, right? What is the solution?
If they change the rules, I hope they will give us more time.
Use TinyLlama 1.1B and fine-tune your model, then use SentenceTransformers; quantize it using bitsandbytes and PEFT.
Very heavy to train though, right? Unless you spend on cloud GPUs.
TinyLlama (quantized and fine-tuned) takes 35.1 seconds on a batch of 100, which works out to roughly 351 ms per vignette, well above the 100 ms target.
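For the back-of-envelope check, batch time divided by batch size gives the per-item latency:

```python
# 35.1 s for a batch of 100 vignettes -> per-vignette latency in ms
batch_seconds = 35.1
batch_size = 100
per_vignette_ms = batch_seconds / batch_size * 1000
print(f"{per_vignette_ms:.0f} ms per vignette")  # prints "351 ms per vignette"
```

So even with quantization, this setup is about 3.5x over the 100 ms budget.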
Try a T5 model (t5-small, t5-base, or t5-large)
and ensure that you are using the GPU
I used it on Colab with an NVIDIA T4 GPU, and the mean inference time in my case after fine-tuning was around 76 seconds.