What does 'Inference must be less than 100ms per vignette' mean? Does this refer to the total time taken to generate the entire output, or just the time to generate the first token?
I achieved an inference speed of 75.7 ms today, so I believe it's possible to stay below the constraints. I might share the notebook for reference; it has a score of 0.35.
The whole time for a single inference.
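For anyone benchmarking against the 100 ms budget, here is a minimal sketch of how total per-vignette latency (wall-clock time for the whole generation, not time to first token) can be measured. The `generate` function below is a hypothetical stand-in; replace it with your actual model call.

```python
import time

def generate(vignette: str) -> str:
    # Hypothetical placeholder for the real model call
    # (e.g. a tokenizer + model.generate pipeline).
    return vignette.upper()

def latency_ms(fn, *args, warmup: int = 3, runs: int = 10) -> float:
    # Warm-up calls avoid counting one-time costs (cache fills, lazy init).
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    # Average wall-clock time per call, in milliseconds.
    return (time.perf_counter() - start) * 1000 / runs

print(f"{latency_ms(generate, 'sample vignette'):.1f} ms per vignette")
```

Averaging over several runs after a warm-up gives a more stable number than timing a single call, which matters when you are this close to the limit.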
Is it possible for an SLM to meet those time constraints?
I've been trying some things (quantization, Ollama, ...) and I don't see how it's possible, at least with Qwen 0.6B.
The fastest I have achieved with T5-base is 280 ms. It's a trial, I'm sure.
T5 seems like a good choice. Given the inference requirements, though, I don't think I'm going to continue with this contest.
Yes, I actually achieved 135 ms today.
@Joseph_gitau did you achieve that on a local PC? Just curious, I've been testing performance in colab.
Yes, it's on my local PC.
I have shared the notebook under the Notebooks section.