Defeating Nondeterminism in LLM Inference

jeffvpace · September 17, 2025, 4:00pm

Deterministic vLLM: Produced 1 unique completion.

Performance: There is a performance cost, but it’s not disastrous. In one test, a task that took 26 seconds on default vLLM took 42 seconds with the deterministic kernels.
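As a ratio, the quoted timings come from a single test, so they are one data point and not a general benchmark. The deterministic run took about 1.6 times as long as the default run, or roughly 62% more wall-clock time:

```latex
\frac{t_{\text{deterministic}}}{t_{\text{default}}}
  = \frac{42\ \text{s}}{26\ \text{s}} \approx 1.62,
\qquad
\frac{t_{\text{deterministic}} - t_{\text{default}}}{t_{\text{default}}}
  = \frac{16\ \text{s}}{26\ \text{s}} \approx 0.62
```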

Interesting. So this is a chip design issue?
