Defeating Nondeterminism in LLM Inference

  • Deterministic vLLM: Produced 1 unique completion.

  • Performance: There is a performance cost, but it’s not disastrous. In one test, a task that took 26 seconds on default vLLM took 42 seconds with the deterministic kernels, roughly 1.6x slower.
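
For anyone who wants to run the "unique completions" check themselves, here is a minimal sketch of how it could be measured: send the same prompt repeatedly and tally the distinct outputs. The helper names and the `fake_deterministic_generate` stand-in are hypothetical and are not part of vLLM; you would swap in your own inference call.

```python
from collections import Counter
from typing import Callable, Iterable


def sample_completions(generate: Callable[[str], str], prompt: str, n: int) -> list[str]:
    """Call a generate function n times on the same prompt and collect the outputs."""
    return [generate(prompt) for _ in range(n)]


def count_unique_completions(completions: Iterable[str]) -> Counter:
    """Tally how many times each distinct completion string occurs."""
    return Counter(completions)


def fake_deterministic_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real inference call (assumption: greedy
    # decoding on a deterministic backend). It always returns the same text.
    return prompt + " -> answer"


runs = sample_completions(fake_deterministic_generate, "Tell me about X", 10)
tally = count_unique_completions(runs)
print(f"{len(tally)} unique completion(s) across {sum(tally.values())} runs")
```

With a truly deterministic backend the tally has exactly one entry. Any count above one at temperature 0 means the outputs diverged somewhere between runs.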

Interesting. So this is a chip design issue?
