Deterministic vLLM: Produced 1 unique completion.
Performance: There is a performance cost, but it’s not disastrous. In one test, a task that took 26 seconds on default vLLM took 42 seconds with the deterministic kernels, roughly 1.6x slower.
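For anyone who wants to reproduce this kind of check, here is a minimal sketch of how "unique completions" could be counted: call the same prompt repeatedly and tally the distinct outputs. The `generate` callable, `stable_generate`, and `drifting_generate` are hypothetical stand-ins for a real inference call, not vLLM APIs, and the only numbers taken from the test above are the 26 and 42 second timings.

```python
from collections import Counter
from typing import Callable
import itertools


def count_unique_completions(
    generate: Callable[[str], str], prompt: str, n: int
) -> Counter:
    """Call `generate` n times on the same prompt and tally distinct outputs."""
    return Counter(generate(prompt) for _ in range(n))


# Hypothetical stand-in: always returns the same text (the deterministic case).
def stable_generate(prompt: str) -> str:
    return prompt + " -> answer A"


# Hypothetical stand-in: occasionally returns a different text, mimicking
# run-to-run drift. The cycle is fixed so the demo itself is repeatable.
_variants = itertools.cycle(["answer A", "answer A", "answer B"])


def drifting_generate(prompt: str) -> str:
    return prompt + " -> " + next(_variants)


stable = count_unique_completions(stable_generate, "Q", 9)
drifting = count_unique_completions(drifting_generate, "Q", 9)

unique_stable = len(stable)      # number of distinct outputs seen
unique_drifting = len(drifting)

# Slowdown implied by the timings quoted above (42 s vs 26 s).
slowdown = 42 / 26

print(unique_stable, unique_drifting, round(slowdown, 2))
```

In a real run you would replace the stand-ins with a function that sends the prompt to your server at temperature 0 and returns the completion text; a count of 1 is what the deterministic result above corresponds to.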
Interesting. So this is a chip design issue?