Achieving deterministic API output on language models - HOWTO

Yes, it turns out that with the 3.5-turbo models, while a very small top_p setting locks in the top token more reliably than actually setting temperature to 0, there is still non-determinism in the logits, and the top-ranked token can change position in long outputs.

I did a thorough investigation of this using the gpt-3.5-turbo-instruct model.
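One way to run such an investigation is to collect several completions for the same prompt and sampling settings, then find where each run first diverges from a baseline run. This is a minimal sketch of that comparison step; the helper names (`first_divergence`, `check_determinism`) are my own, and in practice the `outputs` list would come from repeated API calls (e.g. `client.completions.create(model="gpt-3.5-turbo-instruct", ...)` with `temperature=0` or a tiny `top_p`):

```python
def first_divergence(a: str, b: str) -> int:
    """Index of the first differing character between two completions,
    or -1 if they are byte-for-byte identical."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string is a prefix of the other: they diverge where the shorter ends.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def check_determinism(outputs: list[str]) -> dict[int, int]:
    """Compare each run against the first and report where it diverges.

    A value of -1 means that run exactly matched the baseline; any other
    value is the character position where the outputs started to differ.
    """
    baseline = outputs[0]
    return {i: first_divergence(baseline, o)
            for i, o in enumerate(outputs[1:], start=1)}
```

With fully deterministic sampling every entry would be -1; what the investigation shows instead is that divergence points appear, and they tend to show up deeper into long outputs where closely ranked tokens can swap places.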
