Get same_input -> same_output always: force argmax + random seed + clean token queue in generation. What is possible?

I don’t work for OpenAI and didn’t program the models; all we can do is extract evidence from observed behavior.

The random state is evidently not reset to a fixed value between API calls: if it were, identical prompts would always sample the same tokens, which would defeat the purpose of sampling for varied answers and simply return the same response every time.
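A toy sketch of what such a fixed-seed reset would look like (not from the original post; the token names and probabilities are made up for illustration). If the RNG were re-seeded before every call, repeated sampling from the same next-token distribution would always pick the same token; without the reset, the picks vary:

```python
import random

# Hypothetical next-token distribution (illustrative values only)
tokens = ["the", "a", "an", "its"]
probs  = [0.42, 0.41, 0.12, 0.05]

def sample(reset_seed: bool) -> str:
    if reset_seed:
        random.seed(0)  # RNG reset to a fixed state before every "call"
    return random.choices(tokens, weights=probs, k=1)[0]

# If the API reset its RNG like this, identical prompts would always return
# the same completion -- which is not what we observe at temperature > 0.
print([sample(reset_seed=True) for _ in range(5)])   # same token every time
print([sample(reset_seed=False) for _ in range(5)])  # varied tokens
```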

You can replicate this prompt and the next token it produces (the 46th context element) as one example, to probe a case where the two top token probabilities are almost identical:

More:

The latter gives you sample chat-endpoint code and gpt-4 results, and there is also the new gpt-3.5-turbo-instruct completion model, which takes raw input and returns logprobs to experiment with.
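As one way to probe such a near-tie yourself, here is a minimal sketch (my own, not from the linked posts) that asks the gpt-3.5-turbo-instruct completion endpoint for the top-5 logprobs of the very next token over several calls, so you can see whether near-tied candidates swap rank. It assumes the pre-1.0 openai Python library (e.g. 0.28) with an API key in the environment; PROMPT is a placeholder for whatever prompt you want to test:

```python
import math
import openai  # assumes openai<1.0 and OPENAI_API_KEY set in the environment

PROMPT = "..."  # placeholder: substitute the prompt you want to probe

def top_probs(n_calls: int = 5) -> None:
    """Request the top-5 logprobs for the next token several times in a row."""
    for _ in range(n_calls):
        resp = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",
            prompt=PROMPT,
            max_tokens=1,
            temperature=0,   # greedy: should always pick the argmax token
            logprobs=5,      # return the top-5 candidate tokens with logprobs
        )
        top = resp["choices"][0]["logprobs"]["top_logprobs"][0]
        # Convert logprobs to probabilities for easier comparison across calls
        print({tok: round(math.exp(lp), 4) for tok, lp in top.items()})

top_probs()
```

If two candidates sit within a rounding error of each other, you may see the reported ordering (and even the greedy pick) flip between otherwise identical calls, which is exactly the kind of evidence the experiment is after.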