Is temperature working with gpt-4-vision-preview in the Chat Completions API? I get different results

I use a standard payload definition:

    def build_payload(messages):
        # Request body for the Chat Completions endpoint
        return {
            "model": "gpt-4-vision-preview",
            "messages": messages,
            "max_tokens": 200,
            "temperature": 0.0,
            "n": 1,
        }

where the messages are prepared correctly. However, if I run inference with temperature=0.0 several times, I get different responses. I make sure that the base64-encoded image is the same, and the text prompt is always the same. Is temperature bugged in GPT-4V, or am I missing something?
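
For context, here is roughly how I prepare the messages; the helper name and the image path are just illustrative:

    import base64

    def build_messages(prompt: str, image_path: str) -> list:
        # Read the image and base64-encode it so it can be inlined as a data URI
        with open(image_path, "rb") as f:
            b64_image = base64.b64encode(f.read()).decode("utf-8")
        return [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                ],
            }
        ]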

Interestingly, if I set n > 1, then all n responses within a single call are indeed identical to one another, but if I run the same prompt again, the responses differ from the previous run (while still matching each other).

Any ideas on how to make the responses deterministic? Is there a way to just enable greedy decoding and drop sampling entirely?

P.S. The same thing happens if I set a very low temperature, e.g. 1e-6.
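
P.P.S. For anyone landing here later: the Chat Completions API also has a seed parameter (in beta at the time of writing) that is supposed to give best-effort reproducibility together with temperature=0. I haven’t verified that gpt-4-vision-preview honors it, but with the openai v1.x Python SDK the call would look roughly like this:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    # Helper from above; prompt and path are illustrative
    messages = build_messages("Describe this image.", "photo.jpg")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=messages,
        max_tokens=200,
        temperature=0.0,
        seed=42,  # best-effort determinism (beta feature)
        n=1,
    )
    # If system_fingerprint changes between calls, the backend configuration
    # changed, and responses may differ even with the same seed.
    print(response.system_fingerprint)
    print(response.choices[0].message.content)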

I was having the same issue, but I included “n”: 1 in my request and now it appears to be returning the same value consistently (at least across ~5 back-to-back tries). The full request is sketched after the list below.

Additional context:

  1. Without the “n”: 1 parameter I was receiving different responses for the same input, just as you were.
  2. I had the same experience as you did when n > 1 (i.e., the n responses within a single call were consistent with one another, but changed between runs).
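
For completeness, this is roughly how I send the request (raw requests against the standard endpoint rather than the SDK; the messages placeholder is illustrative):

    import os
    import requests

    # Placeholder; substitute your real messages (e.g., text plus base64 image)
    messages = [{"role": "user", "content": "Describe this image."}]
    payload = {
        "model": "gpt-4-vision-preview",
        "messages": messages,
        "max_tokens": 200,
        "temperature": 0.0,
        "n": 1,  # explicitly setting n is what made my outputs consistent
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    }
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers=headers,
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])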