Hey stephane, I’m facing the same issue. It’s a headache, since as of today GPT-4 and GPT-4-0613 behave differently for me. With one of my prompts, GPT-4 solves it properly while GPT-4-0613 doesn’t follow the system prompt. Tested on the Playground vs the API.
OpenAI should provide better handling of model versions.
I think you are suffering from an illusion. Take your example of solve/not-solve: set the temperature and top_p to 0, and you will get the same result from a model call every time for the same input.
Then you just need to replicate the settings exactly, including every character and whitespace of the input; identical token usage in the daily usage log confirms the inputs match. The Playground will also give you “example code” that shows the settings.
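A minimal sketch of what “replicate the settings” means in practice: build the exact same request body each time, so any remaining difference in output cannot come from your side. The function name `build_request` and the prompts are hypothetical; the field names follow the chat completions request format.

```python
def build_request(model: str, system: str, user: str) -> dict:
    """Build the exact request body so two runs can be diffed before sending."""
    return {
        "model": model,          # pin the dated snapshot, e.g. "gpt-4-0613"
        "temperature": 0,        # greedy sampling
        "top_p": 0,              # (the API accepts 0; the Playground slider may not)
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

# Identical inputs produce an identical request body, byte for byte;
# any difference in the completion must then come from the model side.
a = build_request("gpt-4-0613", "You are terse.", "Say hi")
b = build_request("gpt-4-0613", "You are terse.", "Say hi")
assert a == b
```

Comparing the bodies like this catches the stray-whitespace differences that are easy to miss when retyping a prompt into the Playground.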
“gpt-4” gives you “gpt-4-0613” because the name is only an alias that points to the currently-recommended model, which continues to be updated.
Unfortunately it seems I’ve been fooled by randomness here. I tried to replicate it, but the stochastic behaviour showed up again: sometimes it works, sometimes it doesn’t.
Regarding the opening post, as you mentioned, there indeed seems to be no difference
between GPT-4 and GPT-4-0613. But even within a single model, with temperature = 0 and top_p = 0.01 (it cannot be set to 0), the outputs are not fully deterministic. Something seems to change when reloading the Playground or making a new API call.
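One way to quantify the non-determinism described above: send the same request N times and tally how many distinct completions come back. This is a hypothetical sketch; `get_completion` stands in for a real API call made with fixed settings (temperature = 0, etc.).

```python
from collections import Counter

def distinct_outputs(get_completion, prompt: str, n: int = 10) -> Counter:
    """Run the same prompt n times and tally the distinct results."""
    return Counter(get_completion(prompt) for _ in range(n))

# With a stubbed, truly deterministic function there is exactly one bucket;
# against the real API, even temperature = 0 can yield more than one.
tally = distinct_outputs(lambda p: p.upper(), "hello", n=5)
assert tally == Counter({"HELLO": 5})
```

If the tally against the real API has more than one bucket even with pinned settings, that would support the reloading/non-determinism observation rather than a settings mismatch.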
Since this is no longer related to this post, I’m going to open a new topic.