GPT-5 has a temperature issue

GPT-5-Thinking is a great model, but I think it has a temperature issue.

Whenever I use it for any kind of writing, it makes… odd… word choices, often producing borderline nonsensical sentences and sometimes even outright grammatical errors. It’s so bad that the stories come out confusing and completely unreadable. A lot of details make no sense, as if the model keeps picking less probable tokens and then writing itself into a corner.

A few examples. Here, a story where a delivery driver showed up:

The courier slapped a little sticker with my name and a QR code on the case, winked like delivery guys do when they pretend they know your whole life, and left.

Do… do delivery guys wink at people pretending to know your whole life? I feel like the probability of the word “winked” here was very low, but it was picked anyway, and the model then tried to backpedal and justify it by making up some relatable experience that doesn’t exist.

In a story I had it write, set in a Pokémon world:

“Are you a Pokémon trainer?” She asked.
“No—I don’t—I’m wearing a hoodie.”

If this sounds confusing, it’s because it is, even in context. There is no connection between “wearing a hoodie” and “not being a Pokémon trainer”.

Here, describing the narrator’s apartment:

My apartment was the kind where you learn to keep your socks on and your cables wrapped, because one clumsy step turned into a domino collapse of stands, tripods, and guilt.

…It almost sounds fine, but it makes no sense if you actually try to understand what it’s saying:

  1. You don’t take your socks off when entering someone’s apartment;
  2. There is no such social rule as “keeping your cables wrapped” at someone’s place. That implies you bring cables there?
  3. What is a “domino collapse of guilt”?

In another story, GPT-5 stopped using question marks for dialogue…? Characters were asking questions like this: “What is going on.” “What do you think this means.” It made the whole story very strange and unnerving to read. I obviously never prompted it to do that; it just kind of started doing it halfway through.

But the thing is, these aren’t isolated problems. Every time I ask GPT-5 to generate stories, issues like these appear all over the place. I don’t mean every so often; I mean that throughout the whole story there’s this general feeling, while reading, that things are not quite making sense, like the model was intoxicated while writing it. Odd word choices, dialogue that doesn’t flow, metaphors with no sense of causation… it’s all slightly “loose”. I know I probably sound like I’m nitpicking, but these are issues that models like Claude 4 and Gemini 2.5 don’t have with the exact same prompts. I honestly feel this even when conversing with the model on non-writing tasks. The way it writes just sounds off.

I tried using GPT-5-chat instead, and setting the temperature to 0 for that model seems to fix most of these issues! But GPT-5-chat refuses to write anything longer than a few paragraphs, unlike GPT-5-thinking (or the Claude models I usually use for this kind of writing prompt), so it’s still unusable for this use case.
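For anyone who wants to reproduce this, here’s roughly the call I’m using: a minimal sketch with the OpenAI Python SDK, assuming the gpt-5-chat-latest model alias (swap in whichever snapshot you actually have access to).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# gpt-5-chat accepts sampling parameters; temperature=0 makes decoding
# (near-)greedy, which is what cleared up most of the odd word choices for me.
response = client.chat.completions.create(
    model="gpt-5-chat-latest",  # assumed alias; adjust to your snapshot
    temperature=0,
    messages=[
        {"role": "user", "content": "Write a 3,000-word short story about a courier."},
    ],
)
print(response.choices[0].message.content)
```

Greedy decoding trades away some variety, but for long-form prose that trade has been worth it in my case; the length cap is the remaining problem.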

This is a shame, because GPT-5-thinking otherwise made some pretty interesting plot decisions from the prompts I gave it, and it’s the only model, along with Claude 3.7 and 4, that is perfectly happy to write several thousand words at once from a prompt.

I know a high temperature helps the model’s creativity on problem-solving tasks, but for writing it just makes it spout “light nonsense”. I’d love to know if anyone else has experienced this, although I assume fixing this niche use case is not OpenAI’s priority right now. In the meantime, I’ll stick to Claude.

GPT-4.5 is still available in ChatGPT as a legacy model. When it comes to writing, every other model becomes irrelevant, in my opinion.

I hope this helps, even if only as a way to get some style guidance for GPT-5.

GPT-4.5 is good, but it also refuses to write several thousand words at once, which is what I need for my use case. So far, the only models that can write that much are Claude 3.7/4/4.1 and… GPT-5-Thinking, even though the writing is really weird. That’s why I care: it’s the only other model I’ve seen that can write that much in one go, and the problem seems like a temperature issue, not a training or intelligence issue.

Agreed; it should at least accept a top_p to stop it from randomly switching to other languages on a roll of the dice, or going “oops, gonna go off the rails and start spittin’ from old turns”, which is really just poor self-attention.

0.01% bad breaking token * 2000 tokens * many calls = bad time
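To put a concrete number on that back-of-the-envelope math (assuming, purely for illustration, a flat and independent 0.01% failure chance per token):

```python
p_bad = 0.0001          # 0.01% chance that any single token "breaks" the text
tokens_per_call = 2000  # roughly one long story

# Chance that at least one token in a single call goes off the rails,
# treating each token as an independent coin flip (a simplification).
p_call_breaks = 1 - (1 - p_bad) ** tokens_per_call
print(f"per call: {p_call_breaks:.1%}")  # ~18.1%

# Across many calls, a derailment becomes a near certainty.
p_over_100_calls = 1 - (1 - p_call_breaks) ** 100
print(f"over 100 calls: {p_over_100_calls:.2%}")  # ~100.00%
```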

Does GPT-5 accept a temperature parameter? I thought temperature was retired with GPT-5, no?

GPT-5-chat does, but GPT-5-thinking doesn’t. The model still has a temperature setting internally, and that’s my point: I think it’s set way too high.
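A quick sketch of the difference, again with the Python SDK and assumed model aliases. In my testing, passing temperature to the thinking model gets rejected as an unsupported parameter, which surfaces as a BadRequestError in the SDK:

```python
from openai import OpenAI, BadRequestError

client = OpenAI()
prompt = [{"role": "user", "content": "Say hi."}]

# gpt-5-chat: temperature is a normal, accepted sampling parameter.
client.chat.completions.create(
    model="gpt-5-chat-latest",  # assumed alias
    temperature=0,
    messages=prompt,
)

# gpt-5 (the thinking/reasoning model): the same parameter is refused.
try:
    client.chat.completions.create(
        model="gpt-5",  # assumed alias for the reasoning model
        temperature=0,
        messages=prompt,
    )
except BadRequestError as err:
    print("temperature rejected:", err)  # unsupported-parameter error
```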
