GPT-5 has a temperature issue

GPT-5-Thinking is a great model, but I think it has a temperature issue.

Whenever I use it for any kind of writing, it makes… odd… word choices, often producing borderline nonsensical sentences and sometimes outright grammatical errors. It’s so bad that the stories come out confusing and all but unreadable. A lot of details make no sense, as if the model keeps picking less probable tokens and writing itself into a corner.

A few examples. Here’s one from a story where a delivery driver shows up:

The courier slapped a little sticker with my name and a QR code on the case, winked like delivery guys do when they pretend they know your whole life, and left.

Do… do delivery guys wink at people pretending to know your whole life? I feel like the probability of the word “winked” here was very low, but it was picked anyway, and the model then tried to backpedal and justify it by making up some relatable experience that doesn’t exist.

In a story I had it write, set in a Pokémon world:

“Are you a Pokémon trainer?” She asked.
“No—I don’t—I’m wearing a hoodie.”

If this sounds confusing, it’s because it is, even in context. There is no connection between “wearing a hoodie” and “not being a Pokémon trainer”.

Here, describing the narrator’s apartment:

My apartment was the kind where you learn to keep your socks on and your cables wrapped, because one clumsy step turned into a domino collapse of stands, tripods, and guilt.

…It almost sounds fine, but it makes no sense once you actually try to understand what it’s saying:

  1. You don’t take your socks off when entering someone’s apartment, so there’s nothing to “learn”;
  2. There is no social rule about “keeping your cables wrapped” at someone’s place. That would imply you bring cables there?
  3. What is a “domino collapse of guilt”?

In another story, GPT-5 stopped using question marks for dialogue…? Characters were asking questions like this: “What is going on.” “What do you think this means.” It made the whole story very strange and unnerving to read. I obviously never prompted it to do that; it just kind of started doing it halfway through.

But the thing is, these aren’t isolated problems. Every time I ask GPT-5 to generate stories, issues like these appear all over the place. I don’t mean every so often; I mean that throughout the whole story there’s a general feeling, while reading, that things are not quite making sense, as if the model were intoxicated while writing. Odd word choices, dialogue that doesn’t flow, metaphors with no sense of causation… it’s all slightly “loose”. I know I probably sound like I’m nitpicking, but these are issues that models like Claude 4 and Gemini 2.5 don’t have with the exact same prompts. I honestly feel this even when conversing with the model on non-writing tasks. The way it writes sounds off.

I tried using GPT-5-chat instead, and setting the temperature to 0 for that model seems to fix most of these issues! But GPT-5-chat refuses to write anything longer than a few paragraphs, unlike GPT-5-Thinking (or the Claude models I usually use for this kind of writing prompt), so it’s still unusable for this use case.
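For anyone who wants to reproduce this over the API, here’s roughly what that test looks like. A minimal sketch using the Python SDK; the `gpt-5-chat-latest` model alias is my assumption of the chat-tuned variant’s name, so check it against your model list.

```python
# Minimal sketch: pinning temperature to 0 on the chat-tuned GPT-5 variant.
# The model alias "gpt-5-chat-latest" is an assumption; verify it yourself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5-chat-latest",  # assumed alias for the chat-tuned variant
    temperature=0,              # near-greedy sampling; avoids low-probability picks
    messages=[
        {"role": "user", "content": "Write a short story about a delivery driver."},
    ],
)
print(response.choices[0].message.content)
```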

This is a shame, because GPT-5-Thinking otherwise made some pretty interesting plot decisions from the prompts I gave it, and is the only model, alongside Claude 3.7 and 4, that is perfectly okay with writing several thousand words at once from a prompt.

I know a high temperature helps the model’s creativity on problem-solving tasks, but for writing, it just makes it spout “light nonsense”. I’d love to know if anyone else has experienced this, although I assume fixing this niche use case is not OpenAI’s priority right now. In the meantime, I’ll stick to Claude.

6 Likes

GPT-4.5 is still available in ChatGPT as a legacy model.
When it comes to writing, every other model becomes irrelevant, in my opinion.

I hope this helps, even if only as a source of style guides for GPT-5.

1 Like

GPT-4.5 is good, but it also refuses to write several thousand words at once, which is what I need for my use case. So far the only models that can write that much are Claude 3.7/4/4.1 and… GPT-5-Thinking, even though its writing is really weird. That’s why I care: it’s the only other model I’ve seen that can write that much in one go, and the problem seems like a temperature issue, not a training or intelligence issue.

2 Likes

Agreed; it should at least accept a top_p parameter, to stop it randomly switching to other languages on a roll of the dice, or pulling an “oops, gonna go off the rails and start spittin’ from old turns”, which is really just poor self-attention.

0.01% bad breaking token * 2000 tokens * many calls = bad time
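To put rough numbers on that shorthand (a quick illustration: the 0.01% and 2,000 tokens come from the line above; the 50-call batch size is my own pick):

```python
# Back-of-the-envelope: a 0.01% per-token derail rate over long generations.
p_bad = 0.0001   # 0.01% chance any single sampled token breaks the text
tokens = 2000    # length of one long-form generation
calls = 50       # illustrative batch of such generations

p_clean_call = (1 - p_bad) ** tokens
print(f"P(bad token in one call):          {1 - p_clean_call:.1%}")           # ~18.1%
print(f"P(bad token across {calls} calls): {1 - p_clean_call ** calls:.1%}")  # ~100.0%
```

At those rates, roughly one long generation in five contains at least one derailing token, and across a batch of calls a failure is near certain.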

2 Likes

Does gpt-5 accept a temperature variable? I thought temperature was retired with gpt-5, no?

1 Like

GPT-5-chat does, but GPT-5-Thinking doesn’t. The model still has a temperature setting internally, and that’s my point: I think it’s set way too high.
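A quick way to verify which variant takes the parameter is to just send it and see what comes back. A sketch under some assumptions: the `gpt-5-chat-latest` and `gpt-5` aliases are mine, and the expectation that the reasoning variant rejects `temperature` with a bad-request error is extrapolated from how earlier OpenAI reasoning models behaved, so verify against the current docs.

```python
# Sketch: probe whether each GPT-5 variant accepts a temperature parameter.
# Assumption: the reasoning variant rejects unsupported sampling params with a
# BadRequestError, as earlier reasoning models did; verify against current docs.
import openai

client = openai.OpenAI()

for model in ("gpt-5-chat-latest", "gpt-5"):  # assumed chat vs. reasoning aliases
    try:
        client.chat.completions.create(
            model=model,
            temperature=0.2,
            messages=[{"role": "user", "content": "Say hi."}],
        )
        print(f"{model}: accepts temperature")
    except openai.BadRequestError as exc:
        print(f"{model}: rejects temperature ({exc})")
```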

2 Likes

I can deeply relate to this issue. When I invite GPT-5 to collaborate with me on a story, it can capture emotions but fails to retain the full context, often contradicting itself. Only GPT-5-Thinking is capable of remembering everything and making the right choices. However, GPT-5-Thinking comes with obvious stylistic problems that are difficult to correct. For example, even when I give clear instructions and provide examples, it will still use a large amount of Chinese martial-arts terminology in a European dark gothic fantasy setting, which makes the entire narrative feel bizarre. I truly hope this problem can be resolved, so that I can at least have one reliable tool I can use consistently.

2 Likes

If you use the Projects feature and put a rules text file into the project mandating no Eastern terms… and lock in a brief reminder to double-check the project’s rules file, it’ll keep up a lot better.

If you even write in your memories that under no circumstances should Eastern terms be applied to your fantasy world, because it breaks immersion for your readers… then that will solve this one issue fast and easily.

Making a project, though, with a rules file allows you much more content and context when doing long chats such as you seem to be.

2 Likes

Thank you for your response and suggestions.
Indeed, I have been using the project feature for my creative work, and I’ve invested a lot of effort into prompt engineering. Ever since GPT-4, I’ve been passionate about interactive storytelling, and as you said, the project feature is very convenient.

I’ve also set clear requirements and rules in my prompts, provided style demonstrations, and even created lists of forbidden words. I’ve explicitly instructed the model during conversations as well. Unfortunately, GPT-5-Thinking often fails to recognize the differences in the vocabulary and style it uses. It keeps apologizing, yet continues to repeat the same mistakes.

After repeatedly correcting it for a while, I realized that prompt engineering alone cannot guarantee long-term stability. This eventually forced me to return to 4o. Still, I truly hope this issue can be improved in the future, as such a change would be extremely helpful for writers working on long-form storytelling.

2 Likes

Sometimes I get those loops where it keeps apologizing.

I just fire up a new session, and that solves it.

Would you be against sharing your rule set with us here on the forum? Using the dropdown > Hide Details feature located on the far right of the toolbar at the top of your reply window, perhaps…

I’m curious whether it conflicts with itself, or is just really huge, or whatever else…

I might be able to find a solution by looking through it for you… might not.

Might as well :slight_smile:

Yeah, you say all that in your prompts, but have you tried putting it in a text file, with that exhaustive list of things to check each and every token against?

I’m suggesting that, but I don’t see it in your troubleshooting write-up yet.