What is going on with the GPT-5 API?

I feel like I must be missing something major here. I’ve been using OpenAI API models for over 3 years now, but this doesn’t make any sense.

Ask for a list of phrases on gpt-4.1-nano:

[prompt_tokens] => 258
[completion_tokens] => 47
[total_tokens] => 305

Ask for a list of phrases (same prompt) on gpt-5-nano:

[prompt_tokens] => 257
[completion_tokens] => 1402
[reasoning_tokens] => 1344
[total_tokens] => 1659

How in the world does it cost 1659 tokens to generate a list of 6 phrases? I can’t even generate articles with this model without setting reasoning_effort to minimal/low, and if you do that, the output is horrendous.
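
For reference, here is roughly the call behind those numbers as a minimal Python sketch (my actual code isn’t Python, and the prompt is just a placeholder) showing where the reasoning tokens appear in the usage object:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder prompt; the real one asks for a list of 6 phrases.
resp = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Give me a list of 6 short phrases about autumn."}],
)

usage = resp.usage
print("prompt_tokens:    ", usage.prompt_tokens)
print("completion_tokens:", usage.completion_tokens)
# Reasoning tokens are billed as part of completion_tokens.
print("reasoning_tokens: ", usage.completion_tokens_details.reasoning_tokens)
print("total_tokens:     ", usage.total_tokens)
```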


I set the max to 4000 tokens for an article, with the default reasoning_effort, and it literally outputs nothing.

[content] =>
[finish_reason] => length
[prompt_tokens] => 1030
[completion_tokens] => 4000
[total_tokens] => 5030
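
If anyone wants to reproduce this, the failure is easy to spot: the choice comes back with finish_reason “length” and an empty message. A minimal Python sketch (placeholder prompt, same 4000-token cap):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder; my real article prompt is roughly 1000 tokens.
article_prompt = "Write a 600-word article about composting at home."

resp = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": article_prompt}],
    max_completion_tokens=4000,  # shared budget for reasoning + visible output
)

choice = resp.choices[0]
if choice.finish_reason == "length" and not choice.message.content:
    # The whole budget went to reasoning; nothing visible came back, but it is still billed.
    reasoning = resp.usage.completion_tokens_details.reasoning_tokens
    print(f"Empty output after {reasoning} reasoning tokens")
else:
    print(choice.message.content)
```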

What am I missing here?

5 Likes

Even gpt-4.1-nano is not that great; perhaps reasoning is a bad fit for a tiny model.

Maybe you’d have better luck with the full gpt-5 model with reasoning_effort set to “minimal”?

The expense is the reason for using gpt-5-nano, though. gpt-4.1-mini and sometimes even gpt-4.1-nano work great for writing content. With gpt-5-nano I legitimately cannot get it to output an article, even with a 4000-token limit. gpt-4.1-nano accomplished this without any problem…

It actually seems like it’s really bad with long prompts: it maxes out the reasoning tokens and outputs nothing. I don’t see how gpt-4.1-nano works well with long prompts but gpt-5-nano doesn’t. What a joke.

6 Likes

Yeah, you can see limitations here.

(btw, I’m not a fan of 4.1-nano because it’s terrible for function calling and summarization - the two things I do most - but glad you’ve found a good use for it)

1 Like

I use gpt-4o-mini and gpt-4.1-mini for most things (obviously for coding I use more intelligent models), but in some areas of content writing, gpt-4.1-nano actually produces more unique, natural-sounding content than the others, believe it or not. This isn’t always the case, but sometimes it is.

1 Like

I can see that where creative output is desirable it could really work well… but in my use cases, I need strength on facts and quality text transformation, which gpt-4.1-nano is ill-equipped for.

I’m just jealous, because it is so cheap and fast :wink:

1 Like

gpt-5-nano is unusable and its existence is pointless. With reasoning_effort set to minimal, it’s significantly worse than gpt-4.1-nano for the same number of tokens. There’s also no temperature parameter in gpt-5, which sucks. What a disappointment!

7 Likes

Yeah, I was expecting a frontier LLM with no reasoning.

But instead we got a slightly worse LLM (e.g. shorter context window, no control over temperature), with high-quality reasoning in the bigger models.

2 Likes

Has anyone faced this? I am getting a lot of empty responses from the GPT-5 series, specifically from GPT-5. The token stats show a non-zero output token count, so it seems to be thinking and then giving an empty response?

14 Likes

I can’t describe how much I miss GPT-4.5; that was the peak of pushing boundaries. GPT-5 probably makes more sense in terms of efficiency, and reasoning can fill the size gap, but I was expecting a huge non-reasoning model. This is more an o4 than a GPT-5.

3 Likes

I’m having it even worse: I ask for one thing and it replies with something else!!!

2 Likes

Yes, at least in the Playground, I have been noticing that for harder tasks, or when reasoning effort and verbosity are both set to “high”, GPT-5 seems to generate thousands of reasoning tokens but then outputs an empty response – of course the billing / usage tracking is working, though :upside_down_face: . Given how unreliable GPT-5 seems to be in the Playground, I haven’t even bothered using the API directly – I think I’ll stick with o3 when I need structured outputs and, when I don’t, use either qwen3 235B A22B Thinking-2507 or gpt-oss-120b via Weights & Biases Inference.

1 Like

Came here looking for info on the same. GPT-5 (medium reasoning) will reason forever, almost too much, but then won’t output a final message - no errors, nothing. This is through the Responses API.

2 Likes

Turns out that with GPT-5 you really need to set max_completion_tokens to a higher number to account for all the reasoning + response tokens - it was stopping at 2048. Once I bumped it up to 5000, everything works as expected.
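
For anyone else hitting this, the change was just the one parameter - roughly this, in Python (model and prompt are placeholders; 5000 is simply the number that worked for my prompts):

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Summarize these notes ..."}],  # placeholder
    # Leave plenty of headroom: reasoning tokens count against this limit too.
    max_completion_tokens=5000,
)
print(resp.choices[0].message.content)
```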

4 Likes

I’m not sure why that should matter for getting any response at all. If there aren’t enough tokens, it should stop wherever it runs out… not just return a blank output. There’s no way that’s how it’s supposed to work. Otherwise you could spend thousands and thousands of tokens, but because the limit is slightly off, you get no response at all.

Imagine spending 5000 tokens, then finding out you needed 5001, so you get a completely blank response. Now you’ve wasted 5000 tokens’ worth of money for nothing. But again, this sort of terrible functionality is very good for OpenAI’s bottom line.

1 Like

I agree - but that’s how I got everything working. It was pretty clear: without setting max_completion_tokens, GPT-5 would reason through 2048 tokens and then finish with the reason that the limit was reached – even though I hadn’t set a limit anywhere.

This could be something they are currently fixing - I was just leaving it here to help out others who might run into the same issue.

1 Like

The AI produces its internal reasoning as billed output tokens as well.

  • max_tokens: the older parameter that constrained model generation to a token count
  • max_completion_tokens: same effect - it can also stop the internal reasoning generation, based on this “budget parameter” you now pass.
  • max_output_tokens: same as the last, renamed for the Responses API.

There could be an alternate parameter, a “maximum I want to see before the output is cut off”. However, then you’d go “why did I spend 5000 tokens only to get 1024 tokens of a half-response??”

Solution:

  • Increase max_output_tokens to something larger than the model would ever produce in reasoning + output, considering your requested reasoning effort and maximum task difficulty (see the sketch below).
  • Use this parameter solely as a backstop against an AI that goes crazy and never stops.
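
Roughly what that looks like on the Responses API, as a Python sketch (the limit and effort values are only examples; size them to your own task):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input="Write a 600-word article about composting at home.",  # placeholder task
    reasoning={"effort": "medium"},
    # Set well above any plausible reasoning + output total; treat it only as a runaway stop.
    max_output_tokens=16000,
)

if resp.status == "incomplete":
    # The budget was hit before the visible answer finished (or even started).
    print("Truncated:", resp.incomplete_details.reason)
else:
    print(resp.output_text)
```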

OpenAI required solution:

  • don’t hide the maximum tokens from the main display of Playground parameters
  • don’t set the default of reasoning models to 2048 tokens
  • don’t display the wrong settings for the model after entering the Playground, requiring a refresh via the URL.

3 Likes

The wrapper can omit the max tokens setting in the request block, effectively bypassing the problem of an insufficient token budget. The user can always hit Ctrl+C and kill the connection, in which case I assume they don’t pay for more than what was actually received.

I’m also facing the issue of GPT-5-nano returning no response/content when setting the max_completion_tokens parameter - it’s almost like they didn’t even test the API.

I’m also getting empty responses from GPT-5-nano… very disappointing.