What is going on with the GPT-5 API?

I feel like I must be missing something major here. I’ve been using OpenAI API models for over 3 years now, but this doesn’t make any sense.

Ask for a list of phrases on gpt-4.1-nano:

[prompt_tokens] => 258
[completion_tokens] => 47
[total_tokens] => 305

Ask for a list of phrases (same prompt) on gpt-5-nano:

[prompt_tokens] => 257
[completion_tokens] => 1402
[reasoning_tokens] => 1344
[total_tokens] => 1659

How in the world does it cost 1659 tokens to generate a list of 6 phrases? I can’t even generate articles with this model without setting reasoning_effort to minimal/low, and if you do that, the output is horrendous.
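
For reference, here is roughly the call behind those numbers as a minimal Python sketch (my actual code isn’t Python, and the prompt is just a placeholder) showing where the reasoning tokens appear in the usage object:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder prompt; the real one asks for a list of 6 phrases.
resp = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Give me a list of 6 short phrases about autumn."}],
)

usage = resp.usage
print("prompt_tokens:    ", usage.prompt_tokens)
print("completion_tokens:", usage.completion_tokens)
# Reasoning tokens are billed as part of completion_tokens.
print("reasoning_tokens: ", usage.completion_tokens_details.reasoning_tokens)
print("total_tokens:     ", usage.total_tokens)
```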


I set the max to 4000 tokens for an article, with the default reasoning_effort, and it literally outputs nothing.

[content] =>
[finish_reason] => length
[prompt_tokens] => 1030
[completion_tokens] => 4000
[total_tokens] => 5030
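
If anyone wants to reproduce this, the failure is easy to spot: the choice comes back with finish_reason “length” and an empty message. A minimal Python sketch (placeholder prompt, same 4000-token cap):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder; my real article prompt is roughly 1000 tokens.
article_prompt = "Write a 600-word article about composting at home."

resp = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": article_prompt}],
    max_completion_tokens=4000,  # shared budget for reasoning + visible output
)

choice = resp.choices[0]
if choice.finish_reason == "length" and not choice.message.content:
    # The whole budget went to reasoning; nothing visible came back, but it is still billed.
    reasoning = resp.usage.completion_tokens_details.reasoning_tokens
    print(f"Empty output after {reasoning} reasoning tokens")
else:
    print(choice.message.content)
```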

What am I missing here?

5 Likes

Even gpt-4.1-nano is not that great; perhaps reasoning is a bad fit for a tiny model.

Maybe you’d have better luck with the full gpt-5 model with reasoning_effort set to “minimal”?

The expense is the reason for using gpt-5-nano, though. gpt-4.1-mini and sometimes even gpt-4.1-nano work great for writing content. With gpt-5-nano I legitimately cannot get it to output an article, even with a 4000-token limit. gpt-4.1-nano accomplished this without any problem…

It actually seems like it’s really bad with long prompts: it maxes out the reasoning tokens and outputs nothing. I don’t see how gpt-4.1-nano works well with long prompts but gpt-5-nano doesn’t. What a joke.

6 Likes

Yeah, you can see limitations here.

(btw, I’m not a fan of 4.1-nano because it’s terrible for function calling and summarization - the two things I do most - but glad you’ve found a good use for it)

1 Like

I use gpt-4o-mini and gpt-4.1-mini for most things (obviously for coding I use more intelligent models), but in some areas of content writing, gpt-4.1-nano actually produces more unique, natural-sounding content than the others, believe it or not. This isn’t always the case, but sometimes it is.

1 Like

I can see that where creative output is desirable it could really work well… but in my use cases, I need strength on facts and quality text transformation, which gpt-4.1-nano is ill-equipped for.

I’m just jealous, because it is so cheap and fast :wink:

1 Like

gpt-5-nano is unusable and its existence is pointless. With reasoning_effort set to minimal, it’s significantly worse than gpt-4.1-nano for the same number of tokens. There’s also no temperature parameter in gpt-5, which sucks. What a disappointment!

7 Likes

Yeah, I was expecting a frontier LLM with no reasoning.

But instead we got a slightly worse LLM (e.g. shorter context window, no control over temperature), with high-quality reasoning in the bigger models.

2 Likes

Has anyone faced this? I am getting a lot of empty responses from the GPT-5 series, specifically from GPT-5. The token stats show a non-zero output token count, so it seems to be thinking and then giving an empty response?

14 Likes

I can’t describe how much I miss GPT-4.5; that was the peak of pushing boundaries. GPT-5 probably makes more sense in terms of efficiency, and reasoning can fill the size gap, but I was expecting a huge non-reasoning model. This is more an o4 than a GPT-5.

3 Likes

I’m having it even worse: I ask for one thing and it replies with something else!!!

2 Likes

Yes, at least in the Playground, I have been noticing that for harder tasks, or when reasoning effort and verbosity are both set to “high”, GPT-5 seems to generate thousands of reasoning tokens but then outputs an empty response – of course the billing / usage tracking is working, though :upside_down_face: . Given how unreliable GPT-5 seems to be in the Playground, I haven’t even bothered using the API directly – I think I’ll stick with o3 when I need structured outputs and, when I don’t, use either qwen3 235B A22B Thinking-2507 or gpt-oss-120b via Weights & Biases Inference.

1 Like

Came here looking for info on the same. GPT-5 (medium reasoning) will reason forever, almost too much, but then won’t output a final message - no errors, nothing. This is through the Responses API.

2 Likes

Turns out that with GPT-5 you really need to set max_completion_tokens to a higher number to account for all the reasoning + response tokens - it was stopping at 2048. Once I bumped it up to 5000, everything works as expected.
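
For anyone else hitting this, the change was just the one parameter - roughly this, in Python (model and prompt are placeholders; 5000 is simply the number that worked for my prompts):

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Summarize these notes ..."}],  # placeholder
    # Leave plenty of headroom: reasoning tokens count against this limit too.
    max_completion_tokens=5000,
)
print(resp.choices[0].message.content)
```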

4 Likes

I’m not sure why that should matter for getting any response at all. If there aren’t enough tokens, it should stop wherever it runs out… not just return a blank output. There’s no way that’s how it’s supposed to work. Otherwise you could spend thousands and thousands of tokens, but because the limit is slightly off, you get no response at all.

Imagine spending 5000 tokens, then finding out you needed 5001, so you get a completely blank response. Now you’ve wasted 5000 tokens’ worth of money for nothing. But again, this sort of terrible functionality is very good for OpenAI’s bottom line.

1 Like

I agree - but that’s how I got everything working. It was pretty clear: without setting max_completion_tokens, GPT-5 would reason through 2048 tokens and then finish with the reason that the limit was reached – even though I hadn’t set a limit anywhere.

This could be something they are currently fixing - I was just leaving it here to help out others who might run into the same issue.

1 Like

The AI produces its internal reasoning as billed output tokens as well.

  • max_tokens: the older parameter that constrained model generation to a token count
  • max_completion_tokens: same effect - it can also stop the internal reasoning generation, based on this “budget parameter” you now pass.
  • max_output_tokens: same as the last, renamed for the Responses API.

There could be an alternate parameter, a “maximum I want to see before the output is cut off”. However, then you’d go “why did I spend 5000 tokens only to get 1024 tokens of a half-response??”

Solution:

  • Increase max_output_tokens to something larger than the model would ever produce in reasoning + output, considering your requested reasoning effort and maximum task difficulty (see the sketch below).
  • Use this parameter solely as a backstop against an AI that goes crazy and never stops.
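
Roughly what that looks like on the Responses API, as a Python sketch (the limit and effort values are only examples; size them to your own task):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input="Write a 600-word article about composting at home.",  # placeholder task
    reasoning={"effort": "medium"},
    # Set well above any plausible reasoning + output total; treat it only as a runaway stop.
    max_output_tokens=16000,
)

if resp.status == "incomplete":
    # The budget was hit before the visible answer finished (or even started).
    print("Truncated:", resp.incomplete_details.reason)
else:
    print(resp.output_text)
```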

OpenAI required solution:

  • don’t hide the maximum tokens from the main display of Playground parameters
  • don’t set the default of reasoning models to 2048 tokens
  • don’t display the wrong settings for the model after entering the Playground, requiring a refresh via the URL.

3 Likes

The wrapper can omit the max tokens setting in the request block, effectively bypassing the problem of an insufficient token budget. The user can always hit Ctrl+C and kill the connection, in which case I assume they don’t pay for more than what was actually received.

I’m also facing the issue of GPT-5-nano returning no response/content when setting the max_completion_tokens parameter - it’s almost like they didn’t even test the API.

I’m also getting empty responses from GPT-5-nano… very disappointing.