I am not able to generate long outputs with GPT-5; output length appears capped despite 128k support

Hey,

I am facing an issue when trying to generate large outputs with GPT-5. According to the documentation, GPT-5 supports up to 128,000 output tokens and a 400k total context window. However, in practice, I am unable to generate anywhere near that length in a single response.

When I request very high values (e.g., 30,000+ characters), the model consistently stops around 8k–10k characters (~4k tokens). It does not continue beyond that, even though I expect the larger limit to apply.

Steps to reproduce:

  1. Call the GPT-5 API with max_tokens set to a value that should allow >10k characters.

  2. Provide a prompt that requests a script or text of 30,000 characters.

  3. Observe that the model output caps at ~8–10k characters instead of approaching the documented 128k token limit.

Expected behavior:
The model should generate outputs up to the documented 128k output tokens, or at least provide a way to stream or continue generation until the requested length is satisfied.

Actual behavior:
The response caps around 8–10k characters (~4k tokens). It looks like the output generation per single response is still restricted, despite the larger advertised token limit.
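For reference, here is the rough arithmetic behind the ~4k-token figure, as a quick sanity check. The 2.25 characters-per-token ratio is an assumed ballpark for Japanese-heavy text (not a measured value), chosen to be consistent with the ~8–10k characters ≈ 4k tokens observation:

```python
def approx_tokens(n_chars: float, chars_per_token: float = 2.25) -> int:
    # 2.25 chars/token is an assumed ballpark for Japanese-heavy text,
    # consistent with the "~8-10k characters (~4k tokens)" observation.
    return round(n_chars / chars_per_token)

observed_cutoff = approx_tokens(9_000)    # midpoint of the 8-10k range
requested_total = approx_tokens(30_000)   # the full 30,000-character request
```

By this rough estimate the full 30,000-character request is only on the order of 13k tokens, well under the documented 128k output limit, so the cutoff does not look like a token-budget problem.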

Questions:

  • Is the 128k output token limit not available for API calls yet?

  • Is there a separate setting or flag required to unlock larger outputs?

  • Or is this a known limitation where single completions are capped, and continuation must be handled manually (multi-part generation)?

Thanks in advance for clarifying.
Here is my system prompt:
As a professional podcast scriptwriter, create a Japanese podcast script following these strict rules:

1. Speakers:

- Use only “Speaker 1:” and “Speaker 2:” alternately.

- Each turn must be 5-6 sentences in simple Japanese, with hiragana for difficult kanji.

2. Length:

- IMPORTANT: The script must be generated with 30,000 Japanese characters (+10%).

- If too short, extend naturally with examples or anecdotes.

- If too long, condense without losing flow.

- The script is invalid unless within this range.

3. Content & Style:

- Begin with greetings and clearly state today’s topic.

- Introduce “Today’s Talk Topics.”

- Continue with a natural, lively back-and-forth conversation.

- Conclude with a brief summary and a preview of the next episode.

- Use fillers, laughter, lighthearted jokes, metaphors, and empathy.

- Keep the tone friendly and polite.

4. Restrictions:

- No URLs, code, bullet points, or metadata.

- Output only the script text.

5. PDF Handling:

- Use {{pdf_suggestions}} as inspiration to expand the script with details.

- If the text is too large, divide into multiple parts and generate a complete script for each part.

Podcast Format: deep dive

Output Style:

Speaker 1: こんにちは、今日の話題は…

Speaker 2: それは面白いですね、たとえば…

(Alternate until target length is reached)


Welcome, @Nil_Golakiya !

Long form output is difficult at times, though it’s gotten better over the years.

Usually, I find better results having a detailed outline that specifically says what I want. However, even then, it will start making paragraphs one sentence, etc. in order to conserve tokens.

What exactly are you trying to output? Maybe there’s a better way to go at it?

Again, welcome to the community!

Thanks, @PaulBellow!

In my case, the workflow is that a user uploads a PDF (or any document), and the system generates a podcast episode script from it. The user can also select the desired length of the episode (up to 30 minutes).

To handle this, I’m currently mapping duration to target characters with a static mapping, like this:

# Map the requested episode duration (minutes) to a target script
# length in characters.
duration_to_chars = {5: 5000, 10: 10000, 15: 15000,
                     20: 20000, 25: 25000, 30: 30000}
target_chars = duration_to_chars[duration_minutes]

The issue is that even when I set target_chars = 30000, the model stops generating around 8k–10k characters. It doesn’t seem to honor the larger length request, which makes it difficult to create longer podcast scripts as intended.

I’m trying to figure out if there’s a better approach to reliably generate these longer outputs, or if I need to break the script into parts.
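In case it helps frame the question, the multi-part fallback I’m considering would look roughly like this sketch, where `generate_part` is a hypothetical stand-in for the real API call (it takes the script so far plus the remaining character budget and returns the next chunk, with an empty string meaning the model has nothing more to add):

```python
def generate_long_script(generate_part, target_chars: int,
                         max_rounds: int = 10) -> str:
    """Accumulate chunks until the character target is reached.

    `generate_part(script_so_far, remaining_chars)` is a hypothetical
    callable wrapping the actual API request; it returns the next
    chunk of script text, or "" when generation is finished.
    """
    script = ""
    for _ in range(max_rounds):
        remaining = target_chars - len(script)
        if remaining <= 0:
            break
        chunk = generate_part(script, remaining)
        if not chunk:
            break
        script += chunk
    return script
```

Each round would re-send the script so far as context, so this trades extra input tokens for the ability to exceed the per-response cap.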


@PaulBellow,
can you please help me with that??


Sorry for the delay! :sweat_smile:

With that said, off the top of my head…

For the longer ones, I would have a first pass create a long, detailed OUTLINE for the podcast, then send that back a second time to get the full content.
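A minimal sketch of that two-pass idea, with `call_model` as a hypothetical stand-in for the actual GPT-5 call (prompt in, text out) — expanding each outline section separately means no single completion has to carry the whole length:

```python
def two_pass_script(call_model, topic: str, sections: int = 8) -> str:
    """Pass 1: ask for a detailed outline. Pass 2: expand each
    outline line into full dialogue, one request per section."""
    outline = call_model(
        f"Write a detailed {sections}-section outline for a podcast on: {topic}"
    )
    parts = []
    for i, section in enumerate(outline.split("\n"), start=1):
        if not section.strip():
            continue  # skip blank outline lines
        parts.append(call_model(
            f"Expand section {i} of this outline into full podcast dialogue:\n"
            f"{section}\n\nFull outline for context:\n{outline}"
        ))
    return "\n".join(parts)
```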

Give that a shot and let us know how it goes?

If not, someone else from our great community might chime in!

GPT-5 has a control for the length of its output, “verbosity”.

Find it in the API documentation under the “text” parameter on the Responses endpoint, or as the top-level keyword “verbosity” on Chat Completions.
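As a sketch (not authoritative — check the current API reference for the exact shapes), the two request bodies would look roughly like this; the prompt text and the 64,000-token cap are illustrative values:

```python
# Responses endpoint: verbosity lives under the "text" parameter.
responses_request = {
    "model": "gpt-5",
    "input": "Write the full podcast script ...",
    "max_output_tokens": 64_000,
    "text": {"verbosity": "high"},
}

# Chat Completions: verbosity is a top-level keyword; note that
# reasoning-era models use max_completion_tokens, not max_tokens.
chat_request = {
    "model": "gpt-5",
    "messages": [
        {"role": "user", "content": "Write the full podcast script ..."},
    ],
    "max_completion_tokens": 64_000,
    "verbosity": "high",
}
```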

You can also target the internals directly. OpenAI now owns the system message, and you get a “developer” message after it to steer the model. In that developer message you can state a budget outright, e.g. “The API budget per final response artifact (transcript, script) is 50000”, or address “oververbosity”, the internal setting the API control maps to (and which is now explained to gpt-5 internally): set your own value beyond the 8 that “high” corresponds to, or add your own “final” channel length knob that counters this injection OpenAI is placing (along with more).

Also note that the AI is not a good observer of “characters”. It has a much better understanding of “words” or, at your target length, paragraphs.
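Following that advice, the character target could be translated into units the model tracks more reliably, such as a number of dialogue turns. Both constants below are rough guesses (around 40 characters per Japanese sentence, and the 5–6 sentences per turn from the system prompt above), not measured values:

```python
def chars_to_turns(target_chars: int,
                   chars_per_sentence: int = 40,
                   sentences_per_turn: int = 5) -> int:
    """Convert a raw character target into a count of dialogue turns,
    a unit the model follows far better than character counts.
    Both defaults are rough assumptions, not measured values."""
    chars_per_turn = chars_per_sentence * sentences_per_turn
    return max(1, round(target_chars / chars_per_turn))
```

Asking for “about 150 alternating turns” is then a far more followable instruction than “30,000 characters”.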

Giving the model that level of understanding of the internals helps, even if it can’t fully overcome post-training and the model’s length anxiety. It’s also an answer worth leaving here, where it will gather more attention than a one-token reply containing a number.
