GPT-4.1 Resists Producing Very Long Outputs Compared to GPT-4.1-mini

Scenario:

I’m working on extracting content from semi-structured documents that include tables, text, and other elements. For each table, the extraction follows a consistent pattern: retrieving no_rows × no_columns items from the document.


Issue:

  • GPT-4.1 Output:
    GPT-4.1 often stops after partially extracting the data (e.g., only the first 100+ entries) and then cites brevity or space limitations as the reason.

  • GPT-4.1-mini Output:
    While GPT-4.1-mini introduces some hallucinations, it successfully returns the desired number of items in full.


What I’ve Tried:

As suggested in the GPT-4.1 Prompting Guide Caveats, I strongly instructed the model to output all entries in full using prompts like:

Extract **ALL** (hundreds of entries) from the above data in **FULL**, following the instructions and rules provided in the system message.

Despite this, GPT-4.1 still truncates the output and cites space/brevity concerns.
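
For reference, here is roughly the shape of the call I’m making (the system message content and token cap below are illustrative, not my exact setup):

```python
from openai import OpenAI

client = OpenAI()

document_text = "..."  # the semi-structured document (tables, text, other elements)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        # Illustrative system message -- my real one spells out the extraction rules.
        {"role": "system", "content": "Extract every table cell; return one entry per cell, no summarizing."},
        {"role": "user", "content": document_text},
        {"role": "user", "content": (
            "Extract **ALL** (hundreds of entries) from the above data in **FULL**, "
            "following the instructions and rules provided in the system message."
        )},
    ],
    max_tokens=16000,  # generous cap; raising it alone does not stop the early wrap-up
)

print(response.choices[0].message.content)
```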


Question for the Community:

  • Is this behavior expected for GPT-4.1 due to token or safety constraints?

  • Are there any proven strategies or prompt engineering techniques to force GPT-4.1 to return complete outputs without truncation?


Yes, OpenAI’s models are very reluctant to go much beyond about 1,800 output tokens, with a strong “wrapping up” behavior kicking in around 1,500 tokens. gpt-5 is the first that can compete with Gemini or Claude at writing 10,000+ tokens for you (and even gpt-5 still has dumb notions of “writing less” that save no tokens, like turning your code variables into abbreviations).

The best you can do is a 1:1 recitation task such as proofreading, where fulfilling the task absolutely requires a mapping of input to output.
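
If you can restructure the job that way, one approach (a rough sketch, untested against your documents; function and batch size are made up) is to feed the rows in numbered batches and require exactly one output line per input line, so the model can’t summarize its way out:

```python
from openai import OpenAI

client = OpenAI()

def extract_rows(rows, batch_size=25):
    """Send numbered row batches and demand exactly one output line per input line."""
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        numbered = "\n".join(f"{start + i + 1}. {row}" for i, row in enumerate(batch))
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system", "content": (
                    "For every numbered input line, output exactly one line in the form "
                    "'<number>|<extracted fields>'. Never skip, merge, or summarize lines."
                )},
                {"role": "user", "content": numbered},
            ],
        )
        results.extend(response.choices[0].message.content.splitlines())
    return results
```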

One assumes this training, or even an outright interdiction, exists because of limited computation resources.

Compare the models’ pricing, which roughly tracks the compute they consume.

o4-mini will also write for a pretty long time, but you don’t get to control this with a parameter.
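
With the Responses API, for instance, max_output_tokens is only a ceiling that truncates the reply; nothing asks the model to write more (illustrative call, prompt made up):

```python
from openai import OpenAI

client = OpenAI()

# max_output_tokens is a hard cutoff, not a target length -- there is no
# parameter that makes the model aim for a longer answer.
response = client.responses.create(
    model="o4-mini",
    input="Rewrite the following table in full, one output line per row: ...",
    max_output_tokens=20000,
)

print(response.output_text)
```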
