Hi everyone,
I’m running into an intermittent issue when using the gpt-4o-mini model via Azure OpenAI, and I can’t quite figure out what’s going on.
Context
• I’m analyzing a database and making many API calls in a loop.
• Each request uses a prompt of about 800–900 tokens.
• The expected completion should be very small (around 200–300 tokens maximum).
• I explicitly set max_tokens=4000 (well above what’s needed, but safely under all limits).
• I’m also using the response_format parameter to get a structured output (a stand-in schema is sketched just after this list).
• The Azure deployment has a 30k TPM limit; I’ve also tested with the gpt-4o model (450k TPM limit) and the issue still occurs.
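For context, modelResponse is a Pydantic model passed straight to parse(). Its real fields aren't relevant to the problem, but a hypothetical stand-in (field names invented purely for illustration) looks like this:

from pydantic import BaseModel

# Hypothetical stand-in for the real modelResponse schema; the actual
# field names differ, but the shape is this simple.
class modelResponse(BaseModel):
    table_name: str
    summary: str
    issues: list[str]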
The error
Occasionally, on a random loop iteration, I get this error:
Could not parse response content as the length limit was reached -
CompletionUsage(
    completion_tokens=4000,
    prompt_tokens=874,
    total_tokens=4874,
    completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, …),
    prompt_tokens_details=PromptTokensDetails(…)
)
This is confusing because:
• The total tokens are clearly below any hard context limit (only ~4.8k total).
• max_tokens is set to 4000, yet completion_tokens hit exactly 4000, which means the model kept generating until it was cut off.
• The completion should not exceed 300 tokens anyway.
• The same call can succeed many times and then fail unexpectedly.
It’s tedious to reproduce, because I need to let the loop run until one of the calls randomly triggers the error.
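For what it's worth, wrapping the call at least lets me capture the failing iteration. A rough sketch (rows and analyze_one() are stand-ins for my actual data and call; if I'm reading the openai SDK right, parse() raises LengthFinishReasonError when the completion hits the length limit):

import logging
from openai import LengthFinishReasonError

for i, row in enumerate(rows):
    try:
        result = analyze_one(row)  # stand-in for the parse() call below
    except LengthFinishReasonError as e:
        # e.completion carries the truncated response, so both the usage
        # numbers and the raw unparseable content can be logged.
        logging.error("iteration %d hit the length limit: %s", i, e.completion.usage)
        logging.error("truncated content: %r", e.completion.choices[0].message.content)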
Example code
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        # Note: "type" is a content-part field, not a message field,
        # so it's omitted from these message objects.
        {"role": "system", "content": prompt1},
        {"role": "system", "content": prompt2},
        {"role": "user", "content": str(content)},
    ],
    response_format=modelResponse,
    temperature=0.2,
    top_p=0.9,
    logprobs=True,
    max_tokens=4000,
)
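On the iterations that succeed, I read the result like this (same completion object as above), and the output really is tiny:

choice = completion.choices[0]
print(choice.finish_reason)                # "stop" on the good iterations
print(completion.usage.completion_tokens)  # typically well under 300
result = choice.message.parsed             # validated modelResponse instance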
I'm not sure what else to try. Has anyone seen this, or does anyone know what could make the completion run all the way to max_tokens on a random iteration?
Thanks in advance.