Hi everyone,
I’m running into a puzzling issue with the “o3-mini-2025-01-31” model, and I’m hoping some of you have encountered (and ideally solved) something similar. From the documentation I’ve read, it’s supposed to support a context window of around 200k tokens, but in practice I keep hitting what looks like a hard cap of roughly 6.5k tokens.
What’s Happening
- Whenever I feed a prompt larger than roughly 6,400–6,500 tokens, I get `finish_reason='length'`, and the returned `content` is just an empty string. Essentially, the model spends the entire output budget on “reasoning,” leaving no tokens for an actual completion.
- My usage logs show `prompt_tokens=6329` and `total_tokens=6429`, i.e. only about 100 completion tokens and no visible output text, so nothing is actually generated for me to read.
- I tested the exact same prompt with “gpt-4o-mini” and had no problem getting a valid response, so it isn’t a general issue with my script or environment.
- My account is active and I have credits available, so this doesn’t look like a usage/quota problem.
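For reference, here is a minimal sketch of roughly what my call looks like (the placeholder prompt, the system message wording, and the printed values are illustrative of my setup, not copied verbatim from my production code):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder for my real input, which weighs in around 6,400 tokens.
long_prompt = "..."

response = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": long_prompt},
    ],
)

choice = response.choices[0]
print(choice.finish_reason)          # what I see: 'length'
print(repr(choice.message.content))  # what I see: '' (empty string)
print(response.usage)                # e.g. prompt_tokens=6329, total_tokens=6429
```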
Confusion
My main confusion is that the official docs for “o3-mini-2025-01-31” claim it supports a 200k-token context window, yet in practice I’m seeing an effective 6k–7k limit. I tried drastically shortening my system prompt, removing the temperature parameter, and stripping out any potential policy triggers, but no luck: if the whole input is over ~6k tokens, the model inevitably returns an empty response.
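In case it matters, this is roughly how I’m estimating prompt size. I’m assuming the `o200k_base` encoding is close enough for this model; if that assumption is off, the counts might shift slightly, but not nearly enough to explain a 200k-vs-6.5k discrepancy:

```python
import tiktoken

# Assumption: the o200k_base encoding (used by the gpt-4o family) is a
# reasonable approximation for o3-mini's tokenizer.
enc = tiktoken.get_encoding("o200k_base")

system_prompt = "You are a helpful assistant."
user_prompt = "..."  # placeholder for my ~6.4k-token input

total = len(enc.encode(system_prompt)) + len(enc.encode(user_prompt))
print(f"approximate prompt tokens: {total}")  # ~6,400 in my case
```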
Questions
- Have any of you also tried “o3-mini-2025-01-31” and observed this same mismatch between the stated 200k capacity and what the logs show?
- Is there a special approach or plan upgrade needed to unlock the full 200k tokens for this model?
- Or maybe it’s an undisclosed limitation and we really only have ~6.5k tokens to work with?
Any insights or experiences you could share would be greatly appreciated. I’d love to confirm whether I’m missing some setting or if I should just assume it’s a ~6k context model in practice. Thanks in advance for any help!