Inconsistent Token Limits with “o3-mini-2025-01-31”: Empty Response Despite the Advertised 200k Context?

Hi everyone,

I’m running into a puzzling issue with the “o3-mini-2025-01-31” model, and I’m hoping some of you might have encountered (and hopefully solved) something similar. From the documentation I’ve read, it’s supposed to handle a context window of around 200k tokens, but I keep hitting what looks like a strict ~6.5k token cap in practice.

What’s Happening

  1. Whenever I feed a prompt larger than roughly 6,400–6,500 tokens, I get finish_reason='length', and the returned content is just an empty string. Essentially, the model uses the entire buffer to “reason,” leaving no tokens left for an actual completion.
  2. My usage logs show prompt_tokens=6329 and total_tokens=6429, yet no visible completion text, so nothing is actually generated for me to read (a minimal sketch of the call is included after this list).
  3. I tested the exact same prompt with “gpt-4o-mini” and had no problem getting a valid response. So it isn’t an overall issue with my script or environment.
  4. My account is active, and I have credits available, so this doesn’t look like a usage/quota problem.
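To be concrete, here is roughly what the failing call and its output look like. This is a minimal sketch using the official openai Python SDK (v1+); the prompt text and the small max_completion_tokens value are placeholders, not my exact code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

long_prompt = "..."  # stand-in for my ~6,400-token prompt

response = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=[{"role": "user", "content": long_prompt}],
    max_completion_tokens=100,  # placeholder for the small output cap I was passing
)

choice = response.choices[0]
print(choice.finish_reason)           # 'length'
print(repr(choice.message.content))   # ''  (empty string, nothing to read)
print(response.usage)                 # prompt_tokens=6329, total_tokens=6429
```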

Confusion

My main confusion is that the official docs for “o3-mini-2025-01-31” claim it supports 200k tokens, yet in reality I’m seeing effectively a 6k–7k limit. I tried drastically shortening my system prompt, dropping mention of temperature, and removing any potential policy triggers, but no luck. If the entire input is over ~6k, the model inevitably returns an empty response.

Question

  • Have any of you also tried “o3-mini-2025-01-31” and observed this same mismatch between the stated 200k capacity and what the logs show?
  • Is there a special approach or plan upgrade needed to unlock the full 200k tokens for this model?
  • Or maybe it’s an undisclosed limitation and we really only have ~6.5k tokens to work with?

Any insights or experiences you could share would be greatly appreciated. I’d love to confirm whether I’m missing some setting or if I should just assume it’s a ~6k context model in practice. Thanks in advance for any help!

If you’re going to send max_completion_tokens as a parameter, make sure it is large enough to cover the internal reasoning as well as the visible answer. For example, set it to something generous like 34567.

A finish_reason of “length” means the output is being cut off by the limit you provided.
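A minimal sketch of what I mean, assuming the official openai Python SDK (v1+), the same model name, and a placeholder prompt:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=[{"role": "user", "content": "..."}],  # your ~6k-token prompt
    # Reasoning models spend part of this budget on hidden reasoning tokens
    # before any visible text, so leave generous headroom.
    max_completion_tokens=34567,
)

print(response.choices[0].finish_reason)     # should now be "stop"
print(response.choices[0].message.content)   # the visible completion
```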


J, you were right.

The problem was coming from the max_completion_tokens parameter.

I didn’t know that I also had to leave room for the internal reasoning.
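For anyone else who runs into this, the usage breakdown on the response makes it obvious where the budget went. A quick sketch, continuing from a call like the ones above and assuming the SDK exposes completion_tokens_details on reasoning-model responses:

```python
usage = response.usage
print(usage.completion_tokens)                           # total output tokens consumed
print(usage.completion_tokens_details.reasoning_tokens)  # the hidden reasoning portion
# If reasoning eats the entire completion budget, finish_reason comes back
# as "length" and message.content is an empty string.
```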

Thanks for the helpful tip.
