Hi everyone,
I’m running into a puzzling issue with the “o3-mini-2025-01-31” model, and I’m hoping some of you have encountered (and ideally solved) something similar. From the documentation I’ve read, it’s supposed to support a context window of around 200k tokens, but in practice I keep hitting what looks like a hard cap of roughly 6.5k tokens.
What’s Happening
- Whenever I feed a prompt larger than roughly 6,400–6,500 tokens, I get `finish_reason='length'`, and the returned `content` is just an empty string. Essentially, the model spends the entire output budget on “reasoning,” leaving no tokens for an actual completion.
- My usage logs show `prompt_tokens=6329` and `total_tokens=6429`, i.e. only about 100 completion tokens and no visible output text, so nothing is actually generated for me to read.
- I tested the exact same prompt with “gpt-4o-mini” and had no problem getting a valid response, so it isn’t a general issue with my script or environment.
- My account is active and I have credits available, so this doesn’t look like a usage/quota problem.
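For reference, here is a minimal sketch of roughly what my call looks like (the placeholder prompt, the system message wording, and the printed values are illustrative of my setup, not copied verbatim from my production code):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder for my real input, which weighs in around 6,400 tokens.
long_prompt = "..."

response = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": long_prompt},
    ],
)

choice = response.choices[0]
print(choice.finish_reason)          # what I see: 'length'
print(repr(choice.message.content))  # what I see: '' (empty string)
print(response.usage)                # e.g. prompt_tokens=6329, total_tokens=6429
```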
Confusion
My main confusion is that the official docs for “o3-mini-2025-01-31” claim it supports a 200k-token context window, yet in practice I’m seeing an effective 6k–7k limit. I tried drastically shortening my system prompt, removing the temperature parameter, and stripping out any potential policy triggers, but no luck: if the whole input is over ~6k tokens, the model inevitably returns an empty response.
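In case it matters, this is roughly how I’m estimating prompt size. I’m assuming the `o200k_base` encoding is close enough for this model; if that assumption is off, the counts might shift slightly, but not nearly enough to explain a 200k-vs-6.5k discrepancy:

```python
import tiktoken

# Assumption: the o200k_base encoding (used by the gpt-4o family) is a
# reasonable approximation for o3-mini's tokenizer.
enc = tiktoken.get_encoding("o200k_base")

system_prompt = "You are a helpful assistant."
user_prompt = "..."  # placeholder for my ~6.4k-token input

total = len(enc.encode(system_prompt)) + len(enc.encode(user_prompt))
print(f"approximate prompt tokens: {total}")  # ~6,400 in my case
```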
Questions
- Have any of you also tried “o3-mini-2025-01-31” and observed this same mismatch between the stated 200k capacity and what the logs show?
- Is there a special approach or plan upgrade needed to unlock the full 200k tokens for this model?
- Or maybe it’s an undisclosed limitation and we really only have ~6.5k tokens to work with?
Any insights or experiences you could share would be greatly appreciated. I’d love to confirm whether I’m missing some setting or if I should just assume it’s a ~6k context model in practice. Thanks in advance for any help!