o4-mini-high and o3 context window issue and high hallucination rate

Hi everyone,

I’ve been using ChatGPT for coding every day for several months with the “Plus” subscription. I’ve always been satisfied with the results I got from ChatGPT. o3-mini-high worked great for generating code or helping me improve or debug large chunks of code. I was able to input 80k+ token prompts and the answers were relevant to what I asked for. Same for o1.

Until now.

Indeed, as of a few days ago, the o3-mini-high and o1 models are gone, replaced by o4-mini-high and o3. Since then, I have encountered many issues with them.

The first one is the size of the input context window. For o3-mini-high, o1, o4-mini-high, and o3, it should be 200k tokens. However, prompts about 80k tokens long that worked with o3-mini-high and o1 now fail with o4-mini-high and o3.

I can submit my prompt, but then I get the error message: “The message you submitted was too long, please reload the conversation and submit something shorter.”
Clicking the “Retry” button either gives me answers like “Alright! If anything comes up or you need help with something else, just let me know.” or totally random stuff. In both cases my whole prompt has been ignored.
I never managed to get a relevant answer to my prompt with those new models, whereas it worked perfectly before.
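
For anyone who wants to verify their own prompt sizes, here is a minimal sketch using the tiktoken library; the “o200k_base” encoding and the prompt.txt file name are my assumptions for illustration, not something confirmed for these exact models:

```python
# Minimal sketch: count the tokens in a prompt before pasting it into the chat.
# Assumes the prompt is saved in prompt.txt (illustrative file name) and that
# "o200k_base" is an appropriate encoding for recent OpenAI models.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

with open("prompt.txt", encoding="utf-8") as f:
    prompt = f.read()

print(f"Prompt length: {len(enc.encode(prompt))} tokens")
```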

Then, if I write a much smaller prompt for code generation (14k tokens instead of 80k), I now get an answer related to my prompt; however, the hallucination rate is really high. The generated code calls non-existent functions or methods in Python, or assumes incorrect inner workings of Python. It also formats its answer with “+” and “-” at the beginning of each line, like a Git diff, so I have to remove them with a regular expression before being able to use the code (see the sketch below)…
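
This is roughly the cleanup I mean; a minimal sketch, assuming the only damage is a leading “+” or “-” marker on each line (note it keeps the content of “-” lines, which a real diff would treat as deletions):

```python
import re

def strip_diff_markers(code: str) -> str:
    """Remove the leading '+' or '-' diff marker from every line."""
    return re.sub(r"^[+-]", "", code, flags=re.MULTILINE)

# Illustrative diff-styled answer pasted from the chat window
answer = "+def greet(name):\n+    return f'Hello, {name}!'\n"
print(strip_diff_markers(answer))
```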

Can someone explain to me what is happening here? Are you encountering the same issues as me?

4 Likes

I’m also experiencing the same issues. The context limit is supposed to be around 200K tokens, but it can’t even handle a prompt of ~80K tokens—I get the same error.

I’m very disappointed.

2 Likes

Same issue for me. Please solve this.

2 Likes

Up. Same problem here. I can’t continue my old conversations, which o3-mini or o1 handled easily.

2 Likes

Here is the issue plain and simple:

The old o1 was 100 horsepower; o3-mini-high was 70 horsepower.

o4-mini-high and o3 (in the crippled form they exist in on the Plus tier) are 2 horsepower.

Grok-3 is 110 horsepower.

Everyone telling people that the models will still improve over time and that OpenAI is resolving issues is BULLSHITTING you. Yes, they will improve, from 2 horsepower to like 3 or 5 horsepower.

But stepping back from 100 horsepower to like 2 or 5 horsepower in the first place was 100% deliberate, and it will NEVER go back to 100 horsepower, unless they jump off their crazy suicide train and come forward, silently or explicitly, with something like “WHOOPS, WE FORGOT TO ADD A ZERO AT THE END OF THE OUTPUT AND INPUT TOKENS AND THE QUANTIZATION LEVEL, WHOOOPS, SOOORRRY!!! Shit happens” (while of course it was 100% deliberate, to see how much they can bullshit you before you leave, and to save like 50x on cost with the new potato-level hardware consumption).

Everyone has already migrated to Gemini 2.5 (too clunky right now to actually write code but excellent for code analysis, and one month free again and again with fresh anonymous credit cards), DeepSeek to replace 4o, and Grok-3 ON THE FREE TIER as the main workhorse. Or some other combo of competitors that actually work and can eat like 10MB of codebase or return 2000 lines of code without issues, and such things.

One can only guess, but it seems their models are economically unsustainable and cost like $2000 per query in uncrippled form, so crippling them this much on the Plus tier was the only thing they could still do to keep the business running. So for now it is RIP for OpenAI. Everyone is migrating away now and will probably never come back. Especially not if there is no preview of the cutting-edge stuff on the free tier. It was a HUGE violation of trust to replace o1 and o3-mini-high with this potato-level crap for people who still had a month or a year of subscription left, although access to those models was ostensibly guaranteed when you paid and subscribed. People will not make this mistake again.

I think if they don’t bring back o1 and o3-mini-high on the Plus tier in identical form as before, and soon, it really is RIP for them, as o3 and o4-mini-high are already making a name for themselves as total garbage, and they will have to explain themselves BIG TIME about how the models were crippled before and what they did to fix them, or all the people who unsubscribed in the meantime will not buy into it at all and will simply assume the models are still the trash they were when introduced. Which again raises the question of why they did this massive crippling of the models in the first place. And that can’t really be explained without revealing details that amount to massive trust violations.

Personally, my subscription ran out two days ago, luckily, so I am not trying to be demanding here for myself. I’m saying this thinking of all those other poor people, and so that the company steers onto a course that is not total suicide.

3 Likes

How do you achieve this result? Which tool / model / prompt are you using?
Being able to do this could really help me when developing code.

Grok-3 outputs close to 2000 lines of code in one go (1600 is a safe amount) without destroying it or turning it inside out. It is better than o1 in many ways: less domineering, with fewer out-of-place code alterations; very workable. Like o1, it might run into walls; that’s when you have to use other AIs to help it out of a vicious cycle.

Gemini 2.5 can eat huge amounts of code, in the 10MB range, but it has a lot of smaller issues with coding, such as mixing tabs with spaces, turning code inside out, adding thousands of unnecessary comments, and not obeying commands (it fails even to return a 1:1 copy of what you send it). At least in the form it is offered today, I think it is unfit for coding; it can only produce suggestions and snippets at best, not “full code” for a project with even one >1000-line file, let alone multiple files. Perhaps they can fix that in the future. But right now it can reason on par with, or sometimes better than, Grok-3, especially if you feed it huge amounts of code for reference. Grok-3 seems to choke at 1MB and never forms a reply; however, when you stay in the 500kB range and just feed it selected files from your project or others, Grok-3 is awesome.

Since this is all free, I don’t use any API to join the AIs in an automated way or anything like that. I can’t imagine it would work that well automated.
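
If it helps, here is a minimal sketch of how one might bundle selected files into a single paste-able block for that kind of manual pipeline; the file paths are illustrative assumptions, not from any real project:

```python
# Minimal sketch: concatenate a few selected project files into one text
# block, each with a header, so the model knows which file is which.
# Paths are illustrative assumptions.
from pathlib import Path

selected = ["src/main.py", "src/utils.py", "tests/test_main.py"]

chunks = []
for name in selected:
    chunks.append(f"===== {name} =====\n{Path(name).read_text(encoding='utf-8')}")

bundle = "\n\n".join(chunks)
print(f"{len(bundle):,} characters to paste")  # stay under the model's limit
```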

But I mean, this is a free copy-and-paste pipeline done by hand, and it is like 10x or 25x superior to the joke that o3 and o4-mini-high currently are on the $20 Plus tier, which is absolutely ridiculous.

The gap in power is like the Bronze Age vs. the Industrial Age.

I just did a project in like 3 days that would have taken 3 weeks to 3 months with conventional methods, depending on how good a coder you are and how invested you were. It is absolutely insane. And I only had to code like 3% of it by hand, because the AI wasn’t smart enough for that part.

2 Likes

Thanks for your detailed answer. Do you know how to ensure privacy and protect intellectual property for code sent to Gemini or Grok? Maybe there is a paid version for that. Especially if you want to develop code that could be “sensitive” (meaning you don’t want it to be open-sourced or used by someone else).

I haven’t really looked into it, but I would be very, very surprised if any privacy options existed on the high-end models. I think it is just not a thing. ChatGPT has the “temporary chat” feature now, which allegedly excludes the conversation from training data; I don’t know if that exists anywhere else.

I have run into the situation once or twice where ChatGPT would output fully working and polished code with novel algorithms that are nowhere to be found on the internet. One was a sort of market-research tool that would match people together by the style of their writing through some sort of voodoo-magic embedding method. For example, you could feed it tons of Reddit posts and it would show you which people are similar, think similar things, or share cognitive features like intelligence or honesty. That was 100% developed by someone else before and then just transferred into the training data.
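
Just to make the general idea concrete, here is my own minimal sketch of that kind of approach, not the tool I saw; the model name and the sample posts are illustrative assumptions. It embeds each person’s posts and compares people by cosine similarity:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

posts = {  # illustrative data
    "alice": ["I think the proposal is flawed because...", "A measured take on X."],
    "bob":   ["This is flawed, and here is why...", "A careful look at X."],
}

# One style vector per person: the mean of their post embeddings.
vecs = {who: model.encode(texts).mean(axis=0) for who, texts in posts.items()}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vecs["alice"], vecs["bob"]))  # higher = more similar writing style
```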

I mean, IP rights to source code, which is a form of free speech, are not really a thing anyway in almost all countries. And since you can now easily transmute copyrighted code with AI into any other form, I don’t think it makes much sense to be very concerned about this anyway.

1 Like

Exactly my point, and that’s the issue, my friend! I’m pissed off. I’m really close to canceling my subscription because this was a terrible setback…

Having exactly the same problems, and I’m on the Pro subscription. It’s gone from great to frustrating, looping over the same errors for tens of iterations.
The reason for this is either stupidity or greed, or perhaps a bit of both.
If OpenAI makes a habit of doing things like this, people will just change provider to someone they can trust!

OpenAI, I hope you’re listening. You should bring back o3-mini-high as fast as possible, otherwise you can forget my $200, and that’s a promise!

1 Like

I totally agree with that. I think I’ll cancel my Plus subscription at the end of the month if a working model like o3-mini-high is not available again.

I found that o3 is not even able to summarize a 10k-word text into a 3k-word text today. In the best case, it generates a text of 1k words, even when I tell it to generate a 3k-word text. 4o also failed at that task.

So, I decided to cancel my Plus subscription. @OpenAI: You lost a customer today.

2 Likes

At around 80-90k tokens I also get this message for o4-mini-high:
“The message you submitted was too long, please reload the conversation and submit something shorter.” What a joke. First they limited GPT-4o to about 32k tokens for the first message input, which made it useless for coding. Now they’ve limited the other models. Lately I’m basically only using Gemini 2.5 Pro and Flash. I’m thinking about cancelling my subscription; it’s useless for me at this point :confused: Guys, please fix this issue, and fast.

2 Likes

Same issue for me. Please solve this.

2 Likes

I was really excited to use o4-mini-high for historical research, but it (along with o3) can’t provide factual responses to relatively simple queries (e.g., “What is the historiography of the French Revolution?”). I’m getting fake book titles, fictitious authors, and lazy responses.

OpenAI boasts that these are their most economical models.

But the whole joke is that no one needs their economical models.

Maybe it will be WOW for new users, but not for old ones.

o4-mini-high has really degraded a lot compared to o3-mini-high. I don’t even know what to use o3 for; it can’t even handle basic tasks.

My Telegram bot works better than o4-mini-high.

It’s horrible. Explain to me why o4-mini-high is needed at all. It can’t handle code, it can’t handle logic. It’s bad at everything.

Will I use o4-mini-high for coding? No. It’s garbage.

1 Like

I subscribed to pay $200 per month to use o3-mini-high for coding tasks without limits! I didn’t sign up because of any other model.
And now they’ve removed it?!

It got replaced with the completely useless o3 and o4-mini-high, which respond with a quarter of the code that needs to be edited, with missing function implementations and variables; they can’t even reason about the errors and how to fix them. They give me what I want 1% of the time, while o3-mini-high gave me what I wanted 95% of the time. Just compare. The new models are worth $0; I can’t function at all. From 10/10 down to 0/10 in productivity.

I’ll definitely cancel my subscription if this isn’t fixed ASAP. @OpenAI_Support
For now I’ve downgraded to $20 per month until o3-mini-high is restored: -$180 for OpenAI.
I might even consider the $0 subscription. That would be -$200 for OpenAI.

2 Likes