Has anyone noticed a regression in o1-preview during the last few days? Specifically, the output is very generic, similar to the early days of GPT-3.5, for summarization and ideation use cases. For complex reasoning, I find the Playground gives a better response than the ChatGPT interface.
For example, responses to "summarize 5 takeaways from..." style prompts are pretty basic.
Yeah, I did. o1-preview used to work great and was stress-free, but starting a few days ago I began running into cases where I had to repeat my instructions, which got irritating.
Same here.
It’s very frustrating. It’s spending only 2 or 3 seconds thinking, which is far too little for such a complex prompt.
Right when I need the model’s capabilities the most. What worries me even more is that I have just over 10 uses left before I hit my cap, and my renewal won’t happen until December 9th.
I really wish OpenAI would consider allowing an early reset of the usage caps, especially since we’re wasting what little we have on responses that don’t meet expectations.
Aha, I am speculating, but o1 just got delivered behind the $200 ChatGPT Pro subscription, so the $20 o1-preview regressed: a strategy play akin to Apple's iPhone lineup.
@kavitatipnis Yeah… that's it.
Yes, indeed. o1 now takes 3-4 seconds to output a bad response to a prompt that the previous o1-preview used to spend 35-40 seconds on while producing very accurate content (in terms of the models' reasoning capabilities).
Having the exact same problem. For example, I asked o1-preview a very complex question about what a C program (with complex bit manipulation) was doing to some input data. It provided a thorough response (even though some parts were incorrect) that took 1 minute and 46 seconds to “reason” about. The same prompt fed into the o1 model took only 9 seconds to reason about, and got everything wrong. The response felt exactly like the output of the normal 4o model.
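If anyone wants to reproduce this kind of side-by-side check through the API, something like the rough sketch below would do. This is not my actual code; it assumes the standard OpenAI Python SDK with API access to both models, and the C snippet is just a placeholder for the real program.

```python
# Hypothetical side-by-side comparison of the two models on one prompt.
# Assumes the standard OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import time
from openai import OpenAI

client = OpenAI()

C_SNIPPET = """
/* placeholder for the actual bit-manipulation program */
unsigned int mix(unsigned int x) {
    x ^= x >> 16;
    x *= 0x45d9f3b;
    x ^= x >> 16;
    return x;
}
"""

PROMPT = (
    "Explain, step by step, what this C function does to its input value:\n"
    + C_SNIPPET
)

for model in ("o1-preview", "o1"):
    start = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Wall-clock time is a rough proxy for how long the model "reasoned".
    print(f"{model}: {time.time() - start:.1f}s")
    print(resp.choices[0].message.content[:300], "\n")
```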
I think they kept o1-mini as the low-end model, moved the o1-preview we were using to the high end, and created a new, less powerful model as the middle ground. However, after using it, it seems closer to o1-mini's power.
One thing I have noticed is that it now follows my custom instructions very strictly, and it doesn't refuse prompts in a moral gray area.
Yes, I think this change makes it imperative to think about which use cases actually need higher-order reasoning. General-knowledge Q&A (aka search) probably doesn't need $200 o1-level reasoning and can do well with 4o-mini; however, a use case such as an agentic research assistant doing drug discovery does need higher-order reasoning. Meanwhile, software programming is for the most part templatized: there is a deterministic way of writing code with very limited "chain of thought", and these take-action (tools) style use cases can be handled reasonably by the 4o and o1 APIs. For complex programming use cases that require "deep" domain knowledge, C programming in your instance, a combination of prompt engineering with examples plus fine-tuning may yield better results cost-effectively in the short run; a rough sketch of the few-shot part is below.
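To illustrate the few-shot idea, here is a minimal sketch, not anything tested in this thread. It assumes the standard OpenAI Python SDK; the model name, the example pairs, and the helper function are all placeholders you would replace with your own domain examples.

```python
# Illustrative few-shot prompt-engineering sketch for C code analysis.
# Assumes the standard OpenAI Python SDK; model and examples are placeholders.
from openai import OpenAI

client = OpenAI()

# A couple of worked examples showing the style and depth of analysis expected.
FEW_SHOT = [
    {"role": "user", "content": "What does `x & (x - 1)` do?"},
    {"role": "assistant",
     "content": "It clears the lowest set bit of x; the result is 0 iff x is 0 or a power of two."},
    {"role": "user", "content": "What does `(x >> 3) & 1` do?"},
    {"role": "assistant", "content": "It extracts bit 3 of x as a 0/1 value."},
]

def analyze_c_snippet(snippet: str, model: str = "gpt-4o") -> str:
    messages = FEW_SHOT + [
        {"role": "user",
         "content": "Using the same style, explain what this C code does to its input:\n" + snippet}
    ]
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

print(analyze_c_snippet("y = (x << 4) | (x >> 28);"))
```

Fine-tuning would then build on the same idea: once the prompt format is settled, the accumulated question/answer pairs become the training data.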