ChatGPT o1-preview - Model behavior changes

Has anyone noticed a regression in o1-preview during the last few days? Specifically, the output is very generic, similar to the early days of GPT-3.5, for summarization and ideation use cases. For complex reasoning, I find the Playground provides a better response than the ChatGPT interface.
For example, responses to "summarize 5 takeaways from …" prompts are pretty basic.


Yeah, I did. o1-preview used to work great and was stress-free, but starting a few days ago I began running into cases where I had to repeat my instructions, which got irritating.


Same here. :face_with_diagonal_mouth:
It’s very frustrating. It now spends only 2 or 3 seconds thinking, which is far too little for such a complex prompt.

Right when I need the model’s capabilities the most. What worries me even more is that I have just over 10 uses left under my usage cap, and my renewal won’t happen until December 9th.

I really wish OpenAI would consider allowing an early reset of the usage caps, especially since we’re wasting what little we have on responses that don’t meet expectations.

@SamAltman :pray:


Aha, I’m speculating, but o1 was just delivered behind the $200 ChatGPT Pro subscription :slight_smile: so the $20 o1-preview regressed, a strategy play akin to Apple’s iPhone lineup.

@kavitatipnis Yeah… that’s it. :face_with_diagonal_mouth:

Yes, indeed. :cry:

o1 now takes 3-4 seconds to output a bad response to prompts that the previous o1-preview used to spend 35-40 seconds on while producing very accurate content (in terms of the model’s reasoning capabilities).


Same problem

OpenAI had been the best, until they did this… (((

Having the exact same problem. For example, I asked o1-preview a very complex question about what a C program (with complex bit manipulation) was doing to some input data. It provided a thorough response (even though some parts were incorrect) that took 1 minute and 46 seconds to “reason” about. The same prompt fed into the o1 model took only 9 seconds to reason about, and it got everything wrong. The response felt exactly like the output of the normal 4o model.

I think they kept o1-mini as the low-end model, moved the o1-preview we had been using to the high end, and created a new, less powerful model as the middle ground. However, after using it, the new model seems closer to o1-mini in capability.

One thing I have noticed is that it has started following my custom instructions very strictly, and it doesn’t refuse prompts in a moral gray area.

Yes, I think this change makes it imperative to think about which use cases actually need higher-order reasoning. General-knowledge Q&A (essentially search) probably doesn’t need the $200 o1-level reasoning and can do well with 4o-mini, whereas a use case such as an agentic research assistant doing drug discovery does need higher-order reasoning. Meanwhile, software programming is for the most part templatized: there is a fairly deterministic way of writing code, with limited “chain of thought” and more take-action (tool-use) steps, which the 4o and o1 APIs can handle reasonably well. For complex programming use cases that require deep domain knowledge, C programming in your instance, a combination of prompt engineering with examples plus fine-tuning may yield better results cost-effectively in the short run; a rough sketch of the prompt-engineering side is below.
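
To make the “prompt engineering with examples” part concrete, here is a minimal few-shot sketch using the OpenAI Python SDK. It assumes the v1 `openai` package, an `OPENAI_API_KEY` in the environment, and API access to whichever model you want to test; the model name, example snippet, and prompt wording are illustrative placeholders, not anything the posters above actually used.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical few-shot pair: one small C snippet with a worked explanation,
# shown to the model before the real question.
EXAMPLE_CODE = "x = (x & 0x0F) << 4 | (x >> 4);"
EXAMPLE_ANALYSIS = "Assuming x is an unsigned 8-bit value, this swaps its high and low nibbles."

# The code we actually want analyzed (an arbitrary dense bit-manipulation expression).
TARGET_CODE = "y = ((y * 0x0802 & 0x22110) | (y * 0x8020 & 0x88440)) * 0x10101 >> 16;"

prompt = (
    "You analyze C code and explain precisely what it does to its input data.\n\n"
    f"Example code:\n{EXAMPLE_CODE}\n"
    f"Example analysis:\n{EXAMPLE_ANALYSIS}\n\n"
    f"Now analyze this code:\n{TARGET_CODE}"
)

# Everything goes in a single user message so the same call works across models.
# Swap the model name to compare, e.g. "o1-preview", "o1-mini", "gpt-4o",
# subject to what your account can access.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Running the same prompt against two model names and diffing the answers is also a cheap way to check whether the regression described in this thread shows up over the API as well, or only in the ChatGPT UI.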