Has anyone noticed a regression in o1-preview during the last few days? Specifically, the output is very generic, similar to the early days of GPT-3.5, for summarization and ideation use cases. For complex reasoning, I find the Playground gives a better response than the ChatGPT interface.
For example, responses to "summarize 5 takeaways from..." style prompts are pretty basic.
Yeah, I did. o1-preview used to work great and was stress-free, but starting a few days ago I began running into cases where I had to repeat my instructions, which got irritating.
Same here.
It’s very frustrating. It’s spending only 2 or 3 seconds thinking, which is far too little for such a complex prompt.
Right when I need the model’s capabilities the most. What worries me even more is that I have just over 10 uses left before I hit my cap, and my renewal won’t happen until December 9th.
I really wish OpenAI would consider allowing an early reset of the usage caps, especially since we’re wasting what little we have on responses that don’t meet expectations.
Aha, I am speculating, but o1 just got delivered behind the $200 ChatGPT Pro subscription, so the $20 o1-preview regressed: a strategy play akin to Apple's iPhone lineup.
@kavitatipnis Yeah… that's it.
Yes, indeed. o1 now takes 3-4 seconds to output a bad response to a prompt that the previous o1-preview used to spend 35-40 seconds on while producing very accurate content (in terms of the models' reasoning capabilities).
Having the exact same problem. For example, I asked o1-preview a very complex question about what a C program (with complex bit manipulation) was doing to some input data. It provided a thorough response (even though some parts were incorrect) that took 1 minute and 46 seconds to “reason” about. The same prompt fed into the o1 model took only 9 seconds to reason about, and got everything wrong. The response felt exactly like the output of the normal 4o model.
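If anyone wants to reproduce this kind of side-by-side check through the API, something like the rough sketch below would do. This is not my actual code; it assumes the standard OpenAI Python SDK with API access to both models, and the C snippet is just a placeholder for the real program.

```python
# Hypothetical side-by-side comparison of the two models on one prompt.
# Assumes the standard OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import time
from openai import OpenAI

client = OpenAI()

C_SNIPPET = """
/* placeholder for the actual bit-manipulation program */
unsigned int mix(unsigned int x) {
    x ^= x >> 16;
    x *= 0x45d9f3b;
    x ^= x >> 16;
    return x;
}
"""

PROMPT = (
    "Explain, step by step, what this C function does to its input value:\n"
    + C_SNIPPET
)

for model in ("o1-preview", "o1"):
    start = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Wall-clock time is a rough proxy for how long the model "reasoned".
    print(f"{model}: {time.time() - start:.1f}s")
    print(resp.choices[0].message.content[:300], "\n")
```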
I think they kept o1-mini as the low-end model, moved the o1-preview we were using to the high end, and created a new, less powerful model as the middle ground. However, after using it, it seems closer to o1-mini's power.
One thing I have noticed is that it now follows my custom instructions very strictly, and it doesn't refuse prompts in a moral gray area.
Yes, I think this change makes it imperative to think about which use cases actually need higher-order reasoning. General-knowledge Q&A (aka search) probably doesn't need $200 o1-level reasoning and can do well with 4o-mini; however, a use case such as an agentic research assistant doing drug discovery does need higher-order reasoning. Meanwhile, software programming is for the most part templatized: there is a deterministic way of writing code with very limited "chain of thought", and these take-action (tools) style use cases can be handled reasonably by the 4o and o1 APIs. For complex programming use cases that require "deep" domain knowledge, C programming in your instance, a combination of prompt engineering with examples plus fine-tuning may yield better results cost-effectively in the short run; a rough sketch of the few-shot part is below.
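To illustrate the few-shot idea, here is a minimal sketch, not anything tested in this thread. It assumes the standard OpenAI Python SDK; the model name, the example pairs, and the helper function are all placeholders you would replace with your own domain examples.

```python
# Illustrative few-shot prompt-engineering sketch for C code analysis.
# Assumes the standard OpenAI Python SDK; model and examples are placeholders.
from openai import OpenAI

client = OpenAI()

# A couple of worked examples showing the style and depth of analysis expected.
FEW_SHOT = [
    {"role": "user", "content": "What does `x & (x - 1)` do?"},
    {"role": "assistant",
     "content": "It clears the lowest set bit of x; the result is 0 iff x is 0 or a power of two."},
    {"role": "user", "content": "What does `(x >> 3) & 1` do?"},
    {"role": "assistant", "content": "It extracts bit 3 of x as a 0/1 value."},
]

def analyze_c_snippet(snippet: str, model: str = "gpt-4o") -> str:
    messages = FEW_SHOT + [
        {"role": "user",
         "content": "Using the same style, explain what this C code does to its input:\n" + snippet}
    ]
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

print(analyze_c_snippet("y = (x << 4) | (x >> 28);"))
```

Fine-tuning would then build on the same idea: once the prompt format is settled, the accumulated question/answer pairs become the training data.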