o1 seems much faster than o1-preview, but imo it's blatantly NOT as good as o1-preview.
Anyone else seeing this?
(This post is about paid ChatGPT, not the API)
This seems to be a common theme in the last year or so, and not only for OpenAI: showcase something powerful, but then deploy a scaled-down version that is much less capable.
o1 ignores key details in the request (prompt) and serves something more generic instead. A "Turbo" version of o1-preview.
Example 1 - Chain of thought gone?
The thought-process method seems either totally removed, or simplified to the point where it's not really helpful. o1-preview would self-evaluate, self-iterate and self-improve to a degree where it was actually useful. I don't see that in o1.
In o1-preview, the chained self-prompting was noticeable (not because it wrote out what it did, but because the final result was dramatically better than that of 4o).
Example 2 - Explaining by example is not understood
In o1-preview, you could include examples from one industry to demonstrate your idea/goal and then request a result for a different industry but with the same principles. (This was a helpful way to find actionable methodologies to apply to your line of business, with borrowed ideas from another.)
In o1 this does not work at all, and o1 is unable to interpret the goal of the prompt. It tends to expand the examples by making them more detailed, instead of interpreting the principles that the examples represent.
Where o1-preview was clearly a multi-step agent creating an actionable plan with multiple self-prompts, o1 seems to simplify the input prompt before running it (removing important details in the process).
Example 3 - Constructive disagreeableness gone?
o1-preview was quite capable of rejecting bad ideas. For example, if you strongly suggested bad practices or ideas (in coding, schemas, business methodologies), o1-preview would strongly and insistently disagree and present better options instead.
o1 is yet again agreeable, just as in example #2.
The overall impression is that o1 ignores instructions and just rants on. Word count over quality.
I noticed a similar degradation of quality in 4o about a month ago. Suddenly it became weirdly bad at programming and started to ignore specific instructions.
I use ChatGPT for very specific, repetitive tasks. If something suddenly works substantially worse, I notice fast.
A common statement is "You are just less impressed over time, and that makes you think it's getting worse". I don't think so, because these degradations are sudden, and they happen for identical prompts.
I see no such sudden changes in the API when using specific model builds.
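For reference, this is roughly what I mean by pinning a specific model build in the API: a minimal sketch using the openai Python package, where the dated snapshot name is just an example and may differ from what's currently offered.

```python
# Minimal sketch: call the API with a dated model snapshot so behavior is
# tied to a fixed build, unlike the ChatGPT UI where the backing model can
# change silently. The snapshot name below is an example only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # dated snapshot, not the floating "gpt-4o" alias
    messages=[
        {"role": "user", "content": "Review this schema and push back if it's a bad idea: ..."},
    ],
)
print(response.choices[0].message.content)
```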
I’m not here to preach. I am honestly curious if others get the same impression as I do.