Downsides of o1-preview and o1-mini

I switched to the o1 models for coding, simply because the code quality is far better than with any other model. I have been using them as a Tier 4 user (in continue.dev with an API key) since the beginning of November (approx. 4 weeks).

IMHO 4o and the other models out there had a few downsides, especially the number of bugs they produced, both logical and even syntactic.

Both o1 models drive me CRAZY.

  • Even the simplest questions generate multiple screen pages full of code and explanations.
  • The solutions are mostly very complex but bulletproof.
  • Unfortunately, in many cases the solution tends to completely refactor my code, even though the code of this new project was already partially written by o1.
  • Very often o1 removes features without telling me. If you do not notice this during review, you run into deep problems when you hit the pitfall 4 hours later. Maybe it didn't get the meaning of the stuff it deleted.

It also happens (and it happened 5 min ago AGAIN) that o1 removes existing features and tells me so. I then complain and tell o1 that I need the feature and that it should put it back in.

Then the weirdness and madness start. I get an answer saying that o1 understood the task. At the end of the answer it suggests I could optionally remove the feature for the better, AND AFTER ALL THAT THE PRODUCED CODE IS LACKING THE OLD FEATURE AGAIN.
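
One way to catch silently dropped features during review, instead of 4 hours later, is to compare which functions and classes exist before and after the model's rewrite. The following is only a minimal sketch, assuming the code is Python and that a "feature" roughly maps to a function, method or class; the helper names `defined_names` and `removed_definitions` are made up for this example and are not part of continue.dev or any OpenAI tooling.

```python
import ast


def defined_names(source: str) -> set[str]:
    """Collect qualified names of all functions and classes defined in the source."""
    names: set[str] = set()

    def visit(node: ast.AST, prefix: str = "") -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                qualified = f"{prefix}{child.name}"
                names.add(qualified)
                # Descend so that methods show up as "Class.method".
                visit(child, prefix=qualified + ".")
            else:
                # Keep looking inside if/try/with blocks and similar.
                visit(child, prefix)

    visit(ast.parse(source))
    return names


def removed_definitions(before: str, after: str) -> list[str]:
    """Names that exist in the old code but are missing from the new code."""
    return sorted(defined_names(before) - defined_names(after))


# Made-up example input: the "after" version lost one function and one method.
before = (
    "def load():\n    pass\n\n"
    "def export_csv():\n    pass\n\n"
    "class Agent:\n"
    "    def run(self):\n        pass\n"
    "    def retry(self):\n        pass\n"
)
after = (
    "def load():\n    pass\n\n"
    "class Agent:\n"
    "    def run(self):\n        pass\n"
)

print(removed_definitions(before, after))  # ['Agent.retry', 'export_csv']
```

This only notices definitions that vanish completely. A feature that is removed from inside a function body that still exists will slip through, so it supports a review but does not replace it.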


It seems that all models are using ChatGPT 4 by default; try asking each model and you will see.

Same here. When I implement new features and ask ChatGPT for code, it removes other features. So I need to tell ChatGPT not to touch other features or code and to focus on the given feature. That works, but sometimes it repeats the same mistake.


Yes, that’s it. It loses focus, and sometimes it constantly claims to have found a better solution than you could ever have imagined. :wink: A bit over-aligned to super smart solutions from Stack Overflow.

Hope this gets better in the next release. The announcement promised a system message. Maybe this will lead to given tasks being followed more closely.

Addendum:

4o treats some requests with the same ignorance. I am currently refactoring large files and I ask it to give me the complete files as the result, so that I don’t have to “seek&destroy”. And guess what? The result contains lines like

“Ensure additional logic and classes required by the agent are implemented as needed…”

some of them well hidden. I am really tempted to write a small tool to help heal this kind of incomplete answer. It’s an absolute pain. It costs more time than it saves.
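
A first step for such a tool could be a scanner that flags placeholder lines before the file gets saved. This is just a rough sketch: apart from "implemented as needed", which comes from the line quoted above, the patterns are my own guesses at typical filler phrases, and `find_placeholders` is a made-up name. It only finds the gaps, it does not heal them.

```python
import re

# Phrases that suggest the model skipped code instead of writing it out.
# Only "implemented as needed" is taken from a real answer; the rest are
# assumptions and should be extended with whatever shows up in practice.
PLACEHOLDER_PATTERNS = [
    r"implemented as needed",
    r"rest of (the|your) (code|file|implementation)",
    r"(remains?|stays?) unchanged",
    r"existing (code|logic|methods?) (here|goes here)",
    r"^\s*(#|//)\s*\.\.\.",  # comment that consists of an ellipsis
]
_PLACEHOLDER_RE = re.compile(
    "|".join(f"(?:{pattern})" for pattern in PLACEHOLDER_PATTERNS),
    re.IGNORECASE,
)


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if _PLACEHOLDER_RE.search(line):
            hits.append((number, line.strip()))
    return hits


# Made-up model answer with two hidden placeholders.
answer = (
    "class Agent:\n"
    "    def run(self):\n"
    "        return 1\n"
    "\n"
    "# Ensure additional logic and classes required by the agent are implemented as needed...\n"
    "# ... rest of the code remains unchanged\n"
)

for number, line in find_placeholders(answer):
    print(f"line {number}: {line}")
```

Plain substring patterns will produce false positives in files that legitimately contain such phrases, so the hits are meant as a list to check by hand.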