I have used O3 quite intensively since it was released, and with all of OpenAI's “best” models I've gone through a bit of hype followed by a lot of frustration.
O3 is different in that regard: it is capable of producing very well-designed code. That was my original reason for switching to it from the O3 and O4 mini-high models. I used it to beautify larger sections of code.
O3 has one significant flaw so far: it starts to seriously suck once the code reaches about 700 lines, and it only gets worse as the line count grows.
At around 800 lines O3 starts to rebel against the length, insisting on only providing small patches. Those patches can be a nightmare to integrate, and more often than not they break the code.
O3 also starts to silently remove significant code just to make the file shorter. No matter what prompt or system prompt you use, O3 gets hostile toward code documentation.
I've tried to guard a small documentation area in my code with warning comments, with system messages, and with prompt alerts… all to no avail. O3 will remove it anyway.
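To give an idea, the guard comments looked roughly like this (a hypothetical sketch, not my actual code; the module name and wording are made up):

```python
# =====================================================================
# DO NOT REMOVE OR SHORTEN THIS BLOCK.
# The documentation below must be preserved verbatim in every
# full-file response.
# =====================================================================
# Module: payment_gateway (hypothetical name for illustration)
#
# Describes the retry and timeout behavior of the gateway client.
# Other teams rely on this section, so it must stay in the file.
# =====================================================================
```

Even with a guard like this repeated in the system prompt, O3 would still strip the block once the file got long enough.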
It became more and more common to have to ask it 4-5 times in a row for a full, functional response that doesn't destroy existing functionality before it finally delivers a working one.
O3 low is surprisingly capable too.
Overall it is a very good model, much better than 4.1 at coding, but it struggles with large responses. 4.1 handles large responses better, but it fills them with errors, so the code is more likely to be broken afterward. O3 rarely did that.
O3 is my current default model for development tasks that require some skill.
The “mini” models became useless to me.
4.1 is my default model for very simple and quick tasks, like applying a patch of low complexity.