o1 and o3-mini-high were the only models that could reliably work with code longer than 150-250 lines, in a manner that rivals Grok-3. o3-mini-high was like a less intelligent version of Grok-3 but still capable of working with 1000 lines of code. 4o is still good on the micro-scale: you can write mockup code like “filter x for y with this regex …” and it will, so to speak, auto-complete it, write smaller functions, tell you what is wrong, or show you how to fix an error fast. And yeah, if things are not too complicated and the code stays within the comfort zone of 150 lines, it works; you can possibly go somewhat beyond that, but not 100% of the time. I mean, it is very useful for very fast answers, but it is literally downgraded in capabilities by more than one generation, to the point where it can only assist you and not write all the code entirely on its own. All the newer models, at least on the Plus tier, make too many errors and hallucinate too much; they are hardly usable in this state. Better to just use Grok-3: it is free and will save you so much time. And o1 was never even better than Grok-3, it was just different. So even accessing it via the API for a hefty price hardly makes sense.