Hi all,
In several YouTube videos I've heard that the gpt-4.1 series is somewhat code-oriented, but I haven't found an official reference confirming that.
But we also have other models available, such as the gpt-4.5 series and the o4 series…
So the question is: are you aware of any reference stating which model is better to choose if the main focus is code?
Thanks a lot.
- gpt-4.5 is best suited for creative writing; it's not bad at coding, but it's expensive.
- gpt-4.1 has the largest input context (1 million tokens), which makes it interesting for complex coding. But since it is only available through the API, you would have to use either the Playground or a tool like Windsurf, GitHub Copilot, or Codex CLI.
- In the API you can still try o3-mini-high, which was as good as o3/o4-mini but returns more complete code (o3/o4-mini often tell you what changed but will cut your code into pieces, and you will spend extra prompts asking for the full code).
Personally, I'm still using ChatGPT Plus (not the API) for most coding tasks, where gpt-4o is fine, leaving o3/o4-mini for more complex problems. But I don't rely entirely on AI; I mostly use it as an assistant. So, in the end, it depends on your needs and expectations.
o1 and o3-mini-high were the only models that could reliably work with code longer than 150–250 lines, in a manner that rivals Grok-3. o3-mini-high was like a less intelligent version of Grok-3, but still capable of working with 1,000 lines of code. 4o is still good at the micro scale: you can write mock-up code ("filter x for y with this regex …") and it will, so to speak, auto-complete it, write smaller functions, or quickly tell you what is wrong and how to fix an error. And if things are not too complicated and stay within that comfort zone of ~150 lines, you can sometimes go beyond it, but not 100% of the time. It is very useful for very fast answers, but its capabilities are more than a generation behind: it can only assist you, not write all the code entirely on its own. The newer models, at least on the Plus tier, make too many errors and hallucinate too much; they are hardly usable in this state. Better to just use Grok-3: it is free and will save you a lot of time. And o1 was never even better than Grok-3, it was just different, so accessing it via the API at a hefty price hardly makes sense.
o3 is OK at coding tasks, but it leaves code incomplete and requires a lot of back-and-forth prompts.
I highly recommend 4.1 if you are using it over the API and "sharing tokens" with OpenAI.
link: https://help.openai.com/en/articles/10306912-sharing-feedback-evaluation-and-fine-tuning-data-and-api-inputs-and-outputs-with-openai
As you’ll see here:
How do I know if I am eligible for the free tokens or if I am enrolled for the free tokens?
You can see if you’re eligible for the offer by going to your data sharing settings page and confirming whether you see the “You’re eligible for free daily usage on traffic shared with OpenAI” offer. When you enable data sharing and you’re qualified for complimentary tokens, you’ll see “You’re enrolled for complimentary daily tokens”.
OpenAI will give 30 days notice before terminating this program.
That's what I use for coding, and the "sharing" makes it very affordable.
Otherwise, I would definitely use a “mini” model…
But I personally find gpt-4.1's coding preferable to the "full reasoning models" (i.e. o3, etc.). Note that I'm using it in a large-scale system patching applications across many shared code files, NOT for specific one-shot tasks, where I think o3 might "excel in general". But why pay more for it if you can get it cheaper from 4.1? o3 seems more terse and direct, which can be nice, but if you get a good implementation checklist going, 4.1 can be just as direct and not include any other fluff.
I find that using a custom GPT with special instructions helps ensure it's optimized for my coding needs.
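As a rough illustration of that kind of setup (the instruction wording, helper name, and prompt are my own assumptions, not something from this thread), here is a minimal sketch of assembling a coding-focused gpt-4.1 request for the official OpenAI Python SDK:

```python
# Sketch: build a coding-focused chat payload for gpt-4.1.
# The instruction text and build_request helper are illustrative assumptions.

CODING_INSTRUCTIONS = (
    "You are a coding assistant. Always return the complete, runnable file, "
    "never a fragment. Follow the implementation checklist exactly and add "
    "no extra commentary."
)

def build_request(user_prompt: str, model: str = "gpt-4.1") -> dict:
    """Assemble a payload for client.chat.completions.create(**payload)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": CODING_INSTRUCTIONS},
            {"role": "user", "content": user_prompt},
        ],
    }

# With the openai package installed and OPENAI_API_KEY set, you would send it:
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(**build_request("Refactor utils.py"))
```

The same idea carries over to a custom GPT: the system-level instructions do the work of keeping the output complete and free of fluff.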