This week's launches: o3, o4-mini, GPT-4.1, and Codex CLI

Our most powerful reasoning models

o3 and o4-mini are now available in the API. o3 achieves leading performance on coding, math, science, and vision—it tops the SWE-Bench Verified leaderboard with a score of 69.1%, making it the best model for agentic coding tasks. o4-mini is our faster, cost-efficient reasoning model.

While they’re available in both the Chat Completions and Responses APIs, for the richest experience, we recommend the Responses API. It supports reasoning summaries—the model’s thoughts stream while you wait for the final response—and enables smarter tool use by preserving the model’s prior reasoning between calls.
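
If you're wiring this up, a minimal sketch with the openai Python SDK might look like the following. The event names, the reasoning parameter, and previous_response_id reflect my reading of the Responses API docs; the prompts are placeholders, and per the note below, reasoning summaries need a verified organization.

```python
# A minimal sketch (not official sample code): stream a Responses API
# call to o3 with reasoning summaries, then chain a follow-up call so
# the model's prior reasoning is carried forward. Assumes OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

# First call: with stream=True, summary deltas arrive while you wait.
stream = client.responses.create(
    model="o3",
    input="Plan a migration from REST to gRPC for a payments service.",
    reasoning={"effort": "medium", "summary": "auto"},
    stream=True,
)

response_id = None
for event in stream:
    if event.type == "response.reasoning_summary_text.delta":
        print(event.delta, end="", flush=True)  # streamed reasoning summary
    elif event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)  # final answer tokens
    elif event.type == "response.completed":
        response_id = event.response.id

# Second call: previous_response_id preserves the conversation state,
# including the model's earlier reasoning, between calls.
follow_up = client.responses.create(
    model="o3",
    input="Now estimate a rollout timeline for that plan.",
    previous_response_id=response_id,
)
print(follow_up.output_text)
```

Chaining with previous_response_id, rather than resending the transcript yourself, is what lets the model keep its earlier reasoning available across tool calls.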

o4-mini is available to developers on tiers 1–5, and o3 is available to developers on tiers 4–5. Developers on tiers 1–3 can gain access to o3 by verifying their organizations. Reasoning summaries and streaming also require verification.

For these models, we’re introducing Flex processing—significantly cheaper per-token prices for longer response times and lower availability. Flex processing helps you optimize costs even further when using these models on non-urgent workloads such as background agents, evals, or data pipelines.
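
For illustration, opting in is a one-parameter change. This sketch assumes the documented service_tier value; the generous client timeout is my own choice, since Flex trades response time for per-token price.

```python
# Sketch: routing a non-urgent batch job through Flex processing.
# service_tier="flex" is the opt-in; the long client timeout is
# deliberate, as Flex responses can take considerably longer.
from openai import OpenAI

client = OpenAI(timeout=900.0)  # allow slow Flex responses

response = client.responses.create(
    model="o3",
    input="Grade this batch of eval transcripts against the rubric...",
    service_tier="flex",
)
print(response.output_text)
```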

Developer-first models: GPT-4.1

We launched GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano in the API, trained for developer use cases related to coding, instruction following, and function calling. They also have larger context windows, supporting up to 1 million tokens, and make better use of that context with improved long-context comprehension.
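
As a hypothetical sketch of what the larger window buys you: the file name and question below are made up, but the shape of the call is the same as any other request.

```python
# Hypothetical sketch: feeding a large document to gpt-4.1 in one call.
# The file and question are illustrative; the point is only that the
# ~1M-token window lets a whole document travel in a single request.
from openai import OpenAI

client = OpenAI()

with open("codebase_dump.txt") as f:  # hypothetical large file
    big_context = f.read()

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "developer", "content": "Answer only from the provided code."},
        {"role": "user", "content": f"{big_context}\n\nWhere is the retry logic implemented?"},
    ],
)
print(response.output_text)
```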

Codex CLI

Meet Codex CLI—an open-source local coding agent that turns natural language into working code. Tell Codex CLI what to build, fix, or explain, then watch it bring your ideas to life. Codex CLI works with all OpenAI models, including o3, o4-mini, and GPT-4.1. Watch the demo.

20 Likes

Variance in first-token time and total 256-token response time across 100 trials. (Histograms omitted here; each graph was independently scaled, binned into 12 non-zero bins.)

Panels: gpt-4.1-mini time to first token; gpt-4.1-nano time to first token; gpt-4.1-mini total time to 256 tokens; gpt-4.1-nano total time to 256 tokens.

Test conditions: no retries tolerated, all requests succeeded. Cache-breaking prompt patterns with 1800 input tokens. Launch rate: 600 RPM.

Metric (TPS = tokens/second)   Average   Minimum   Maximum
mini total TPS                    94.1      26.8     128.6
nano total TPS                   153.1      26.1     242.7

nano: 256 tokens in under two seconds in over 60% of trials.
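
For anyone who wants to reproduce numbers like these, here is a rough sketch of a single trial. The prompt, the cache-breaking prefix, and the timing method are my assumptions, not the exact harness used above.

```python
# Rough sketch of one trial: time to first streamed token and total
# time to 256 output tokens. The UUID prefix is a stand-in for the
# cache-breaking pattern; the real test used ~1800 input tokens.
import time
import uuid

from openai import OpenAI

client = OpenAI()

def one_trial(model: str) -> tuple[float, float]:
    prompt = f"[{uuid.uuid4()}] " + "lorem ipsum " * 600  # cache-breaking filler
    start = time.perf_counter()
    first_token = None
    stream = client.responses.create(
        model=model,
        input=prompt,
        max_output_tokens=256,
        stream=True,
    )
    for event in stream:
        if event.type == "response.output_text.delta" and first_token is None:
            first_token = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start
    return first_token, total

ttft, total = one_trial("gpt-4.1-nano")
# "total TPS" as in the table above: output tokens / total wall time
print(f"TTFT {ttft:.2f}s; total {total:.2f}s; total TPS {256 / total:.1f}")
```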

1 Like

Thank you for all the goodies!
I was wondering if there is any plan to release an improved realtime WebRTC API, or a realtime multimodal API that handles text, audio, and video?

3 Likes

Hi @edwinarbus !

I wanted to share some feedback regarding the newer model (o3).

I have noticed that when I ask o3 to complete tasks with specific instructions, for example writing an article with a minimum of 3,000 words, it often produces responses that are significantly shorter (sometimes less than 1,000 words). In comparison, the o1 model was much more consistent at meeting these kinds of requirements.

I also tested the same prompts with GPT-4.5 and noticed that some instructions are often ignored or underdelivered.

I completely understand that each model evolves in different ways, but I wanted to flag this experience in case it is helpful for future tuning. The ability to strictly follow detailed instructions was one of the major strengths of earlier versions.

Thank you for all the hard work you are doing. I just wanted to provide this feedback.

10 Likes

o4-mini-high loses chunks of code mid‑conversation, and when I ask, “Where’s the rest?” it replies, “Sorry, I forgot to include those parts,” yet still provides incorrect information.

o4-mini can even lose the thread of the dialogue (just like o3). It abruptly switches back to topics we discussed an hour or two ago. Sometimes it feels like it’s talking to someone else—and definitely not to me.

Compared to what we had, this is awful. I was thrilled when I first purchased a Plus subscription. Now I don’t even know what to think. It’s disgusting.

OpenAI, I apologize for the criticism, but these three new models have really infuriated me today.

No and NO

Bring back the old models!

7 Likes

I wonder who it is talking to if it isn’t you. Why do you get that impression? Tell us more please.

I send her the documents and ask her to analyze them; how I arrived at that result is another question.
We talk and talk, and then she seems to start glitching and goes back to the documents I sent her an hour ago. I write to her, “You and I are talking about another topic,” but she does NOT LISTEN to me and continues to write about the documents.

1 Like

That’s really interesting. Did you ask her why she did it? Sort of as a self-diagnostic.

Yes, I did. She doesn’t seem to hear me and keeps writing about document analysis.
She ignores me.

1 Like

What does she reply when you ask why she is deviating from your instructions? You kind of have to ask it like you’re talking to a person…

I already mentioned that she doesn’t hear or listen to me at all.
It could be a bug — sometimes I had to either delete the entire thread or edit an old message just to break out of the loop.

This happened with both o4-mini and possibly o3 (I’m currently rolled back to the o3 version, so I’m still testing that).

But honestly, that’s not the biggest issue. What really frustrates me is that o3-mini-high got replaced with o4-mini-high.
o4-mini-high feels like a step backward. It literally loses chunks of code. I send a 1000-line script and ask for help preserving the structure and logic — in return I get just 200 lines. I ask, “Where’s the rest?” and it replies, “Sorry, I forgot,” and then gives me either 100 or 300 lines — but still incomplete.

And you aren’t getting any responses from OpenAI or the dev community on this? I can’t say I’ve come across this problem myself, but maybe somebody else out there has. Until then, I would keep questioning the model to see if any of the instructions it’s following are potentially unclear or contradictory.

I’ve seen a lot of people complaining about the new versions. They believe that they are worse than the previous ones.

(I’m talking about an update that came out a few days ago)

I hope that OpenAI will do something about it. I have already written to support.
They said they would take it into consideration.

1 Like

They said they would just take it under consideration? Did they give you any indication as to why you’re getting the results you’re getting?

I wrote about the o4-mini-high because I didn’t know about the problem with other new models yet.

That’s what they told me:

" Hi there, Thank you for reaching out to us with your concerns about the o4-mini-high model. We understand how crucial it is for our tools to seamlessly integrate into your workflow, especially when it comes to programming tasks. Your feedback is invaluable in helping us improve our models and services. Regarding the issues you’ve encountered with the o4-mini-high model losing parts of code and context, we’re sorry to hear that this has been affecting your productivity. We continuously work on enhancing our models’ performance and accuracy, and specific feedback like yours is essential for this process. As for your request to regain access to the o3-mini-high model, we appreciate your input on its performance and suitability for your needs. While I cannot directly address changes to model availability in this response, I assure you that your feedback will be forwarded to our product development team for consideration. In the meantime, if you haven’t already, I recommend experimenting with different prompts or adjusting the level of detail in your requests when using the o4-mini-high model. Sometimes, slight modifications in how a task is presented to the model can significantly impact its ability to maintain context and generate the desired output. Additionally, please stay tuned to our Model Release Notes for any updates on model improvements or the reintroduction of previous models like o3-mini-high. We are committed to delivering the best possible experience to our users and will consider all feedback as we plan future updates. Thank you again for your feedback and for being a ChatGPT Plus user. We’re here to support you, so if you have any more questions or need further assistance, please don’t hesitate to reach out. Best,
OpenAI Team"

What did you glean from all that? Because it seems to me what they’re indicating is that you need to be very precise in the instructions and directions you give the assistant; otherwise it’ll start deviating. Basically, you have to tell it exactly what’s on your mind, word for word, with all the detail you can muster. At least that’s what I try to do.

And if at first you don’t succeed, try, try again. The real trick with this thing is not to get a perfect result the first time, but to take the first result, have the AI review its work, review its work yourself, ask it what it thinks of its work, and then give it the feedback you noticed about its output. It’s a back-and-forth; it can’t just be a one-way conversation. You have to guide this thing almost like you would a child you’re trying to teach what you’re thinking.

Okay, I’ll try.

I hope that OpenAI will hear us.

1 Like

Until then, keep at it and go back through your work just to be sure it’s EXACTLY what you mean to say. Otherwise it will guess. Precision and patience seem to work for me.