Announcing GPT-4o in the API!

Today we announced our new flagship model that can reason across audio, vision, and text in real time—GPT-4o. We are happy to share that it is now available as a text and vision model in the Chat Completions API, Assistants API, and Batch API!

It includes:

:brain: High intelligence :brain:

  • GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities. You can find detailed eval results in our open source simple-evals GitHub repo.

:rocket: 2x faster :rocket:

  • GPT-4o is 2x faster at generating tokens than GPT-4 Turbo.

:money_with_wings: 50% cheaper pricing :money_with_wings:

  • GPT-4o is 50% cheaper than GPT-4 Turbo, across both input tokens ($5 per million) and output tokens ($15 per million).
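
As a rough sketch of what the price cut means per request, here is an illustrative cost estimator using the per-million-token prices above (the helper function and its name are my own, not part of any SDK):

```python
# Illustrative cost estimator, using the USD-per-million-token
# prices from the announcement above.
PRICES = {
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A request with 10k prompt tokens and 2k completion tokens:
print(estimate_cost("gpt-4o", 10_000, 2_000))       # 0.08
print(estimate_cost("gpt-4-turbo", 10_000, 2_000))  # 0.16
```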

:chart_with_upwards_trend: 5x higher rate limits :chart_with_upwards_trend:

  • GPT-4o will have 5x the rate limits of GPT-4 Turbo—up to 10 million tokens per minute. (We’re ramping up rate limits to this level in the coming weeks for developers with high usage.)

:framed_picture: Improved vision :framed_picture:

  • GPT-4o has improved vision capabilities across the majority of tasks.

:speaking_head: Improved non-English language capabilities :speaking_head:

  • GPT-4o has improved capabilities in non-English languages and uses a new tokenizer which tokenizes non-English text more efficiently than GPT-4 Turbo.

GPT-4o has a 128K context window and has a knowledge cut-off date of October 2023.

Finally, in terms of modalities:

  • GPT-4o in the API supports understanding video (without audio) via vision capabilities. Specifically, videos need to be converted to frames (2-4 frames per second, either sampled uniformly or via a keyframe selection algorithm) to input into the model. Check out the Introduction to GPT-4o cookbook to learn how to use vision to input video content with GPT-4o today.
  • GPT-4o in the API does not yet support audio. We hope to bring this modality to a set of trusted testers in the coming weeks.
  • GPT-4o in the API does not yet support generating images. For that, we still recommend the DALL-E 3 API.
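
For the video point above, here is a minimal stdlib-only sketch of the uniform-sampling index math: given a source frame rate, pick the frame indices to keep at a target of 2-4 frames per second. (Actually decoding frames would typically use a library such as OpenCV; the function name here is my own.)

```python
def uniform_frame_indices(total_frames: int, source_fps: float,
                          target_fps: float = 2.0) -> list[int]:
    """Indices of frames to keep so that roughly `target_fps` frames
    per second of video are sampled uniformly."""
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps, sampled at 2 fps -> 20 frames:
idx = uniform_frame_indices(total_frames=300, source_fps=30, target_fps=2)
print(len(idx))   # 20
print(idx[:3])    # [0, 15, 30]
```

Each selected frame would then be sent to the model as an image input, as shown in the cookbook.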

We recommend everyone using GPT-4 or GPT-4 Turbo evaluate switching to GPT-4o! To get started, check out our API documentation or try it out in the Playground (which now supports vision and comparing output across models!)
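
To make switching concrete, here is a minimal sketch of a Chat Completions request with `model: "gpt-4o"`, built with only the standard library. It constructs the request without sending it; actually sending it requires a real `OPENAI_API_KEY`, and the helper name is my own.

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Build (but do not send) a Chat Completions HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )

req = build_chat_request("Say hello in Japanese.")
print(json.loads(req.data)["model"])  # gpt-4o
```

Switching an existing integration is typically just changing the `model` field from `gpt-4-turbo` to `gpt-4o`.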


Thank you for releasing GPT-4o! We have been eagerly anticipating this update for quite some time. However, I’ve noticed a few issues right away. First, GPT-4o often misunderstands prompts and incorrectly refers to itself as GPT 4.0.

Additionally, I am unable to find the direct download link for ChatGPT on macOS.

At AI^Infinity, we use ChatGPT daily and rely on it to deliver the best possible responses. Each update significantly improves the quality of our predictions. Thank you once again, and we are very excited to see what else will be released later this year.

-GPT 69


I can’t seem to find anything on the token output limit; has that also increased?


How do I use the new voice (with emotions) through the API?


From OP…

GPT-4o in the API does not yet support audio. We hope to bring this modality to a set of trusted testers in the coming weeks.


A 128k context window! :drooling_face: :sob:

Wow it’s so cool.


I’m really excited for the things that can be created with 4o! Looking forward to utilizing the API and the rollout of voice support soon. Keep killing it


Is there any information about GPT-4o’s specific limitations and restrictions? I want to try it, but risking something non-obvious could endanger my main account. So yeah, any information about its limitations and restrictions would be very helpful! My job will require it sooner or later through my enterprise account anyway.
On another note, why is it named 4o and not GPT-4.5 or GPT-5? Hmm…

Am I wrong, or is it the same as GPT-4 Turbo?


You’re wrong.


(“o” is for “omni”.) It is multimodal (accepting text or image inputs and outputting text).


GPT-4o New
Our fastest and most affordable flagship model

Text and image input, text output
128k context length
Input: $5 | Output: $15*

GPT-4 Turbo
Our previous high-intelligence model

Text and image input, text output
128k context length
Input: $10 | Output: $30*

GPT-3.5 Turbo
Our fast, inexpensive model for simple tasks

Text input, text output
16k context length
Input: $0.50 | Output: $1.50*

Looks like the same context length to me. What am I missing? :flushed:


Sorry, it seems I missed your reply, making your post look very odd.

The model uses a larger token-encoding dictionary, meaning the output can be more efficient at producing text, especially in Eastern languages. A Japanese “how to make takuan” response is almost 25% fewer tokens in the new encoder (and thus lower fees, which are billed per token).
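
The encoder savings compound with the price cut. A back-of-the-envelope sketch, taking the ~25% figure from the takuan example above (all numbers illustrative):

```python
# Old: GPT-4 Turbo output at $30 per million tokens.
# New: GPT-4o output at $15 per million tokens, and the same Japanese
# text needs ~25% fewer tokens under the new encoder.
old_tokens, old_price = 1000, 30 / 1_000_000
new_tokens, new_price = 750, 15 / 1_000_000  # 25% fewer tokens, half the price

old_cost = old_tokens * old_price
new_cost = new_tokens * new_price
print(f"{1 - new_cost / old_cost:.1%}")  # prints 62.5%, the combined saving
```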


Oh, I think I got it. It seemed like I was talking about the model itself, and not about the size of the tokens. Sorry. :flushed:

@_j I see… makes sense: the same token length, but more words in those 128k. My language is a little similar to English, but it could improve anyway. Nice. :thinking:


Even English benefits from an approximate 10% reduction in the number of tokens for a given text.

If you go here and scroll down,

You can see how dramatically the tokenization of some of the lower-resource languages has improved.

The follow-on effects of this in terms of usability in those languages and for translation tasks will be immense.


The above reply is 84 tokens in either token encoder…


Hi, your GPT-4o website has showcased these images embedded with clear text. But you also said GPT-4o cannot be used to generate images.

What am I missing?


That is the capability of the model.

The model is still being red-teamed for safety and alignment, so the image, audio, and video capabilities of the model have not yet been released.

They are coming.