Announcing GPT-4o in the API!

Team, exciting times, but just a suggestion from a long-time fanboy: when the star of your announcement show is the chat interface, perhaps state that it’s coming but not quite here yet. Maybe even give an ETA. I get that Google was on your heels (no need to confirm that, we all know) and that understandably prompted the early announcement, but just tell us, to save us running around reading every thread and blog to find the answer. GPT-4o is exciting and we all wish you the best for its success. xo

They did say it several times during the 30 minute event and it’s stated again at least a few times on the announcement page.

I’m not sure what more they could do.

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.


I appreciate your guidance… it’s helpful. So with GPT-4 Turbo, my prompt is 750 tokens and my input is 2000 tokens, but I’m getting only around 1000 tokens of output. Shouldn’t I have around 3346 tokens available for output? (I am getting only 1/3 of that, even when we have the limit set to the max of 4096 tokens.) The math isn’t math-ing for me. I’d expect around 2k tokens of output for this prompt. Thanks again!

In your case, you would have the full 4096 tokens available for output. Prompt plus context that you supply are both considered input tokens.
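A minimal sketch of that accounting, assuming GPT-4 Turbo’s 128k context window and 4,096-token output cap (the helper function is my own illustration, not part of any API):

```python
def max_output_tokens(input_tokens: int,
                      max_tokens_param: int = 4096,
                      model_output_cap: int = 4096,
                      context_window: int = 128_000) -> int:
    """Upper bound on output tokens for a single request.

    Output is limited by the max_tokens parameter, the model's own
    output cap, and the room left in the context window after the
    input -- it is NOT computed as max_tokens minus your input.
    """
    return min(max_tokens_param, model_output_cap,
               context_window - input_tokens)

# A 750-token prompt plus 2,000 tokens of context still leaves the full cap:
print(max_output_tokens(750 + 2000))  # 4096
```

The point of the sketch: with a 128k window, a 2,750-token input doesn’t eat into the 4,096-token output ceiling at all.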

As said, it really depends on what task you are asking the model to complete and how you phrase your prompt. Sometimes a single word or sentence can make a significant impact on the number of output tokens.

Let me re-emphasize this: just because you supply a large number of input tokens does not automatically mean you’ll get a large number of output tokens.

If you can share your actual prompt, we can see what might be causing it. Without that information, it is difficult to provide pointers on how you might be able to increase your output.


Yes, you can access it if you’re selected as one of their ‘internal’ users. So yeah, :slight_smile:
Unfortunately, the NDA keeps that information from spreading, under threat of legal consequences. So yeah, good luck. The newest GPT-4o is awesome, though not everyone can taste its full capabilities. Normal folks and even power users are “throttled”. You need to be one of their insiders to get access.

I am hoping to pull this into some VR apps when I get time to learn the API. The ability to learn music with it would also be super cool.

I’m not sure what your point is.

These “insiders” you’re talking about are given access specifically to test the model for OpenAI.

What about the GPT-4o limitations for free users? Any idea?!


The limitation is there is no free use of the API, which is this topic’s subject.

In ChatGPT, there may be some free use of GPT-4o in the future, maybe 16 uses per 3 hours.

I am impressed by the remarkable capabilities demonstrated so far. If there are any new developments or advancements in the real-time chat functionality, I would greatly appreciate it if you could keep me informed.

I understand the Omni model API does not support audio at the moment. However, is there any way to access the live language translation as shown in one of your videos?

That would involve using the audio features, so… No.

The limitations for GPT-4o free users include the cap on messages per 3 hours and that your conversations train the model. Even Plus users “train the model.” Teams’ conversations and interactions through the API do not.

So _J’s right, he just doesn’t go far enough. Nothing is free, and you pay for your use either with money or with your delicious, delicious input.

Below is one of my prompts (actually one of my shortest). The input text used (i.e., the corpus of text to be assessed) is around 1200 tokens, and I have the setting at max length = 4096. Yet I am only getting 200 tokens of output. The temp setting is between 0.1 and 0.4. Here is the prompt:

"Task: Assess the narrative of a GCF concept note against the Fund’s investment criteria.
Narrative for Evaluation: [Insert Context and Baseline narrative here.]

Assessment Criteria:

Project Intent: Does the narrative explicitly state the project’s intent to reduce climate vulnerability?

Section 1: Geographic, climatic and socio-economic context:

  • Current conditions in the project areas with supporting data and statistics.
  • Population demographics and density in the project areas.
  • Trends in socio-economic indicators over the past decades (e.g., income levels, employment rates).
  • If applicable, environmental and ecological characteristics, including biodiversity and natural resources.

Section 2: Observed historical data:

  • Current conditions in the project areas with supporting data and statistics.
  • Historical climate data, including temperature, precipitation, and extreme weather events.
  • Historical land use and land cover changes in the project region.

Section 3: Climate change projections:

  • Demonstrate that climate change is the primary driver of adverse impacts.
  • Detail current and projected climate impacts like temperature changes and extreme weather events.
  • Expected shifts in agricultural zones and impacts on food security.


  1. Format your assessment in the way indicated below and do not provide recommendations:
    [Insert overall assessment of the narrative here with a score on a Scale of 1 - 10]

  2. Be sure to indicate whether or not a criterion is satisfied because it is explicitly addressed or because the narrative implies that it is addressed.

  3. Format the remainder of your assessment in the way indicated below:

Section 1: Geographic, climatic and socio-economic context:
Missing criteria:
+ [Insert missing criteria 1 here]
+ [Insert missing criteria 2 here]
+ …
Note: Generate the list of missing criteria for the subsequent sections in exactly the same format.

  1. End with an overall score of the narrative provided, on a scale of 1-10, based on how much the narrative satisfies all the criteria provided in the prompt. If all criteria are met, then congratulate the user for the high-quality narrative.

  2. Do not include any superfluous punctuation marks such as asterisk “*” and “+” signs."

What am I doing wrong? Thanks!

Your prompt currently does not include any wording that suggests you are looking for a detailed response. Depending on where exactly you’d like to add detail in the assessment, you need to state that explicitly in your prompt.

For example:

  1. Be sure to indicate whether or not a criterion is satisfied because it is explicitly addressed or because the narrative implies that it is addressed.

If you’d like more details, you’d have to write something along the lines of:

  1. Be sure to provide a detailed justification including supporting evidence whether or not a criterion is satisfied because it is explicitly addressed or because the narrative implies that it is addressed.

What additionally may help is to include explicit examples of what you would like the output to look like in terms of length and style.

Personally, I don’t think that with this type of task - in the way it is currently structured - you’ll be able to generate significantly more than 1,000 tokens of output.


We have a similar process for the project we’re working on.
I also put my hands up for this!

Did you read the post? They said that for now only vision is available, by splitting a video into frames. Other capabilities will be rolled out later, including audio / image multimodality.

Anyone had any success with gpt-4o VISION in ASSISTANTS?

Using the latest Python API (1.30.1) I can upload files and mark them as ‘vision’ - but in the Assistants API, for tools I can only pick file_search and code_interpreter (and I must pick one of them).
When uploading images with code_interpreter it is hit or miss whether it actually creates code to read the image. So it seems we’re getting CLOSE but not there yet? (for Assistants)
Otherwise pretty impressed with the gpt-4o speed in Assistants.


Images for computer vision are part of messages.

Follow this link, and expand the “content” of a user message, to see how to construct a message that refers either to a file ID uploaded or a URL.
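As a minimal sketch of what that message content looks like (the file ID below is hypothetical; in practice it comes from uploading an image with purpose “vision”):

```python
# Hypothetical ID of a previously uploaded image file (purpose="vision").
file_id = "file-abc123"

# Content of a user message mixing text with an image reference,
# following the Assistants messages schema: a list of typed parts.
message_content = [
    {"type": "text", "text": "Describe what is in this image."},
    {"type": "image_file", "image_file": {"file_id": file_id}},
]

# This list is then passed as the `content` of a user message, e.g.:
# client.beta.threads.messages.create(thread_id=thread.id,
#                                     role="user",
#                                     content=message_content)
```

The key point: vision input to gpt-4o in Assistants goes through the message content, not through the tools list, so no tool selection is needed for it.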

I really am looking forward to GPT-4 Omni! :smiling_face_with_three_hearts: