What will be the final/full released capabilities of GPT-4o in the API?

Hi OpenAI team and community,

In the announcement blog post, OpenAI mentions that GPT-4o can accept combinations of text, audio, and image inputs and generate text, image, and audio outputs natively within a single model. Will all of these capabilities eventually be released in the API? Specifically, would you be able to do things like:

  1. Input text and output an image
  2. Input text and get spoken audio, akin to text-to-speech but more advanced (say, outputting spoken audio with multiple speakers)
  3. Input an image and output audio (either general sounds or speech)
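For concreteness, here is a rough sketch of what a request for capability 3 might look like if the existing Chat Completions content-part format were extended. The text and image-URL content parts follow the current API shape, but the `modalities` parameter and the `build_multimodal_request` helper are my own hypothetical illustration, not a documented interface:

```python
def build_multimodal_request(prompt_text, image_url=None, output_modalities=("text",)):
    """Build a Chat Completions-style payload.

    The content-part format for text and image inputs matches the current
    API; requesting image or audio *output* via "modalities" is purely
    speculative and used here only to illustrate the question.
    """
    content = [{"type": "text", "text": prompt_text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": content}],
        # Hypothetical parameter -- not part of the current API.
        "modalities": list(output_modalities),
    }

# Example: case 3 above -- image in, spoken audio out.
req = build_multimodal_request(
    "Describe this scene aloud.",
    image_url="https://example.com/scene.png",
    output_modalities=("audio",),
)
```

Today only the text-output path of such a request works; the question is whether the other output modalities will ever be accepted.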

If the answer is yes to all three of these examples, this would be an incredibly enticing product, and I would start brainstorming ideas around it. However, if the API is going to stay limited to audio-in/audio-out (voice chat) plus what we currently have, that would also be great to know, so we don't waste time waiting for something that isn't coming.

I'd appreciate any answer!