Realtime API: workaround for lack of Structured Output?

matthew19 · October 30, 2024, 3:44pm

I’ve implemented a chat-based application that returns Structured Output to the chat client (including a human-readable message). I’d like to create a voice-based version using the Realtime API.

As far as I can see, it doesn’t support Structured Output yet. Is there any way to get the model to return both audio and some structured data that I can display?

Note that I tried to get this working with a function call, but had a heck of a time getting the API to call my function, since the “parameters” were actually a big array of results generated on the server, with no response (i.e. function result) expected from the client.

j.wischnat · October 30, 2024, 3:50pm

You’re halfway there.

I think function calling is a perfect way to go!

You could write your own function to structure the output the way you want to.
As input, you could let the AI decide what to use.

For conversion to the structured output, either use an algorithm or another API call to a cheaper API (or even the Realtime API if you really want to flex).

Feel free to elaborate further!
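
To illustrate the "use an algorithm" option, here is a minimal sketch that turns the JSON arguments of a function call into validated objects for display. The `results` key and the `title` and `score` fields are hypothetical, chosen only for this example; nothing in the thread specifies the real data shape.

```python
import json
from dataclasses import dataclass


@dataclass
class DisplayResult:
    """Hypothetical shape of one item the client wants to display."""
    title: str
    score: float


def parse_results(arguments_json: str) -> list[DisplayResult]:
    """Convert the raw JSON arguments string into validated objects."""
    payload = json.loads(arguments_json)
    results = []
    for raw in payload.get("results", []):
        # Reject items that do not match the expected shape instead of guessing.
        if not isinstance(raw, dict) or not isinstance(raw.get("title"), str):
            raise ValueError(f"missing or non-string title: {raw!r}")
        # A missing score falls back to 0.0 (an arbitrary choice for this sketch).
        results.append(DisplayResult(title=raw["title"], score=float(raw.get("score", 0.0))))
    return results


example = '{"results": [{"title": "Intro", "score": 0.9}, {"title": "Outro"}]}'
parsed = parse_results(example)
```

Validating on the client matters here because, without Structured Output, the arguments are not guaranteed to match the schema.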

matthew19 · October 30, 2024, 4:06pm

The input isn’t really the issue. It seems to work okay even without ‘strict = true’. The problem is more that it refuses to call a function named ‘return_data’. Renaming it to something like ‘get_feedback_on_data’ didn’t seem to help, and neither did various attempts at massaging the function description.

It’s also not clear to me what the client should “return” to the model in this case.
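
On the question of what to "return": one possible approach is to send back a short acknowledgement and then ask for a new response. The sketch below only builds the JSON events and does not open a connection. It assumes the Realtime API event names `response.function_call_arguments.done`, `conversation.item.create` (with a `function_call_output` item) and `response.create`; check these against the current API reference. The `results` key and the acknowledgement payload are hypothetical.

```python
import json


def build_function_reply(event: dict) -> list[dict]:
    """Build the client events that acknowledge a function call.

    `event` is assumed to be a `response.function_call_arguments.done`
    server event carrying `call_id` and a JSON `arguments` string.
    """
    data = json.loads(event["arguments"])  # the structured data to display
    ack = {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            # Nothing meaningful to return, so send a short acknowledgement.
            "output": json.dumps(
                {"status": "displayed", "count": len(data.get("results", []))}
            ),
        },
    }
    # Ask the model to continue (for example, to speak) after the acknowledgement.
    return [ack, {"type": "response.create"}]


events = build_function_reply(
    {
        "type": "response.function_call_arguments.done",
        "call_id": "call_1",
        "arguments": '{"results": [1, 2]}',
    }
)
```

Each dict in `events` would be serialized with `json.dumps` and sent over the existing WebSocket or data channel.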

But if this is the only way to go, I’ll see if I can find the magic combination that works.

edwinarbus · October 30, 2024, 4:26pm

Structured outputs is on our list to support in Realtime, but no real timeline to share just yet.

jacobliber · November 23, 2025, 3:47pm

Any update on this?
Thanks!

tleyden · November 23, 2025, 10:17pm

I’ve been able to get the OpenAI Realtime API to do pretty advanced function calling, including generating full code snippets. I haven’t needed it to generate JSON yet, but I am sure that with the right function definition it could easily generate JSON, which seems a lot simpler than generating code snippets.

Have you tried defining a function that accepts the structured output (do you mean JSON?), including examples of the output you are expecting in the function definition? It might help if you post some examples of what you tried, what you expected to happen, and what actually happened.
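
As a rough sketch of that suggestion, the function definition below accepts the structured output as its arguments and embeds an example in the description. The name `display_results`, the schema fields, and the description wording are all hypothetical. The `session.update` wrapper with `tools` and `tool_choice` follows the Realtime API session format as I understand it, so treat it as an assumption to verify against the reference.

```python
import json

# Hypothetical example of the arguments we hope the model will produce.
EXAMPLE_ARGUMENTS = {
    "message": "Here are two matches.",
    "results": [{"title": "Intro", "score": 0.9}],
}

display_results_tool = {
    "type": "function",
    "name": "display_results",  # hypothetical name
    "description": (
        "Show structured results on the user's screen. Call this every time "
        "you answer, in addition to speaking. Example arguments: "
        + json.dumps(EXAMPLE_ARGUMENTS)
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "message": {"type": "string"},
            "results": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "score": {"type": "number"},
                    },
                    "required": ["title"],
                },
            },
        },
        "required": ["message", "results"],
    },
}

# Client event that registers the tool for the session (assumed format).
session_update = {
    "type": "session.update",
    "session": {"tools": [display_results_tool], "tool_choice": "auto"},
}

payload = json.dumps(session_update)
```

Framing the function as an action with a visible effect (showing something on screen) rather than "returning data" is a guess at what might make the model more willing to call it; the thread does not confirm this.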
