I’ve implemented a chat-based application that returns Structured Output to the chat client (including a human-readable message). I’d like to create a voice-based version using the Realtime API.
As far as I can see, it doesn’t support Structured Output yet. Is there any way to get the model to return both audio and some structured data that I can display?
Note that I tried to get this working with a function call, but I had a heck of a time getting the API to call my function, since the “parameters” were actually a big array of results generated on the server, with no response (i.e., function result) expected from the client.
You’re halfway there.
I think function calling is a perfect way to go!
You could write your own function to structure the output the way you want to.
As input, you could let the AI decide what to use.
For the conversion to structured output, either use an algorithm or make another API call to a cheaper API (or even the Realtime API if you really want to flex).
Feel free to elaborate further!
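A minimal sketch of the function-calling idea in Python. The tool name `return_data`, the schema fields (`message`, `results`), and the helper `parse_arguments` are all hypothetical and only for illustration. The tool's `parameters` JSON Schema stands in for the structured output, and the client converts the JSON argument string from the function call into a typed object it can display.

```python
import json
from dataclasses import dataclass

# Hypothetical tool definition; the name and schema fields are illustrative only.
RETURN_DATA_TOOL = {
    "type": "function",
    "name": "return_data",
    "description": "Send structured results to the client UI for display.",
    "parameters": {
        "type": "object",
        "properties": {
            "message": {"type": "string"},
            "results": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["message", "results"],
    },
}


@dataclass
class StructuredReply:
    message: str
    results: list


def parse_arguments(arguments_json: str) -> StructuredReply:
    """Convert a function call's JSON argument string into a typed object."""
    data = json.loads(arguments_json)
    # Without strict schema enforcement, check required fields by hand.
    required = RETURN_DATA_TOOL["parameters"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["message"], str) or not isinstance(data["results"], list):
        raise ValueError("unexpected field types")
    return StructuredReply(
        message=data["message"],
        results=[str(item) for item in data["results"]],
    )
```

Since the arguments are not guaranteed to match the schema here, the manual checks stand in for what Structured Output would otherwise enforce.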
The input isn’t really the issue. It seems to work okay even without ‘strict = true’. The problem is more that it refuses to call a function named ‘return_data’, and renaming it to something like ‘get_feedback_on_data’ didn’t seem to help (including various attempts at massaging the function description).
It’s also not clear to me what the client should “return” to the model in this case.
But if this is the only way to go, I’ll see if I can find the magic combination that works.
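Two things that might be worth trying, sketched in Python under assumptions: setting `tool_choice` to `"required"` so the model has to call a tool, and answering the call with a short acknowledgement rather than real data. The event and field names (`session.update`, `conversation.item.create`, `function_call_output`, `response.create`) follow my reading of the Realtime API reference and should be checked against the current docs. The helper names and the `{"status": "displayed"}` payload are hypothetical.

```python
import json


def force_tool_event(tool: dict) -> dict:
    """Build a session.update event that registers `tool` and requires a tool call."""
    return {
        "type": "session.update",
        "session": {
            "tools": [tool],
            # With a single registered tool, "required" should leave the model
            # no choice but to call it (assumption: verify against the docs).
            "tool_choice": "required",
        },
    }


def ack_events(call_id: str) -> list:
    """Build the events that acknowledge a function call and resume the turn."""
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                # Nothing meaningful to return, so send a small acknowledgement.
                "output": json.dumps({"status": "displayed"}),
            },
        },
        # Ask the model to continue, e.g. with the spoken part of the reply.
        {"type": "response.create"},
    ]
```

Each dict would be sent as a JSON message over the Realtime connection. Whether the model still speaks in the same response as a forced function call is something to test.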
Structured outputs is on our list to support in Realtime, but no real timeline to share just yet.