Realtime API message response - Audio + Text

bsantos · October 16, 2024, 9:08am

Hi!

Does Realtime API support responses in both Audio and Text? If yes, how to implement it? How do I ask the model to split between Audio and Text?

As an example, if the model message was:
"Several public licenses allow open-source distribution of software while imposing restrictions on its use. Here are some common ones that provide varying levels of control over how the software can be used, modified, and redistributed:

GNU General Public License (GPL)
Use Case: Ensures that software and its derivatives remain open-source.
Restriction: Any derivative work must also be distributed under the same license, meaning if someone modifies your software, they must release their modifications as open-source.
GNU Affero General Public License (AGPL)
Use Case: Specifically designed for networked software (e.g., web apps).
Restriction: Requires that any changes made to the software, even if it’s just used over a network (like in a cloud service), must also be shared as open-source. It prevents proprietary forks used in hosted environments without releasing source code. "

–

This part of the message should be Audio (+ text):
“Several public licenses allow open-source distribution of software while imposing restrictions on its use. Here are some common ones that provide varying levels of control over how the software can be used, modified, and redistributed.”

The remaining part of the message (details) should be Text only. How can this be implemented? Can the model respond with two different message types “audio” and “text”?

Thanks for the help!!

andreas.spaeth · October 17, 2024, 1:51pm

I don’t see an easy way at the moment. You can choose modalities [“text”] and get text messages, or you can choose modalities [“audio”, “text”] and get audio plus a transcript (text) of the audio.
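
For reference, here is a minimal sketch of those two settings as `session.update` payloads. It assumes the beta event shape, where `modalities` sits under `session`. The helper name `build_session_update` is made up for illustration, and actually sending the event over the WebSocket is left out.

```python
import json


def build_session_update(want_audio: bool) -> str:
    """Return a JSON-encoded session.update event (beta event shape assumed)."""
    # ["text"] gives text-only replies; ["audio", "text"] gives audio plus a transcript.
    modalities = ["audio", "text"] if want_audio else ["text"]
    event = {"type": "session.update", "session": {"modalities": modalities}}
    return json.dumps(event)


# Hypothetical usage: each string would be sent as one WebSocket message.
text_only = build_session_update(want_audio=False)
audio_and_transcript = build_session_update(want_audio=True)
```

Neither setting splits a single reply into a spoken part and a text-only part. The choice applies to the whole response.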

You could probably do some modality switching and/or use functions with specific instructions to get your results. However, the API is in beta and has a lot of issues with even simpler tasks.
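
A rough sketch of what modality switching might look like: two `response.create` events, the first asking for a short spoken overview and the second asking for the details as text only. This assumes the beta API accepts per-response `modalities` and `instructions` under `response`. The helper name `build_split_response_events` and the instruction wording are made up for illustration, and it is untested against the live API.

```python
from typing import Dict, List


def build_split_response_events(topic: str) -> List[Dict]:
    """Build two response.create events: spoken overview, then text-only details."""
    spoken_overview = {
        "type": "response.create",
        "response": {
            # Assumption: per-response modalities override the session setting.
            "modalities": ["audio", "text"],
            "instructions": (
                f"Give a one- or two-sentence overview of {topic}. "
                "Do not list any details yet."
            ),
        },
    }
    text_details = {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": f"Now list the details for {topic} as text only.",
        },
    }
    return [spoken_overview, text_details]


events = build_split_response_events("open-source licenses")
```

The client would presumably have to send the second event only after the first response has finished (for example, after a `response.done` server event), and this costs two model turns instead of one.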

You probably have to wait until they get some of the existing issues fixed, and then you can re-evaluate your use case and possible solutions.

bsantos · October 17, 2024, 2:52pm

Yes, I’ve racked my brain over it, tried adding some tags to the text transcripts, but it doesn’t work well. Plus, it’s expensive since you’re paying in real time and just getting text.

But we definitely need this on the API. In a normal conversation, you’d just say, “Yes, here are your options.”

PS: Trust me, I’ve noticed the issues. I spent a whole night fixing my code, only to find out the next day it was the API not responding properly… But still, I’m pretty impressed with the possibilities!
