I’m experiencing an issue with the OpenAI Realtime API where it sometimes returns text-only responses without audio, but only when continuing a conversation with prior context.
In a fresh session, both text and audio responses work fine.
When I provide context from a previous interaction, the API sometimes stops sending audio.
I am getting something similar. When recovering a previous session, I attempt to load a few of the previous conversation items (transcripts of the user and assistant messages) before starting to talk. It mostly works with one previous message, but if I try to load more than that, I don’t get audio responses. The API still processes my audio and responds, but in text mode only. It is almost as if some circuit breaker at the OpenAI end has tripped.
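For reference, this is roughly the sequence of events I send when restoring a session — a trimmed-down TypeScript sketch using the `ws` package; the model name, voice, and transcript contents are just placeholders, not my actual setup:

```typescript
import WebSocket from "ws";

// Assumed model/endpoint for illustration only.
const url =
  "wss://api.openai.com/v1/realtime?model=gpt-4o-mini-realtime-preview";
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "realtime=v1",
  },
});

// Transcripts recovered from the previous session (placeholder content).
const previousTurns = [
  { role: "user", text: "What's the weather like in Berlin?" },
  { role: "assistant", text: "Sunny and about 22°C right now." },
  { role: "user", text: "Thanks! And tomorrow?" },
];

ws.on("open", () => {
  // Modalities explicitly include audio.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: { modalities: ["audio", "text"], voice: "alloy" },
    })
  );

  // Re-create the previous turns as text-only conversation items.
  // With one item this works; with several, responses come back text-only.
  for (const turn of previousTurns) {
    ws.send(
      JSON.stringify({
        type: "conversation.item.create",
        item: {
          type: "message",
          role: turn.role,
          content: [
            {
              type: turn.role === "user" ? "input_text" : "text",
              text: turn.text,
            },
          ],
        },
      })
    );
  }
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  // When the bug hits, no response.audio.delta events arrive;
  // only text deltas show up for the responses.
  console.log(event.type);
});
```

After this, I stream microphone audio and trigger responses as usual; nothing else about the session changes between the working and non-working cases.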
I’m getting the same bug. It only seems to happen when I populate the conversation with conversation.item.create items containing text (note: my session.update modalities are always audio and text). Unfortunately this means I can’t really provide context to the model in realtime sessions. I’m seeing this with 4o-mini-realtime on Azure.
Edit: it looks like this is a bug that OpenAI has acknowledged since December; see the issue titled “Trouble Loading Previous Messages with Realtime API”.