I’m currently playing with the Realtime API. Is it possible to reset the conversation partway through? The application I’m building needs the conversation to reset at specific points. I tried the events for adding and deleting conversation items, but they cannot delete audio clips already sent to the server.
Are you looking to flush the audio buffer that has been most recently streamed before a create is triggered? There’s a method for that.
https://platform.openai.com/docs/api-reference/realtime-client-events/input_audio_buffer/clear
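For reference, a minimal sketch of what sending that event could look like. The event type `input_audio_buffer.clear` is the one from the linked page; the helper name and the `ws` connection in the usage comment are hypothetical, and this only affects audio that has not been committed yet.

```python
import json


def clear_input_audio_buffer_event() -> str:
    """Build the client event that discards uncommitted audio in the input buffer."""
    return json.dumps({"type": "input_audio_buffer.clear"})


# Usage, assuming `ws` is an already-open WebSocket to the Realtime API
# (hypothetical name, not part of this sketch):
# ws.send(clear_input_audio_buffer_event())
```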
What is it that you want to preserve that isn’t simply starting a new web socket?
A technique that seems promising for cutting down expense is to truncate past assistant audio messages on the server, if you don’t need the model to fully understand what it replied before.
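A rough sketch of that truncation, assuming the `conversation.item.truncate` client event with its `item_id`, `content_index` and `audio_end_ms` fields. The helper name is made up for illustration, and the item ID in the usage comment is a placeholder you would take from the server events for the assistant message.

```python
import json


def truncate_assistant_audio_event(item_id: str, audio_end_ms: int,
                                   content_index: int = 0) -> str:
    """Build a client event that cuts an assistant audio item at audio_end_ms."""
    if audio_end_ms < 0:
        raise ValueError("audio_end_ms must be non-negative")
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,              # ID of the assistant message item
        "content_index": content_index,  # which content part to truncate
        "audio_end_ms": audio_end_ms,    # keep audio up to this point only
    })


# Usage (hypothetical connection and item ID):
# ws.send(truncate_assistant_audio_event("item_123", audio_end_ms=0))
```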
Thanks for the reply. I’m actually looking for something like starting a new web socket but without the overhead. So what I need is to clear the whole history, not just the most recent buffer. I only plan to use the voice → text feature, mainly for the speed-up. My task is not really a conversation application. You can think of it as a bunch of summary tasks for separate cases in a session, with audio as input. Does that make it a bit clearer?
This effectively means a new session, and for that you have to recreate the connection. There is no way to fully reset the session, but why would you want to keep it anyway, if you don’t care about the conversation history?
One example is an observer of a conversation that labels whether the conversation is going well at each turn (say it simply outputs “yes” / “no”). But I don’t want the output of this observer to pollute the actual conversation. For this application, assuming the input is audio, I also want the observer to respond fast. In a text-to-text application, I would directly pass the conversation history to this observer each time. Hope that explains the use case better.
If the conversation itself also happens via the Realtime API, then it sounds like you actually need two connections, and in the “observer” one you would use conversation.item.delete. I might not totally understand what your exact requirements are, but I hope this helps.
Ok. Yes. I think you got the idea. I am planning to open two. The current issue is that conversation.item.delete does not support audio, does it? I can only append text content and delete it later, but that does not apply to audio the user has already committed. We need something more like a reset, or the ability to delete user-committed audio and generated response text.
I believe conversation.item is not just for text. In the API reference they say that whenever an audio buffer is committed, a new conversation.item is created. So you could try to keep track of the IDs of those items and delete them whenever you want, and it should work.
Hmm. Ok. Thanks. It seems they return a conversation.item.created event when you commit audio. I’ll give it a test.
In the meantime, just out of curiosity, do you know how the model manages audio messages along with text in the history? I imagine it’d be costly if all audio clips are reprocessed in each turn. Do they transcribe audio as text after each turn?
They store the audio tokens and the model consumes them again at each turn, just like the text. Prompt caching has been added to Realtime recently, so all or almost all of those tokens hit the cache and get discounted at the rates defined on the API pricing page.
Ok. Just tested today. It worked. All I needed was to record every conversation item created and delete them all at the end of each turn. Thanks for the detailed explanation!
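For anyone landing here later, a minimal sketch of that bookkeeping. It assumes the server events `conversation.item.created` (carrying `item.id`) and `conversation.item.deleted` (carrying `item_id`), and the client event `conversation.item.delete`. The class and method names are hypothetical, and the WebSocket handling is left out so only the tracking logic is shown.

```python
import json
from typing import List


class ConversationItemTracker:
    """Remembers IDs of created conversation items so they can be deleted later."""

    def __init__(self) -> None:
        self._item_ids: List[str] = []

    def handle_server_event(self, raw: str) -> None:
        """Feed every raw server message here; irrelevant events are ignored."""
        event = json.loads(raw)
        event_type = event.get("type")
        if event_type == "conversation.item.created":
            item_id = event.get("item", {}).get("id")
            if item_id and item_id not in self._item_ids:
                self._item_ids.append(item_id)
        elif event_type == "conversation.item.deleted":
            # Forget items once the server confirms the deletion.
            item_id = event.get("item_id")
            if item_id in self._item_ids:
                self._item_ids.remove(item_id)

    def build_delete_events(self) -> List[str]:
        """Return one conversation.item.delete client event per tracked item."""
        return [
            json.dumps({"type": "conversation.item.delete", "item_id": item_id})
            for item_id in self._item_ids
        ]


# Usage at the end of a turn (assuming `ws` is an open connection, hypothetical):
# for payload in tracker.build_delete_events():
#     ws.send(payload)
```

Items stay in the tracker until the server confirms with conversation.item.deleted, so a delete that fails can simply be retried on the next turn.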
It seems a simple API like session.reset (or perhaps conversation.delete.all) that cleans up all history and starts a new one would be better than expecting clients to track their conversation history and submit conversation.item.delete for each item just to get a fresh conversation.
I understand we can create a new connection so that it appears as a different session, leaving the old conversation to be garbage collected.