Quick one: I’ve built a voice AI using OpenAI’s Realtime API, with prompt chaining and a kind of multi-agent workflow, to handle conversations of up to 30 minutes in which the AI has to run a qualitative interview.
That means driving the conversation, not just responding like a helpful assistant. I can feel the momentum of the RL process; it’s like arm-wrestling the AI with my prompt. The key challenge I have now is that the AI seems to decide to close off the conversation around the 10-15 minute mark, although it’s prompted otherwise.
Any tips on long-session management, multi-topic conversation, and maintaining progression? I haven’t implemented RAG (I don’t see the use for it right now) or function calling (e.g. telling the AI it should regularly retrieve the interview state). Any thoughts, tips or comments?
Perhaps you can define what “closing off the conversation” means to you. Is the AI saying “thanks, but bye for now!”, or is it an error?
The maximum length of a Realtime session, from when you open it, is 30 minutes (increased from 15).
The model’s input context, however, is more opaque: how much of the context window is being consumed, the lack of internal management, and what happens when it is exceeded. You have tools to clear past messages, but you don’t really know when you should.
It seems you want to keep just about everything of the conversation in order to obtain a final result, filling the AI with 30 minutes of tokenized audio. That works against a best practice for managing cost and maintaining high-quality instruction following, one relevant to any large-context model and its attention limits: keep the recurring input brief, carrying only what’s necessary for the illusion of memory.
You can look into conversation.item.truncate and conversation.item.delete, and see if your application can tolerate intentional “clearing” of obsolete chat.
Perhaps the symptom will be reduced if the AI isn’t “listening” to 20+ minutes every time it responds.
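For reference, a minimal sketch of those two client events, assuming you track item IDs from the server’s conversation.item.created events (the websocket send itself is omitted):

```python
import json

def delete_item_event(item_id: str) -> str:
    # Client event asking the server to drop one conversation item
    # from the session context entirely.
    return json.dumps({"type": "conversation.item.delete", "item_id": item_id})

def truncate_item_event(item_id: str, audio_end_ms: int) -> str:
    # Client event trimming an assistant audio item so only the first
    # audio_end_ms of it remains in context (content_index 0 is the
    # audio content part of the message).
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,
        "audio_end_ms": audio_end_ms,
    })

# Each JSON string would then be sent over the open connection,
# e.g. ws.send(delete_item_event("item_abc")).
```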
Thanks so much. Indeed, I meant the AI is saying something like “thank you for your time, that is all for today”. You are right that I am quite in the dark about the audio context window, unsure whether I hit limits or things start to fall out of it. I’m certainly pushing the model to its memory limit.
I already make separate calls to other API endpoints to summarize the conversation and its key themes, and I could try to truncate/delete anything older than 10 minutes and keep just the summary.
I do regularly reinject baseline instructions alongside updated steering, but those are text and only maybe 150 tokens long, so I suppose the audio tokens make up the vast majority of the context window.
I don’t know if it’s best practice to reinject all instructions or simply a delta. Not at every turn, of course, but on regular triggers based on topic management and derived conversation metrics.
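For what it’s worth, my understanding is that session.update replaces the instructions field wholesale rather than appending, so resending the full baseline plus the current steering delta seems the safer pattern. Something like this sketch is what I have in mind (the baseline text and steering wording are just placeholders):

```python
import json

# Placeholder baseline; in practice this is the full interviewer prompt.
BASELINE = ("You are running a qualitative interview. "
            "Never end the conversation unless explicitly told to.")

def steering_update_event(steering: str) -> str:
    # session.update overwrites session.instructions, so we always send
    # baseline + current steering together instead of a bare delta.
    return json.dumps({
        "type": "session.update",
        "session": {"instructions": f"{BASELINE}\n\nCurrent focus: {steering}"},
    })
```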
What would you advise as the one or two things I should start trying?
Hey, I’ve got the mechanism to identify and delete conversation items using conversation.item.delete, but I’m wondering what the best practice is if I want to delete, say, “all conversation items before a certain point in time”. I have the point in time; I’m just wondering whether there’s a way, using WebRTC, to batch-delete or delete multiple messages (e.g. maybe 20 at a time) without overwhelming the server?
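I’m not aware of any bulk-delete client event, so the shape I’m considering is one conversation.item.delete per item, sent in small chunks with a pause between them. A rough sketch (the batch size and pause are arbitrary; send is whatever transmits a client event on your connection):

```python
import time

def batched(ids, size=20):
    # Split a list of item IDs into chunks of at most `size`.
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def delete_before(items, cutoff_ts, send, pause_s=0.25):
    # items: list of (item_id, created_ts) pairs tracked client-side
    # from conversation.item.created events.
    # send: callable that transmits one client event dict.
    stale = [item_id for item_id, ts in items if ts < cutoff_ts]
    for chunk in batched(stale):
        for item_id in chunk:
            send({"type": "conversation.item.delete", "item_id": item_id})
        time.sleep(pause_s)  # crude throttle between batches
    return len(stale)
```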
Just stumbled upon this issue again. If possible, can anyone give clarity on why this is happening? Any help is appreciated; it’s completely random.
Here is the error:
2025-02-18 18:38:17,302 - ERROR livekit.plugins.openai.realtime - OpenAI S2S error {'type': 'error', 'event_id': 'event_B2Hcfhd3qZdmO9sMwEf1h', 'error': {'type': 'invalid_request_error', 'code': 'session_expired', 'message': 'Your session hit the maximum duration of 30 minutes.', 'param': None, 'event_id': None}} {"session_id": "sess_B2H9dDhDkVdZ5d5NRqe34", "pid": 74633, "job_id": "AJ_QYjknpQEHY4o"}
2025-02-18 18:38:17,303 - ERROR livekit.plugins.openai.realtime - Error in _recv_task
Traceback (most recent call last):
  File "/home/jeel/workspace/fortitude/fortitude-agent-service/venv/lib/python3.13/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeel/workspace/fortitude/fortitude-agent-service/venv/lib/python3.13/site-packages/livekit/plugins/openai/realtime/realtime_model.py", line 1103, in _recv_task
    raise Exception("OpenAI S2S connection closed unexpectedly")
Exception: OpenAI S2S connection closed unexpectedly {"pid": 74633, "job_id": "AJ_QYjknpQEHY4o"}
That’s not a 525 HTTP error; “S2S” is just a string somebody wrote at that point in the library code, and there is no explanation of why that code point could have been reached. You’d have to work backwards and see whether an API call returned an unreported error message that was captured, or run code you wrote or understand.
Has anybody managed to actually have the Realtime API handle >20-minute conversations?
My model and orchestration get the AI to close the interviews it’s running at about 5, 10 or 15 minutes most of the time, rarely going up to 20 or 25 minutes.
It seems to make up its mind on its own and close the conversation rather randomly. I couldn’t identify any specific patterns in the transcript data.
I’m using different background agents to broadcast instructions to the front-end WebRTC conversation agent (Realtime) based on the transcript and other factors such as topics and time limits. I use session.update and websocket broadcasting from the backend to the front end.
No matter how much I prompt the model not to close things off until explicitly asked, it does so.
I also successfully truncate the model’s (server-side) conversation context every 4-6 minutes programmatically, on time/topic triggers, and reinject at that point a summary of the conversation so far. So it’s not closing because it’s running out of space.
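The summary reinjection looks roughly like this: a conversation.item.create client event carrying a text message item in place of the deleted audio. The role choice and wording here are my own judgment calls, not a documented requirement:

```python
import json

def summary_item_event(summary: str) -> str:
    # Insert a compact text item carrying the rolling summary, so the
    # deleted audio items are replaced by a textual "memory". Using role
    # "system" keeps it out of the audible dialogue; "user" would also work.
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "system",
            "content": [
                {"type": "input_text", "text": f"Interview so far: {summary}"}
            ],
        },
    })
```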
I think this has to do with the underlying training data being that of a helpful assistant. It seems hard for the model to keep at it when it senses that the user doesn’t engage much or only provides short answers. But I obviously can’t do preference-based fine-tuning of the Realtime API (that would be great), and I don’t have the dataset for standard fine-tuning either.
Any bright ideas?
I would love some help here; I’ve been struggling with this, and it’s the last issue to solve before I can properly pilot.
I’m even considering switching from the Realtime API to a pipeline of STT > inference > TTS.
I have exactly the same issue. It’s completely random: sometimes the connection drops at the 23rd minute, sometimes at the 2nd.
So it looks to me like OpenAI may be dropping the websocket connection due to some server constraints on their side.
In my case, though, I’m using LiveKit with the Realtime API multimodal approach. Their LiveKit Python agent SDK just logs the error “OpenAI S2S connection closed unexpectedly”, nothing else.
I also identified which function in the library is throwing it, but then there is no way to reconnect to OpenAI!
This issue has been causing me problems for a week now.
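Since the SDK doesn’t reconnect for us, the best workaround I can think of is wrapping session setup in a retry loop with backoff. A rough sketch, not part of the LiveKit SDK (open_session and run_session are placeholders for your own setup and receive loop; the Realtime API has no session resume, so open_session would need to reinject any state such as a running summary):

```python
import time

def backoff_schedule(max_attempts=5, base_s=0.5, cap_s=8.0):
    # Exponential backoff delays: 0.5, 1, 2, 4, 8 seconds, capped at cap_s.
    return [min(cap_s, base_s * (2 ** i)) for i in range(max_attempts)]

def run_with_reconnect(open_session, run_session, max_attempts=5, base_s=0.5):
    # open_session() -> a fresh session handle (primed with prior state).
    # run_session(session) -> runs until a clean finish, raises on a drop.
    for delay in backoff_schedule(max_attempts, base_s=base_s):
        session = open_session()
        try:
            run_session(session)
            return True  # conversation finished cleanly
        except ConnectionError:
            time.sleep(delay)  # wait, then open a brand-new session
    return False  # gave up after max_attempts drops
```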
I have also been facing the same issue over the last 5 or 6 days. The OpenAI websocket randomly disconnects with error code 104, ‘connection reset by peer’.
I am not able to see any pattern in when it happens.
Even just muting the mic and not having any conversation leads to a socket disconnect within 10-15 minutes. This is not because of the 30-minute time limit; it happens before that.
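For the hard 30-minute cap at least, a fresh session can be rotated in proactively before the server kills the old one (this won’t help with the earlier random drops). A sketch of the trigger; the 5-minute margin is an arbitrary assumption:

```python
import time

SESSION_CAP_S = 30 * 60    # documented 30-minute hard session limit
ROTATE_MARGIN_S = 5 * 60   # assumption: hand over 5 minutes early

def should_rotate(session_started_at, now=None):
    # True once the session is within ROTATE_MARGIN_S of the hard cap,
    # leaving time to open a replacement session and prime it with a
    # summary before the old one is force-closed.
    now = time.time() if now is None else now
    return (now - session_started_at) >= SESSION_CAP_S - ROTATE_MARGIN_S
```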