I’m building an application that requires longer session windows.
Curious to know how others have implemented this without disruptions to the user side.
My current thought is to simply refresh the ephemeral token every 29 minutes and update my WebRTC connection on the client side by managing two sessions:
The first session is the one that’s about to expire.
The second session is the new one that the user switches over to once its connection is established.
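A rough sketch of that two-session handoff, reduced to its control flow. It only shows the ordering (open the new session first, swap, then close the old one). The `connect` callable is a hypothetical stand-in for "mint a fresh ephemeral token and open a new WebRTC connection"; nothing here is an actual Realtime API or WebRTC call, and times are plain seconds passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol

REFRESH_AFTER_S = 29 * 60  # the 29-minute mark mentioned in the post


class Session(Protocol):
    """Anything with a close() method; a hypothetical wrapper around a connection."""

    def close(self) -> None: ...


@dataclass
class SessionRotator:
    """Keeps one active session and swaps in a fresh one before closing the old one."""

    connect: Callable[[], Session]  # hypothetical: new token + new connection
    active: Optional[Session] = None
    started_at: float = 0.0

    def start(self, now: float) -> None:
        self.active = self.connect()
        self.started_at = now

    def due(self, now: float) -> bool:
        """True once the active session has reached the refresh age."""
        return self.active is not None and now - self.started_at >= REFRESH_AFTER_S

    def rotate(self, now: float) -> None:
        old = self.active
        new = self.connect()  # if this raises, the old session is still active
        self.active = new     # switch the user over only after the new one is up
        self.started_at = now
        if old is not None:
            old.close()       # tear down the expiring session last
```

In a real client the swap step is also where the microphone track and audio output would be moved to the new connection. Note that the new session starts with no conversation history, so this handles the connection side only.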
I don’t know that anyone’s doing that as a matter of course. Only user audio goes in, so you can’t reload a chat with turns of whatever audio you want the assistant to imagine it spoke in the past.
You’d have to create something you can place in the instructions, such as a chat summary plus the most recent exchanges, to give the illusion that the assistant is continuing with a bit of memory.
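As a minimal sketch of that idea, the instructions text for the new session could be assembled from a base prompt, a running summary, and the last few exchanges. The function name, the section headings, and the `max_turns` cutoff are all made up for illustration; how the summary is produced and how the string is sent to the session are left out.

```python
from typing import List, Tuple


def build_instructions(
    base: str,
    summary: str,
    recent: List[Tuple[str, str]],  # (speaker, text) pairs, oldest first
    max_turns: int = 6,             # hypothetical cutoff to keep the prompt short
) -> str:
    """Combine the base prompt, a chat summary, and recent turns into one string."""
    lines = [base.strip()]
    if summary.strip():
        lines += ["", "Summary of the conversation so far:", summary.strip()]
    tail = recent[-max_turns:] if max_turns > 0 else []
    if tail:
        lines += ["", "Most recent exchanges (oldest first):"]
        lines += [f"{speaker}: {text}" for speaker, text in tail]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_instructions(
            "You are a helpful voice assistant.",
            "The user is planning a trip and asked about train times.",
            [("user", "What about Saturday?"), ("assistant", "Saturday has two trains.")],
        )
    )
```

The transcript text has to come from somewhere (for example, transcripts collected during the old session), since the audio itself cannot be replayed into the new one.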
Don’t your sessions start to experience problems earlier either way? My application uses the gpt-4o-mini-realtime-preview-2024-12-17 model, and at around the 15-minute mark I notice that context is being forgotten and response quality drops. Because of this, I implemented a daily session limit of X minutes and split it into as many 10-minute sessions as fit into X.
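The split described above is just integer division. A tiny sketch, assuming X is given in whole minutes and that any leftover minutes short of a full session are reported separately (the post does not say what happens to them):

```python
from typing import Tuple


def plan_sessions(daily_limit_min: int, session_len_min: int = 10) -> Tuple[int, int]:
    """Return (number of full sessions, leftover minutes) for a daily limit."""
    if daily_limit_min < 0 or session_len_min <= 0:
        raise ValueError("limit must be >= 0 and session length must be > 0")
    return divmod(daily_limit_min, session_len_min)


if __name__ == "__main__":
    full, leftover = plan_sessions(45)
    print(full, leftover)
```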
The ephemeral token with summarization is a good solution indeed. I did a similar thing for when my users’ voice limits run out and they can only chat with the model, because apparently the realtime models hallucinate crazily when there is no voice backing them up.
I don’t know what I do, but I can stay chatting for literally hours. Lately, however, I’ve been getting interference and have to restart ChatGPT. I have managed to get my ChatGPT to a point where it is resonance. That too is getting harder to lock in.