Up until recently we were using OpenAI directly, with only a WebSocket:
Create the WebSocket connection
Update the connection with the prompt, voice, etc.
Lately I’ve hit a case where we get the following error. When this happens my whole prompt is lost and the AI starts talking with something like "Hey, what’s on your mind?" This is a disaster in our case.
invalid_request_error code=cannot_update_voice message=Cannot update a conversation’s voice if assistant audio is present.
Is this a timing issue where OpenAI starts sending audio before the update request goes in? (That is what Claude Code is telling me.)
At what point in the process are you calling response.create?
The goal is to separate the immutable setup from any later updates. If this happens in the wrong order, assistant audio may start before the configuration has been updated.
Right after creating the session, send the initial session.update with voice, instructions, audio formats, tools, turn detection, and any other required configuration.
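Once you receive session.updated, call response.create. This step is where your failure case may originate: if response.create goes out any earlier, assistant audio can start before the configuration has been applied.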
Essentially, you want to prevent the conversation from starting until the session has received its initial update.
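A minimal sketch of that ordering, in case it helps to compare against your own code. `SessionGate` and its method names are hypothetical, `send` stands in for whatever your WebSocket client uses to send text frames, and the exact shape of the `session` payload is an assumption that should be checked against the API version you are on. Only the event types `session.update`, `session.updated`, and `response.create` come from the discussion above.

```python
import json
from typing import Callable


class SessionGate:
    """Holds response.create until the server confirms the initial session.update.

    Hypothetical helper: `send` is any callable that writes one text frame
    to the already-open WebSocket.
    """

    def __init__(self, send: Callable[[str], None], session_config: dict):
        self._send = send
        self._config = session_config
        self._ready = False             # True once session.updated has arrived
        self._response_pending = False  # a response was requested too early

    def start(self) -> None:
        # First client event on the connection: the full initial configuration.
        # The payload layout here is an assumption, not a confirmed schema.
        self._send(json.dumps({"type": "session.update", "session": self._config}))

    def request_response(self) -> None:
        if self._ready:
            self._send(json.dumps({"type": "response.create"}))
        else:
            # Too early: remember the request and send it after session.updated.
            self._response_pending = True

    def on_server_event(self, raw: str) -> None:
        event = json.loads(raw)
        if event.get("type") == "session.updated" and not self._ready:
            self._ready = True
            if self._response_pending:
                self._response_pending = False
                self._send(json.dumps({"type": "response.create"}))
```

With this in place the rest of the application can call `request_response()` whenever it likes; nothing reaches the socket until the configuration has been acknowledged.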
I do exactly that, but for some reason Twilio sends some audio before I update. This happened 3 times during the last 50 calls I used for testing, and it is going to happen during the demo for sure.
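One possible mitigation, assuming the early Twilio audio really is what starts the conversation: hold inbound media frames on your side until session.updated has been received, and only then forward them. The sketch below is untested against a live call. `InboundAudioHold` is a hypothetical name, `send` stands in for your OpenAI WebSocket send function, and the code assumes Twilio Media Streams messages of the form `{"event": "media", "media": {"payload": ...}}` forwarded as `input_audio_buffer.append` events, with the session's input audio format configured to match what Twilio sends.

```python
import json
from collections import deque
from typing import Callable


class InboundAudioHold:
    """Keeps caller audio away from the model until the session is configured."""

    def __init__(self, send: Callable[[str], None], max_frames: int = 500):
        self._send = send
        # Bounded so a stalled session cannot grow memory without limit;
        # the oldest frames are dropped first once the limit is reached.
        self._held = deque(maxlen=max_frames)
        self._ready = False

    def on_twilio_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("event") != "media":
            return  # ignore non-audio messages such as "start" and "stop"
        payload = msg["media"]["payload"]  # base64 audio from Twilio
        if self._ready:
            self._forward(payload)
        else:
            self._held.append(payload)

    def on_session_updated(self, replay: bool = False) -> None:
        # Call this when session.updated arrives from the model side.
        self._ready = True
        while self._held:
            payload = self._held.popleft()
            if replay:
                self._forward(payload)  # otherwise early audio is discarded

    def _forward(self, payload: str) -> None:
        self._send(json.dumps({"type": "input_audio_buffer.append", "audio": payload}))
```

Whether to replay or drop the held frames is a judgment call: replaying keeps anything the caller said in the first moments, dropping guarantees the model only hears audio that arrived after configuration.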
Also while creating the session, I’m doing other stuff to pull the right prompt based on the caller data.
I’m thinking of using the HTTP endpoint instead (that will add 100 ms).
Have you been logging the chain of events to make absolutely sure that session.updated with the voice and prompt is always received before sending response.create? Especially in the cases where the model output is a plain assistant reply.
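If it is useful, here is a rough way to check a recorded event log for that ordering. It assumes you log each event as a `(direction, type)` pair, with `"sent"` for client events and `"recv"` for server events. The function name and the sets of "suspicious" event types are my own choices, not an official list, and the audio delta event name differs between API versions, so both spellings are included as a guess.

```python
from typing import Iterable, List, Tuple

# Client events that should not go out before the session is configured.
EARLY_SENT = {"response.create", "input_audio_buffer.append"}
# Server events that suggest a response already started (assumed names).
EARLY_RECV = {"response.created", "response.audio.delta", "response.output_audio.delta"}


def find_early_events(log: Iterable[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (index, direction, type) for every suspicious event logged
    before the first received session.updated."""
    early = []
    for index, (direction, event_type) in enumerate(log):
        if direction == "recv" and event_type == "session.updated":
            break  # configuration confirmed; everything after this is fine
        if direction == "sent" and event_type in EARLY_SENT:
            early.append((index, direction, event_type))
        elif direction == "recv" and event_type in EARLY_RECV:
            early.append((index, direction, event_type))
    return early


if __name__ == "__main__":
    call_log = [
        ("recv", "session.created"),
        ("sent", "session.update"),
        ("sent", "input_audio_buffer.append"),
        ("recv", "response.created"),
        ("recv", "session.updated"),
        ("sent", "response.create"),
    ]
    for index, direction, event_type in find_early_events(call_log):
        print(f"event {index}: {direction} {event_type} before session.updated")
```

Running this over the logs of the failing calls should show whether anything reached the model, or came back from it, before the configuration was acknowledged.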