I’m implementing a voice chat. Because of implementation requirements, I need to transcribe the user’s speech first and then work with the text.
The app flow is as follows:
- the user presses and holds the button
- while holding the button, the user speaks
- the app prints the speech in near real time
- the user releases the button
- magic happens and the app speaks the answer
- when the user is ready to ask a new question, they press and hold the button again.
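To make the flow above concrete, here is a minimal sketch of it as a small state machine. The names (`State`, `AppEvent`, `nextState`) are hypothetical and only for illustration, and it assumes that button presses are ignored while the answer is still being spoken.

```typescript
// Hypothetical names, for illustration only.
type State = "idle" | "listening" | "answering";
type AppEvent = "press" | "release" | "answerDone";

// Returns the next state for a given event; events that do not apply
// to the current state leave it unchanged.
function nextState(state: State, event: AppEvent): State {
  switch (state) {
    case "idle":
      // The user presses and holds the button.
      return event === "press" ? "listening" : state;
    case "listening":
      // The user releases the button; the app prepares and speaks the answer.
      return event === "release" ? "answering" : state;
    case "answering":
      // Assumption: a press during the answer is ignored until it finishes.
      return event === "answerDone" ? "idle" : state;
  }
}
```

The question below is about what should happen to the peer connection while this machine sits in `idle`.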
My main question is: should I close the peer connection after each user message?
Closing and recreating it for each new message adds delay, since requesting an ephemeral key and setting up the connection both take time.
On the other hand, if I keep the peer connection open while the user is inactive (not asking a new question right away), I might be charged continuously. But would I be?
Also, if I am only charged for tokens, can I somehow “pause” listening so that nothing is sent to OpenAI for transcription?
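To show what I mean by “pause”, here is a rough sketch that toggles the standard `enabled` flag on the microphone track(s) while the connection stays open. `ToggleableTrack` and `setMicActive` are hypothetical names. As far as I understand, a disabled audio track still produces silence rather than nothing at all, so whether this affects billing is exactly what I am unsure about.

```typescript
// Structural subset of the browser's MediaStreamTrack, so the sketch
// does not depend on DOM types. Hypothetical name.
interface ToggleableTrack {
  enabled: boolean;
}

// Enables or disables every given track without closing the connection.
// Assumption: a disabled audio track outputs silence instead of mic audio.
function setMicActive(tracks: ToggleableTrack[], active: boolean): void {
  for (const track of tracks) {
    track.enabled = active;
  }
}

// Intended use in the browser (not executed here):
//   button press   -> setMicActive(stream.getAudioTracks(), true)
//   button release -> setMicActive(stream.getAudioTracks(), false)
```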
One more question: am I correct that the ephemeral key is only used to set up the peer connection, and that communication keeps working even after the key has expired?
Thanks