Realtime API Pricing: VAD and Token Accumulation - A KILLER

Well done, and kudos to you for sharing it here. Just to clarify: you’re deleting the audio input and output events, and once transcription is done, you’re passing the input/output transcripts to the assistant to summarize that interaction?

How is that affecting latency? Are you seeing a significant delay?

I implemented a “keep_last” rule the other day that keeps only the last n audio inputs in the conversation to maintain context (to avoid latency with assistant interjections). At n=3, the cost reduction is around 15%. At n=5, the reduction is negligible. So the keep_last concept doesn’t seem to be very effective.
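For anyone who wants to try the same thing, here is a rough sketch of how a keep_last rule like the one above could be tracked on the client side. It is not my exact code. The class name `KeepLastAudio` and the item IDs are made up for illustration, and I’m assuming each stale audio item is removed by sending a `conversation.item.delete` client event with its `item_id`, so check the event shape against the current Realtime API reference before relying on it.

```python
from collections import deque
from typing import Deque, Dict, List


class KeepLastAudio:
    """Track audio input item IDs and report which ones fall outside the last n.

    Hypothetical helper: it only builds the delete payloads, it does not
    talk to the API itself.
    """

    def __init__(self, n: int) -> None:
        if n < 0:
            raise ValueError("n must be >= 0")
        self.n = n
        self.items: Deque[str] = deque()

    def add(self, item_id: str) -> List[Dict[str, str]]:
        """Register a new audio item and return delete events for evicted ones."""
        self.items.append(item_id)
        events: List[Dict[str, str]] = []
        # Evict oldest items until only the newest n remain.
        while len(self.items) > self.n:
            stale_id = self.items.popleft()
            # Assumed event shape for removing an item from the conversation.
            events.append({"type": "conversation.item.delete", "item_id": stale_id})
        return events
```

Each time a new audio input item is created, call `add()` with its ID and send whatever events come back over the websocket (for example with `json.dumps(event)`). With n=3, the fourth audio input triggers a delete for the first one, and so on.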

If your method isn’t impacting latency that much, then I’d be keen to experiment a little.