Realtime API Pricing: VAD and Token Accumulation - A KILLER

liquidshadowsmk · October 21, 2024, 6:54am

Well done, and kudos to you for sharing it here. Just to clarify: you’re deleting the audio input and output events and, once transcription is done, passing the input/output transcripts to the assistant for summarization of that interaction?

How is that affecting latency? Are you seeing a significant delay?

I implemented a “keep_last” rule the other day that keeps only the last n audio inputs in the conversation to maintain context (to avoid latency with assistant interjections). At n=3, the cost reduction is around 15%. At n=5, the reduction is negligible. So the keep_last concept doesn’t seem to be very effective.
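
For anyone who wants to try the same thing, here is a rough sketch of what a keep_last rule could look like. It is not my exact code: the `KeepLastAudio` class and its `on_audio_item` method are hypothetical names made up for illustration, and it assumes you already capture the item ID of each user audio input as it is added to the conversation. The only Realtime API piece it relies on is the `conversation.item.delete` client event, which you would send over your own WebSocket connection.

```python
from collections import deque
from typing import Deque, Dict, List


class KeepLastAudio:
    """Hypothetical helper: remembers user audio item IDs and builds
    delete events for everything except the most recent n."""

    def __init__(self, n: int) -> None:
        if n < 0:
            raise ValueError("n must be non-negative")
        self.n = n
        self._audio_item_ids: Deque[str] = deque()

    def on_audio_item(self, item_id: str) -> List[Dict[str, str]]:
        """Record a new audio input and return the delete events to send.

        Each returned dict is a `conversation.item.delete` client event;
        sending it (e.g. as JSON over the WebSocket) is left to the caller.
        """
        self._audio_item_ids.append(item_id)
        events: List[Dict[str, str]] = []
        # Drop the oldest items until only the last n remain.
        while len(self._audio_item_ids) > self.n:
            old_id = self._audio_item_ids.popleft()
            events.append({"type": "conversation.item.delete", "item_id": old_id})
        return events


if __name__ == "__main__":
    pruner = KeepLastAudio(n=3)
    for i in range(1, 6):
        for event in pruner.on_audio_item(f"item_{i}"):
            print(event)  # in practice: serialize and send on the socket
```

With n=3, nothing is deleted until the fourth audio input arrives, and from then on each new input evicts the oldest one. How much this saves will depend on how long your turns are, so treat the percentages above as specific to my test conversations.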

If your method isn’t impacting latency that much, then I’d be keen to experiment a little.
