Whisper Streaming Strategy

You can do some kind of speculative pre-prediction.

I mean, in a lot of sentences you already know what the person you are talking with is… going… to… say… pretty… early… in the sentence.
Therefore you may want to hand the incomplete sentence to a model in the background, let it complete the sentence, and generate an answer based on that completion.
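A minimal sketch of that idea, assuming hypothetical `complete_sentence` and `generate_answer` stand-ins for your actual LLM calls: every time the partial transcript grows, a background speculation is kicked off; if the final transcript matches the speculated sentence, the answer is already (nearly) done.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real LLM calls.
def complete_sentence(partial: str) -> str:
    # e.g. "what is your refund" -> "what is your refund policy?"
    return partial + " policy?"

def generate_answer(question: str) -> str:
    return f"Our answer to: {question}"

class SpeculativeResponder:
    """Start answer generation on a *predicted* sentence while the
    speaker is still talking; reuse it if the prediction held."""

    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.speculation = None          # (predicted_sentence, future)

    def on_partial_transcript(self, partial: str) -> None:
        predicted = complete_sentence(partial)
        if self.speculation and self.speculation[0] == predicted:
            return                       # same prediction: keep the running job
        self.speculation = (predicted, self.pool.submit(generate_answer, predicted))

    def on_final_transcript(self, final: str) -> str:
        if self.speculation and self.speculation[0] == final:
            return self.speculation[1].result()   # hit: answer already computed
        return generate_answer(final)             # miss: fall back to normal path
```

In practice you would compare the prediction to the final transcript with fuzzy matching rather than exact equality, and cancel stale speculations.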

You may also cache the beginning of an answer as an mp3 file and instruct the assistant/TTS to continue the answer after that cached beginning - I don't know if that makes any sense to you.
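One way to express that "continue after the cached opening" instruction is purely in the prompt. The template below is an invented sketch, not any specific API:

```python
def continuation_prompt(question: str, cached_opening: str) -> str:
    """Build a prompt that makes the model continue an answer whose
    first words are already playing from a cached mp3 file."""
    return (
        "You are a voice assistant. The user asked:\n"
        f"{question}\n\n"
        "Your reply has already started with the spoken phrase "
        f'"{cached_opening}" - continue the answer seamlessly after '
        "that phrase. Do not repeat it."
    )
```

The cached opening plays immediately while the model generates only the continuation, which the TTS then picks up.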

Also, random scenes: e.g. something drops in the background and the bot says stuff like "ooopsie, sorry, I dropped a knife… not what you are thinking now, hahaha. A knife I use to eat at my desk… yeah, I know, sometimes I eat at my desk", which gives the backend enough time to generate.

The answer would then start with "Anyways, back to your question…", streamed as a cached mp3, and the assistant would be instructed to continue the answer after "Anyways, back to your question".
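The timing trick here is that playback and generation overlap. A rough sketch, with hypothetical `play_mp3` and `generate` callables standing in for your audio player and LLM:

```python
import threading

def answer_with_time_buyers(question, play_mp3, generate):
    """While cached filler audio buys time, generate the real answer in
    the background; by the time the cached segue has finished playing,
    the continuation should be ready to stream."""
    result = {}
    worker = threading.Thread(
        target=lambda: result.setdefault("text", generate(question)))
    worker.start()                       # generation starts immediately
    play_mp3("filler_scene.mp3")         # "ooopsie, I dropped a knife..."
    play_mp3("anyways_back.mp3")         # cached "Anyways, back to your question..."
    worker.join()                        # generation overlapped with playback
    return result["text"]
```

The filenames are placeholders; the point is just that the blocking playback calls hide the generation latency.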

Also, in support hotlines you will find that many questions are asked over and over again.

Caching the answers in general, and even doing some embeddings that map to either an mp3 file location or a function call that creates the answer, might work as well.
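A sketch of that embedding cache, assuming a pluggable `embed` function (your embedding model) and plain cosine similarity; hits return a stored mp3 path, misses fall through to a generator function:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class AnswerCache:
    """Map embedded questions to cached mp3 paths; fall back to a
    generator function when no cached answer is similar enough."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed               # your embedding model goes here
        self.threshold = threshold
        self.entries = []                # list of (vector, mp3_path)

    def add(self, question, mp3_path):
        self.entries.append((self.embed(question), mp3_path))

    def lookup(self, question, fallback):
        vec = self.embed(question)
        best = max(self.entries, key=lambda e: cosine(e[0], vec), default=None)
        if best and cosine(best[0], vec) >= self.threshold:
            return best[1]               # cache hit: stream the stored mp3
        return fallback(question)        # miss: generate the answer fresh
```

With a real embedding model, paraphrases of a frequent hotline question land near the cached entry and get the pre-rendered mp3 instantly.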