Streaming text in and audio out?



I’m curious whether it’s possible to stream text from a model like gpt-3.5 directly into the TTS endpoint and stream the resulting audio back as output.

Even though streaming the audio output is possible, waiting for the entire text to finish before generating the audio stream results in too much latency.

Welcome to the forum Simon!

I didn’t do this myself, but a friend who did told me he used async text generation and sent complete chunks of text (such as sentences) to text-to-speech, rather than waiting for the whole text to generate.
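
Here is a rough sketch of that chunking idea, not my friend's actual code. It buffers streamed text pieces and hands each complete sentence to a callback as soon as it is finished. The names `sentences_from_stream`, `speak_stream` and `synthesize` are made up for illustration; in a real setup `synthesize` would be whatever function sends one sentence to your TTS endpoint, and `deltas` would be the text pieces coming out of the model's streaming response. The sentence split is a simple regex assumption (`.`, `!` or `?` followed by whitespace), so abbreviations like "e.g." will split early.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text pieces."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it in the buffer.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


def speak_stream(deltas: Iterable[str], synthesize: Callable[[str], None]) -> None:
    """Send each finished sentence to a (hypothetical) TTS callback."""
    for sentence in sentences_from_stream(deltas):
        synthesize(sentence)


if __name__ == "__main__":
    # Stand-in for streamed model output; print stands in for the TTS call.
    fake_deltas = ["Hello wor", "ld. How are", " you? I am", " fine"]
    speak_stream(fake_deltas, print)
```

With this, the first sentence can go to TTS while the model is still generating the rest, so the latency is roughly one sentence rather than the whole reply. To get the async behaviour he described, the callback would queue each sentence for a background task instead of blocking.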


Ah, of course! That’s a good idea, but hopefully they implement text and speech generation in a single endpoint.

