ChatCompletion stream to tts

mariaclara.agent.ai · March 12, 2024, 9:22pm

Hi,

I am implementing a voice AI in which I want to generate speech (via OpenAI TTS) from an OpenAI chat completion.
I realized that waiting for the chat completion to finish generating its response can take a long time, depending on the user prompt, so I decided to try the “stream” feature.

    collected_chunks = []
    collected_messages = []
    async for chunk in chat_response:
        collected_chunks.append(chunk)  # keep the raw chunk
        chunk_message = chunk.choices[0].delta.content  # extract the text fragment (may be None)
        if chunk_message is not None:
            collected_messages.append(chunk_message)  # save the fragment

    full_reply_content = ''.join(collected_messages)

My question is: how can I feed these message chunks to TTS to convert them to speech?
Would it be a good idea to pass every chunk to TTS, and can TTS handle such chunks?
I believe that if I pass every chunk to TTS, I will have to call the “client.audio.speech.create” API repeatedly.

Can anyone suggest a better design for this?

sps · March 13, 2024, 11:02am

Welcome to the developer forum @mariaclara.agent.ai

Passing every chunk to the TTS API is not the right approach, as it will quickly eat away at your requests per minute (RPM) limits.

Additionally, the OpenAI TTS, unlike other TTS, processes text differently based on the context present within the string. Sending text token-by-token would cause it to give undesired outputs.
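
One way to act on both points is to buffer the streamed fragments and only hand complete sentences to TTS, which cuts the number of requests and gives the model whole sentences of context. The following is a minimal sketch, not an official recipe: the function name `sentences_from_deltas` and the punctuation-based splitting rule are assumptions made for illustration, and the rule is naive (it would also split after an abbreviation such as "Dr.").

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_deltas(deltas: Iterable[Optional[str]]) -> Iterator[str]:
    """Group streamed text fragments into whole sentences (hypothetical helper)."""
    buffer = ""
    for delta in deltas:
        if not delta:  # streamed chunks can carry None or empty content
            continue
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]  # keep the unfinished tail for the next fragment
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


if __name__ == "__main__":
    fake_stream = ["Hel", "lo the", "re. How", " are you", "? Fine", None, "."]
    for sentence in sentences_from_deltas(fake_stream):
        print(sentence)
```

Each yielded sentence could then be sent in its own “client.audio.speech.create” call, so the request count grows with the number of sentences rather than the number of tokens.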

stafuk · June 19, 2024, 12:01pm

Here’s a working implementation using threading. It links together the whole chain (you provide a prompt, and you start hearing the response while everything is still streaming) so that you can stream the audio response to a prompt. One thread streams the text reply and splits it into phrases, which are enqueued for TTS. A second thread runs TTS on each phrase as it completes. Finally, a third thread plays each phrase out loud as soon as its audio is ready.
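
The three-thread chain described above can be sketched roughly as follows. This is not the code from the gist, only an illustration of the same idea using two queues; `run_pipeline`, `synthesize`, and `play` are hypothetical names, and the caller is assumed to supply a real TTS call and a real audio player for the last two.

```python
import queue
import threading
from typing import Callable, Iterable

_DONE = object()  # sentinel marking the end of a stream


def run_pipeline(phrases: Iterable[str],
                 synthesize: Callable[[str], bytes],
                 play: Callable[[bytes], None]) -> None:
    """Run text, TTS, and playback stages concurrently (hypothetical helper)."""
    text_q: queue.Queue = queue.Queue()
    audio_q: queue.Queue = queue.Queue()

    def produce_text() -> None:
        # Thread 1: enqueue phrases as the text stream yields them.
        try:
            for phrase in phrases:
                text_q.put(phrase)
        finally:
            text_q.put(_DONE)  # always signal the end, even on error

    def produce_audio() -> None:
        # Thread 2: convert each phrase to audio as soon as it arrives.
        try:
            while (phrase := text_q.get()) is not _DONE:
                audio_q.put(synthesize(phrase))
        finally:
            audio_q.put(_DONE)

    def play_audio() -> None:
        # Thread 3: play audio clips in the order they were produced.
        while (audio := audio_q.get()) is not _DONE:
            play(audio)

    threads = [threading.Thread(target=stage)
               for stage in (produce_text, produce_audio, play_audio)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    # Stand-ins: "synthesis" just encodes the text, "playback" just prints it.
    run_pipeline(["Hello there.", "How are you?"],
                 synthesize=lambda text: text.encode("utf-8"),
                 play=lambda audio: print("playing", audio))
```

Because each queue has a single producer and a single consumer, phrases are played in the order they were generated, and playback of the first phrase can start while later phrases are still being generated or synthesized.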

The final effect is much like working with the ChatGPT app, where you get a “streaming audio response” to your question and don’t have to wait for the full text to come back before you can start listening to audio. I’m sure what’s here could be improved; it’s primarily designed to show everything put together, running in a terminal.

I’m not sure why, but the website says I’m not allowed to put links in my post, so you’ll have to assemble the following to see it.

gist[dot]github[dot]com/Ga68/3862688ab55b9d9b41256572b1fedc67
