ChatCompletion stream to TTS

Hi,

I am implementing a voice AI in which I want to generate speech (via OpenAI TTS) from an OpenAI chat completion.
I realized that waiting for the chat completion to finish generating its response can take a long time, depending on the user prompt, so I decided to try the “stream” feature.

collected_chunks = []
collected_messages = []
async for chunk in chat_response:
    collected_chunks.append(chunk)
    chunk_message = chunk.choices[0].delta.content  # extract the message
    if chunk_message is not None:
        collected_messages.append(chunk_message)  # save the message

full_reply_content = ''.join(collected_messages)

My question is: how can I feed these message chunks to TTS to convert them to speech?
Would it be reasonable to pass every chunk to TTS, and can TTS handle these chunks?
I believe that if I pass every chunk to TTS, I will have to keep calling the “client.audio.speech.create” API for each one.

Can anyone suggest a better design for this?

Welcome to the developer forum @mariaclara.agent.ai

Calling the TTS API once per streamed chunk is not the right way to consume it, as it will quickly eat into your requests per minute (RPM) limit.

Additionally, OpenAI TTS, unlike other TTS engines, processes text differently depending on the context present within the string. Sending text token by token would deprive it of that context and produce undesired output.