Issue with Incomplete Audio Output Using OpenAI's tts-1 Model

ArturJ · May 14, 2024, 10:01am

Hello,

I am encountering a consistent issue where the OpenAI tts-1 model produces incomplete audio output. Regardless of the text input, the audio stops prematurely and typically contains only about 50-60% of the intended content. For example, when I input a sequence of numbers (“1 2 3 4 5 6 7 8 9”) for text-to-speech conversion, the audio only goes up to the number “6”.

Here is the Python code snippet I am using:

from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="1 2 3 4 5 6 7 8 9",
) as response:
    response.stream_to_file("speech.mp3")

Environment Details:

Operating System: macOS Ventura 13.6
Python Version: 3.12.2 (I also tried 3.10)
OpenAI Library Version: 1.29.0
Limits: tier 3

It looks as though either the audio generation is being truncated or the response content is not being fully streamed to the file. The stream_to_file method does not seem to update the file continuously and might be closing the stream prematurely.
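One way to tell these two cases apart is to write the streamed chunks to disk yourself and count the bytes. The following is a minimal diagnostic sketch, not a confirmed fix: write_chunks and synthesize are hypothetical helper names, and it assumes the streamed response object exposes iter_bytes(), as it does in recent 1.x versions of the openai library.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write every chunk to `path` and return the total number of bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


def synthesize(text: str, path: str = "speech.mp3") -> int:
    """Hypothetical wrapper: stream tts-1 audio to `path`, return bytes written."""
    # Imported lazily so write_chunks can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
    ) as response:
        # Consume the whole stream before the context manager closes it.
        return write_chunks(response.iter_bytes(), path)
```

Calling synthesize("1 2 3 4 5 6 7 8 9") and comparing the returned byte count with the size of speech.mp3 on disk shows whether the file matches what was received. If the numbers agree and the audio still ends early, the truncation is more likely happening on the generation side than in the file writing.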

I have tried several troubleshooting steps, including updating the OpenAI library, checking for token limits, and adjusting the input length, but the issue persists.

Could you please help identify why this truncation is happening and how to resolve it?

Thank you for your help.

giggles · May 29, 2024, 10:01pm

Did you fix it?
I am experiencing the same issue.
The audio output for single-word inputs, e.g. “sky”, is completely silent.
When I use “Hello. good day”, I might get “good day” or “Hello”, but never the complete phrase.

maiconsanson · May 31, 2024, 12:52pm

Have you tried adding extra characters between the numbers, like commas or ellipses?

"1... 2... 3... 4... 5... 6... 7... 8... 9"
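If the input is generated programmatically, the separators can be added before the text is sent to the API. A minimal sketch, assuming the numbers are separated by whitespace; add_pauses is a hypothetical helper name, and whether the extra punctuation actually prevents the dropped numbers is something to verify by ear:

```python
import re


def add_pauses(text: str, separator: str = "... ") -> str:
    """Replace the whitespace between consecutive numbers with `separator`."""
    # Match a number followed by whitespace, but only when another digit follows.
    return re.sub(r"(\d+)\s+(?=\d)", lambda m: m.group(1) + separator, text)


print(add_pauses("1 2 3 4 5 6 7 8 9"))
```

Passing separator=", " gives the comma variant instead.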

In some situations in Portuguese, like times, I’ve created a function called convertTimeToSpeech(time) to handle that so it sounds more natural.

For example:

convertTimeToSpeech('12:30') 
// meio-dia e trinta
convertTimeToSpeech('00:10')
// meia-noite e 10
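The implementation of convertTimeToSpeech is not shown above, so the following is only a rough Python sketch of the same idea, under stated assumptions: convert_time_to_speech is a hypothetical name, it handles just the noon and midnight cases from the examples, it leaves the minutes as digits, and it returns any other time unchanged.

```python
def convert_time_to_speech(time: str) -> str:
    """Rewrite 'HH:MM' so that noon and midnight are spelled out in Portuguese."""
    hour_text, minute_text = time.split(":")
    hour, minute = int(hour_text), int(minute_text)

    # Only the two hours shown in the examples are covered (assumption).
    names = {0: "meia-noite", 12: "meio-dia"}
    if hour not in names:
        return time  # leave other times untouched
    if minute == 0:
        return names[hour]
    return f"{names[hour]} e {minute}"


print(convert_time_to_speech("00:10"))
print(convert_time_to_speech("12:30"))
```

A fuller version would also spell out the minutes and the remaining hours, which this sketch deliberately leaves out.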
