Issue with Incomplete Audio Output Using OpenAI's tts-1 Model

Hello,

I am encountering a consistent issue where the OpenAI tts-1 model produces incomplete audio. Regardless of the input text, the audio stops early and typically contains only about 50-60% of the intended content. For example, when I submit the sequence “1 2 3 4 5 6 7 8 9” for text-to-speech conversion, the output audio ends at “6”.

Here is the Python code snippet I am using:

from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="1 2 3 4 5 6 7 8 9",
) as response:
    response.stream_to_file("speech.mp3")

Environment Details:

  • Operating System: macOS Ventura 13.6
  • Python Version: 3.12.2 (also tried 3.10)
  • OpenAI Library Version: 1.29.0
  • Limits: tier 3

It looks as though the audio is either being truncated during generation or not being fully streamed to the file. The stream_to_file method does not seem to update the file continuously and may be closing the stream prematurely.
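One way to narrow this down is to bypass stream_to_file and write the streamed bytes manually while counting them. The following is a minimal diagnostic sketch, not a confirmed fix. The helper names write_chunks and synthesize are hypothetical, and it assumes the streamed response exposes iter_bytes() as in recent versions of the openai Python library.

```python
import os
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write every chunk to `path`, force it to disk, and return the byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
        # Flush Python's buffer and ask the OS to commit the data,
        # so the file size on disk reflects everything received.
        f.flush()
        os.fsync(f.fileno())
    return total


def synthesize(text: str, path: str = "speech.mp3") -> int:
    """Hypothetical wrapper: stream tts-1 audio to `path` and return bytes written."""
    # Imported here so write_chunks can be used without the openai package.
    from openai import OpenAI

    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
    ) as response:
        # Assumption: iter_bytes() yields the raw audio bytes as they arrive.
        return write_chunks(response.iter_bytes(), path)


if __name__ == "__main__":
    written = synthesize("1 2 3 4 5 6 7 8 9")
    print(f"bytes received: {written}")
    print(f"bytes on disk:  {os.path.getsize('speech.mp3')}")
```

If the two byte counts match and the audio is still cut short, the truncation is probably not happening in the file-writing step, and it would be worth playing the file in a different audio player or comparing against the non-streaming call.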

I have tried several troubleshooting steps, including updating the OpenAI library, checking for token limits, and adjusting the input length, but the issue persists.

Could you please help identify why this truncation is happening and how to resolve it?

Thank you for your help.