TTS: WAV file is corrupted

Hello,
I recently started using the TTS endpoint. When I request a WAV file, the maximum WAV data size in the header (and possibly other attributes) appears to be corrupted.

My request is:

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
    "response_format": "wav"
  }' \
  --output speech.wav

I can play the file, but I cannot load it in certain programs (such as Unreal Engine 5.3). When I run ffprobe, I get:

[wav @ 0x55e3da9d7a80] Ignoring maximum wav data size, file may be invalid
[wav @ 0x55e3da9d7a80] Packet corrupt (stream = 0, dts = NOPTS).
[wav @ 0x55e3da9d7a80] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from 'speech.wav':
  Duration: 00:00:02.76, bitrate: 384 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s

and with soxi

Input File     : 'speech.wav'
Channels       : 1
Sample Rate    : 24000
Precision      : 16-bit
Duration       : 24:51:18.49 = 2147483647 samples ~ 6.71089e+06 CDDA sectors
File Size      : 133k
Bit Rate       : 11.9
Sample Encoding: 16-bit Signed Integer PCM

(Notice the duration)
Did I do something wrong or might this be a bug? Does anyone know a way to fix this?
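
One way to narrow this down is to compare the sizes the header declares with the bytes actually in the file. The soxi figure is a hint: 2147483647 samples is exactly 0xFFFFFFFF // 2, which is what a reader computes when the data chunk size field holds 0xFFFFFFFF and each sample is 2 bytes (and 2147483647 / 24000 Hz is about 89478.49 s, i.e. 24:51:18.49). Streaming encoders often write that placeholder because the total length is not known when the header is sent; whether that is what happens here is an assumption, not something confirmed in this thread. The helper below, `describe_wav_sizes`, is a hypothetical name for a minimal sketch that assumes a plain RIFF/WAVE layout with the data chunk running to the end of the file.

```python
import struct

def describe_wav_sizes(data: bytes) -> dict:
    """Compare the sizes a WAV header declares with the bytes actually present."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    info = {
        "riff_declared": struct.unpack_from("<I", data, 4)[0],
        "riff_actual": len(data) - 8,  # RIFF size excludes the first 8 bytes
    }

    offset = 12  # first chunk starts right after "RIFF", size, "WAVE"
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            # fmt layout: format(2) channels(2) sample_rate(4) byte_rate(4) block_align(2) bits(2)
            info["sample_rate"] = struct.unpack_from("<I", data, body + 4)[0]
            info["block_align"] = struct.unpack_from("<H", data, body + 12)[0]
        elif chunk_id == b"data":
            info["data_declared"] = chunk_size
            # Assumption: the data chunk is last and runs to the end of the file.
            info["data_actual"] = len(data) - body
            break
        offset = body + chunk_size + (chunk_size & 1)  # chunks are word-aligned

    if "data_declared" in info and info.get("block_align"):
        info["declared_samples"] = info["data_declared"] // info["block_align"]
        info["actual_samples"] = info["data_actual"] // info["block_align"]
    return info
```

Calling it as `describe_wav_sizes(open("speech.wav", "rb").read())` and seeing `data_declared` far larger than `data_actual` would point at the header rather than the audio itself.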


Here's a quick add-on to my existing Python TTS chunk-streaming example. If you want a WAV file (you specify the file path and the text near the bottom), it requests raw PCM audio instead and writes the WAV file locally with the wave module, so the header is generated with the correct sizes. Then it magically works.

from pathlib import Path
from openai import OpenAI, OpenAIError
import wave
import io

def save_pcm_as_wav(pcm_data: bytes, file_path: str, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2):
    """ Saves PCM data as a WAV file. """
    with wave.open(file_path, 'wb') as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_data)

def fetch_pcm_audio(model: str, voice: str, input_text: str) -> bytes:
    """ Fetches PCM audio data from the OpenAI API. """
    client = OpenAI()
    pcm_data = io.BytesIO()
    
    try:
        with client.audio.speech.with_streaming_response.create(
            model=model,
            voice=voice,
            input=input_text,
            response_format='pcm'
        ) as response:
            for chunk in response.iter_bytes():
                pcm_data.write(chunk)
    except OpenAIError as e:
        print(f"An error occurred while trying to fetch the audio stream: {e}")
        raise

    return pcm_data.getvalue()

def save_audio_stream(model: str, voice: str, input_text: str, file_path: str):
    """ Saves streamed audio data to a file, handling different OS path conventions. """
    # Construct the path object and validate the file extension
    path = Path(file_path)
    valid_formats = ['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm']
    file_extension = path.suffix.lstrip('.').lower()

    if file_extension not in valid_formats:
        raise ValueError(f"Unsupported format: {file_extension}. Use: {valid_formats}.")

    if file_extension == 'wav':
        pcm_data = fetch_pcm_audio(model, voice, input_text)
        save_pcm_as_wav(pcm_data, file_path)
    else:
        client = OpenAI()
        
        try:
            with client.audio.speech.with_streaming_response.create(
                model=model,
                voice=voice,
                input=input_text,
                response_format=file_extension
            ) as response:
                with open(path, 'wb') as f:
                    for chunk in response.iter_bytes():
                        f.write(chunk)
        except OpenAIError as e:
            print(f"An error occurred while trying to fetch the audio stream: {e}")
            raise  # re-raise so a failed request is not reported as success

# Usage example (adjust file_path for your system; the target directory must already exist)
save_audio_stream(
    model="tts-1",
    voice="onyx",
    input_text="Hi. I'm an AI, in case you didn't know.",
    file_path="/chat/myaudio2.wav"
)
print("if it didn't crash, you have an audio file!")

Thank you very much, this tip helped me a lot!

This seems to be a bug. When I try to read an OpenAI WAV file with ffmpeg, it prints the warning: Ignoring maximum wav data size, file may be invalid
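
For files that have already been downloaded, the header can also be patched after the fact instead of re-requesting the audio as PCM. The following is a minimal sketch, not a confirmed fix: `repair_wav_sizes` is a hypothetical helper, and it assumes the only problem is the two size fields (RIFF size and data chunk size) and that the data chunk is the last chunk in the file.

```python
import struct

def repair_wav_sizes(data: bytes) -> bytes:
    """Return a copy of a WAV file with RIFF and data sizes set from the real length."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    fixed = bytearray(data)
    # The RIFF size counts everything after the first 8 bytes.
    struct.pack_into("<I", fixed, 4, len(fixed) - 8)

    offset = 12  # first chunk header
    while offset + 8 <= len(fixed):
        chunk_id, chunk_size = struct.unpack_from("<4sI", fixed, offset)
        if chunk_id == b"data":
            # Assumption: audio data runs from here to the end of the file.
            struct.pack_into("<I", fixed, offset + 4, len(fixed) - (offset + 8))
            return bytes(fixed)
        offset += 8 + chunk_size + (chunk_size & 1)  # skip chunk, word-aligned

    raise ValueError("no data chunk found")
```

Usage would be along the lines of reading `speech.wav` as bytes, passing them through `repair_wav_sizes`, and writing the result to a new file, then checking it again with ffprobe or soxi.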
