Hello,
I recently started using the TTS endpoint. When I request a WAV file, the header of the returned file appears to be corrupted: the declared data size is wrong, and possibly other fields as well.
My request is:
curl https://api.openai.com/v1/audio/speech \
-H "Authorization: Bearer xxx" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy",
"response_format": "wav"
}' \
--output speech.wav
I can play the file, but I cannot load it in certain programs (such as Unreal Engine 5.3). When I run ffprobe on it, I get:
[wav @ 0x55e3da9d7a80] Ignoring maximum wav data size, file may be invalid
[wav @ 0x55e3da9d7a80] Packet corrupt (stream = 0, dts = NOPTS).
[wav @ 0x55e3da9d7a80] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from 'speech.wav':
Duration: 00:00:02.76, bitrate: 384 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
and with soxi I get:
Input File : 'speech.wav'
Channels : 1
Sample Rate : 24000
Precision : 16-bit
Duration : 24:51:18.49 = 2147483647 samples ~ 6.71089e+06 CDDA sectors
File Size : 133k
Bit Rate : 11.9
Sample Encoding: 16-bit Signed Integer PCM
(Notice the duration: almost 25 hours for a 133k file.)
Did I do something wrong or might this be a bug? Does anyone know a way to fix this?
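In case it helps anyone looking into this: 2147483647 samples at 2 bytes per sample is what a declared data size of 0xFFFFFFFF would give, so my guess (an assumption, not confirmed) is that the size fields in the header hold placeholder values rather than the real lengths. Below is a minimal Python sketch for checking that and rewriting the two size fields. The function names `inspect_wav_sizes` and `patch_wav_sizes` are my own, and the sketch assumes the `data` chunk is the last chunk in the file, which may not hold for every WAV file.

```python
import struct


def inspect_wav_sizes(raw: bytes) -> dict:
    """Compare the sizes declared in a RIFF/WAVE header with the real byte counts."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # Bytes 4..7: declared size of everything after the first 8 bytes.
    riff_declared = struct.unpack_from("<I", raw, 4)[0]
    pos = 12  # first chunk starts right after "RIFF", size, "WAVE"
    while pos + 8 <= len(raw):
        chunk_id = raw[pos:pos + 4]
        chunk_size = struct.unpack_from("<I", raw, pos + 4)[0]
        if chunk_id == b"data":
            return {
                "riff_declared": riff_declared,
                "riff_actual": len(raw) - 8,
                "data_declared": chunk_size,
                # Assumption: the data chunk runs to the end of the file.
                "data_actual": len(raw) - (pos + 8),
                "data_size_offset": pos + 4,
            }
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")


def patch_wav_sizes(raw: bytes) -> bytes:
    """Return a copy of the file with both size fields set to the real lengths."""
    info = inspect_wav_sizes(raw)
    fixed = bytearray(raw)
    struct.pack_into("<I", fixed, 4, info["riff_actual"])
    struct.pack_into("<I", fixed, info["data_size_offset"], info["data_actual"])
    return bytes(fixed)
```

To try it on the file from the request above, read `speech.wav` in binary mode, print the result of `inspect_wav_sizes`, and write the output of `patch_wav_sizes` to a new file. If the declared and actual sizes already match, the header is not the problem and this sketch will not help.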