I have noticed that the AAC files generated from the audio/speech endpoint have a faulty file length. This is apparent when opening the files in the Firefox player or via the previewer in OS X.
This does NOT seem to happen for the default mp3 type, but it happens consistently for aac (I’ve seen it happen on other types as well, although I haven’t tested those as much).
Repro:
curl 'https://api.openai.com/v1/audio/speech' \
-H 'authority: api.openai.com' \
-H 'accept: */*' \
-H 'authorization: Bearer <token>' \
-H 'content-type: application/json' \
--data-raw '{"model":"tts-1-hd","input":"This is a shorter sentence, but the problem seems to be more prominent with longer texts that are over a minute long at least.","voice":"nova","response_format":"aac"}' \
--output audio.aac
I can’t attach a link or a zip to this topic with examples, but I have an example with aac/mp3 if needed.
Thank you for the thorough info. I’m trying to avoid client-side muxing, but I might have to go that route then. It would be great if the returned aac files could be in an m4a/mp4 container from the start, though.
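For anyone else landing here, a rough sketch of what the remux route could look like. This assumes ffmpeg is installed and that the returned file is a raw ADTS AAC stream (I have not confirmed that from the API docs). The function name remux_aac_to_m4a is just something I made up for illustration. The audio is copied as-is into an MP4 container, so nothing gets re-encoded.

```shell
# Hypothetical helper: wrap a raw AAC file in an m4a (MP4) container.
# Assumes the input is an ADTS AAC stream and that ffmpeg is on PATH.
remux_aac_to_m4a() {
  # Expect exactly two arguments: input .aac and output .m4a
  if [ "$#" -ne 2 ]; then
    echo "usage: remux_aac_to_m4a <input.aac> <output.m4a>" >&2
    return 2
  fi

  # Fail early if the input file is missing
  if [ ! -f "$1" ]; then
    echo "input not found: $1" >&2
    return 1
  fi

  # Fail with a clear message if ffmpeg is not available
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found on PATH" >&2
    return 127
  fi

  # -c:a copy            keep the AAC bitstream untouched (no re-encode)
  # -bsf:a aac_adtstoasc convert ADTS headers to the form MP4 expects
  ffmpeg -loglevel error -y -i "$1" -c:a copy -bsf:a aac_adtstoasc "$2"
}

# Example (not run here): remux_aac_to_m4a audio.aac audio.m4a
```

With the file from the repro above, that would be remux_aac_to_m4a audio.aac audio.m4a. Whether the resulting m4a then shows the right length in Firefox and the OS X previewer is something I would still want to verify.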