AAC files have wrong duration

I have noticed that the AAC files generated by audio/speech report the wrong duration. This is apparent when opening the files in the Firefox player or via the previewer in OS X.

This does NOT seem to happen for the default mp3 type, but consistently for aac (I’ve seen it happen on other types as well, although I haven’t tested those as much).


curl 'https://api.openai.com/v1/audio/speech' \
  -H 'authority: api.openai.com' \
  -H 'accept: */*' \
  -H 'authorization: Bearer <token>' \
  -H 'content-type: application/json' \
  --data-raw '{"model":"tts-1-hd","input":"This is a shorter sentence, but the problem seems to be more prominent with longer texts that are over a minute long at least.","voice":"nova","response_format":"aac"}' \
  --output audio.aac

I can’t attach a link or a zip to this topic with examples, but I have an example with aac/mp3 if needed.

AAC on its own is a codec, not a file format. It needs a container unless you are simply decoding and playing the stream directly.

OpenAI doesn’t seem to know this, instead giving an ADTS stream in the file.

They use libfaac 1.30.

The AAC cannot be decoded by neroaacdec, instead giving a “moov box not found” error.

A raw ADTS stream lacks the metadata needed to determine the play time of a VBR file, so players have to estimate the duration.

You could mux it into an MP4 container, or just request a different format.
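For the muxing route, here is a minimal sketch using ffmpeg, assuming it is installed and that the file is the audio.aac saved by the curl request above. The function name remux_aac_to_m4a is made up for illustration. It copies the AAC stream into an .m4a file without re-encoding, so the audio itself is untouched and only the container changes:

```shell
# Hypothetical helper: wrap an ADTS .aac file in an MP4 (.m4a) container.
# Usage: remux_aac_to_m4a input.aac [output.m4a]
remux_aac_to_m4a() {
  src="$1"
  # Default output name: same as the input with .aac swapped for .m4a
  dst="${2:-${src%.aac}.m4a}"

  if [ ! -f "$src" ]; then
    echo "input not found: $src" >&2
    return 1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg is not installed" >&2
    return 2
  fi

  # -c:a copy keeps the existing AAC frames (no re-encode);
  # -y overwrites the output file if it already exists.
  ffmpeg -v error -y -i "$src" -c:a copy "$dst"
}
```

Calling remux_aac_to_m4a audio.aac should leave an audio.m4a next to the original. The MP4 container stores the duration in its header, so players should no longer need to guess it.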

My audio:

ffprobe -v error -show_entries stream=codec_name,bit_rate,duration,r_frame_rate,avg_frame_rate -of default=noprint_wrappers=1 audio.aac

(the audio plays for 21s)
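To compare the real play time with whatever a player reports, one option is to count the AAC frames instead of trusting the estimate. The sketch below is only that: the helper name aac_duration_seconds is made up, and it assumes AAC-LC (1024 samples per frame) with one AAC frame per ADTS packet, which is the usual case but not guaranteed.

```shell
# Hypothetical helper: duration = packets * samples_per_frame / sample_rate
# Usage: aac_duration_seconds <packet_count> <sample_rate_hz> [samples_per_frame]
aac_duration_seconds() {
  packets="$1"
  sample_rate="$2"
  samples_per_frame="${3:-1024}"   # assumption: AAC-LC frame size
  awk -v p="$packets" -v r="$sample_rate" -v s="$samples_per_frame" \
    'BEGIN { printf "%.2f\n", p * s / r }'
}

# Getting the inputs from ffprobe (needs ffmpeg installed and audio.aac present):
#   packets=$(ffprobe -v error -select_streams a:0 -count_packets \
#     -show_entries stream=nb_read_packets -of csv=p=0 audio.aac)
#   rate=$(ffprobe -v error -select_streams a:0 \
#     -show_entries stream=sample_rate -of csv=p=0 audio.aac)
#   aac_duration_seconds "$packets" "$rate"
```

If the result is close to the real play time while the duration field from ffprobe is not, that points at the missing container metadata rather than at the audio data itself.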


Thank you for the thorough info :raised_hands: I’m trying to avoid client-side muxing, but I might have to go that route then. It would be great if the returned AAC files could be in an m4a/mp4 container from the start, though.