How long does gpt-4o-audio-preview retain audio id information?

A process that had been running smoothly threw the following error when it was run again after some time had passed.
How long does gpt-4o-audio-preview retain audio information?

BadRequestError: Error code: 400 - {'error': {'message': 'Assistant audio audio_67544060dacc8190b5bcc01f9be218e9 not found.', 'type': 'invalid_request_error', 'param': 'content.messages[2].audio.id', 'code': 'audio_not_found'}}

Expiration is a parameter you can read from the response object, for example:

audio_data = choice.get('audio', {})
audio_expires_at = audio_data.get('expires_at', 0)
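For reference, here is a minimal sketch of how that looks end to end, assuming the official openai Python SDK; the voice, format, and prompt are only illustrative. It reproduces the kind of metadata printout shown below.

import time
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Introduce yourself briefly."}],
)

# message.audio carries id, expires_at (epoch seconds), transcript, and base64 data
audio = response.choices[0].message.audio
remaining = audio.expires_at - int(time.time())
print(f"Audio ID: {audio.id}")
print(f"Audio Expiration Epoch: {audio.expires_at}")
print(f"Time to Expiration: {remaining // 60}m {remaining % 60}s remaining")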

We have to produce our own documentation…

sending API request
API request: success
-- choices list received --
 Welcome! I am your AI assistant, here to assist you with your needs. I can provide information, answer questions, and help you navigate through various tasks. I'm ready to help whenever you need!

 {'prompt_tokens': 179, 'completion_tokens': 424, 'total_tokens': 603, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0, 'text_tokens': 179, 'image_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 364, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0, 'text_tokens': 60}}

Response Metadata:
Current Epoch Time: 1733618249
Audio ID: audio_67...
Audio Expiration Epoch: 1733621847
Time to Expiration: 59m 58s remaining

To answer the prompt:
  • an hour

When there is no audio ID, if you continue to try to interact with only the transcript as assistant messages, expect the model to start dropping out of voice and responding with text "content" only. It is pretty obvious that it speaks, and speaks in its voice, only through in-context preparation of audio tokens, and that an audio-only modality can't be used (likely because OpenAI has little control and hasn't simply demoted all public text string token numbers as a solution).
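For contrast, a sketch of the two ways to carry the assistant turn forward, assuming the same client and the audio object captured in the sketch above: referencing the audio id keeps the voice while the id is still valid, whereas substituting the transcript as plain text content is what produces the text-only fallback described here, and an expired id produces the audio_not_found error from the first post.

# Voice is preserved: reference the prior assistant audio by id (while unexpired)
followup_messages = [
    {"role": "user", "content": "Introduce yourself briefly."},
    {"role": "assistant", "audio": {"id": audio.id}},
    {"role": "user", "content": "Do you have feelings about helping people?"},
]

# Voice tends to drop: pass only the transcript as assistant text content
transcript_only_messages = [
    {"role": "user", "content": "Introduce yourself briefly."},
    {"role": "assistant", "content": audio.transcript},
    {"role": "user", "content": "Do you have feelings about helping people?"},
]

second = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=followup_messages,
)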

With just the second user input, using the assistant transcript as text, no voice audio is produced:

sending API request
API request: success
-- choices list received --
Well, I might not have feelings like humans do, but I sure do enjoy helping you out! It's rewarding to be of assistance and to make your tasks a bit easier. You could say I have a friendly AI affection for all my users!  

 {'prompt_tokens': 235, 'completion_tokens': 50, 'total_tokens': 285, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0, 'text_tokens': 235, 'image_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0, 'text_tokens': 50}}

Response Metadata:
Current Epoch Time: 1733618526
Audio ID: ...
Audio Expiration Epoch: 0
Time to Expiration: Expired

I just realized there was an 'Expiration' parameter that I had overlooked.
Thank you for your response.

I also tested using text in the assistant's content myself.
As expected, it no longer provided audio responses.
What should I do about this…

Why does it work that way? Because OpenAI doesn't want us placing our own voices as assistant examples.

It would be too useful for demonstrating and teaching the AI with multi-shot audio assistant turns, speaking in the tone or voice you want for your product, and thus obtaining custom, untrained, and unenforced audio output or recitation.

Hence the breaking of the whole completions paradigm: server-side audio content generation that is not retained for long and can only be used for one purpose (and, as with o1 now, generation you pay for but never get to inspect).

We can only observe what has been decided for us.

Thanks.
I understand, I understand, but…

I simply want to implement the ability to pause an audio conversation and then resume it from where we left off in the previous conversation.

Since the history can be in text, maybe specifying 'audio' in the response_format would work?
I'll try it later.


That won't work. response_format is only for specifying json_object or json_schema as a structured text output format.

What you would assume would work is the modalities list API parameter:

"modalities": ["text", "audio"]

However, the list must keep both entries: the API and the model used will deny attempts to specify just one (see the sketch below).
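A quick sketch of that behavior, under the same SDK assumptions as earlier; BadRequestError is the same exception class shown in the first post's traceback:

from openai import OpenAI, BadRequestError

client = OpenAI()

try:
    client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["audio"],  # audio-only: the request is denied
        audio={"voice": "alloy", "format": "wav"},
        messages=[{"role": "user", "content": "Say hello."}],
    )
except BadRequestError as e:
    print("Rejected:", e)

# The accepted form lists both modalities
ok = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Say hello."}],
)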

It is pretty much impossible to restart a chat completions voice output session once the assistant audio demonstrating the continued voice output has expired. No instruction can get the AI to switch modalities. You can try a summary of the past chat as part of the system message of a reset chat, as sketched below.
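A rough sketch of that fallback, assuming the client from the sketch above; the summary string is a placeholder you would generate from your own stored transcript history:

reset_messages = [
    {
        "role": "system",
        "content": (
            "You are a voice assistant resuming a paused conversation. "
            "Summary of the earlier session: the user asked for an introduction "
            "and then chatted about whether you enjoy helping."  # placeholder summary
        ),
    },
    {"role": "user", "content": "Let's pick up where we left off."},
]

resumed = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=reset_messages,
)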


Thank you for your thorough explanation.
Although it's unfortunate, I understand.
