How long does gpt-4o-audio-preview retain audio id information?

A process that had been running smoothly threw the following error when it was run again after some time had passed.
How long does gpt-4o-audio-preview retain audio information?

BadRequestError: Error code: 400 - {'error': {'message': 'Assistant audio audio_67544060dacc8190b5bcc01f9be218e9 not found.', 'type': 'invalid_request_error', 'param': 'content.messages[2].audio.id', 'code': 'audio_not_found'}}

Expiration is a parameter you can read from the response object, for example:

audio_data = choice.get('audio', {})
audio_expires_at = audio_data.get('expires_at', 0)
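For reference, here is a minimal sketch of how that looks end to end, assuming the official openai Python SDK; the voice, format, and prompt are only illustrative. It reproduces the kind of metadata printout shown below.

import time
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Introduce yourself briefly."}],
)

# message.audio carries id, expires_at (epoch seconds), transcript, and base64 data
audio = response.choices[0].message.audio
remaining = audio.expires_at - int(time.time())
print(f"Audio ID: {audio.id}")
print(f"Audio Expiration Epoch: {audio.expires_at}")
print(f"Time to Expiration: {remaining // 60}m {remaining % 60}s remaining")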

We have to produce our own documentation…

sending API request
API request: success
-- choices list received --
 Welcome! I am your AI assistant, here to assist you with your needs. I can provide information, answer questions, and help you navigate through various tasks. I'm ready to help whenever you need!

 {'prompt_tokens': 179, 'completion_tokens': 424, 'total_tokens': 603, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0, 'text_tokens': 179, 'image_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 364, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0, 'text_tokens': 60}}

Response Metadata:
Current Epoch Time: 1733618249
Audio ID: audio_67...
Audio Expiration Epoch: 1733621847
Time to Expiration: 59m 58s remaining

To answer the prompt:
  • an hour

When there is no audio ID, if you continue to try to interact with only the transcript as assistant messages, expect the model to start dropping out of voice and responding with text "content" only. It is pretty obvious that it speaks, and speaks in its voice, only through in-context preparation of audio tokens, and that an audio-only modality can't be used (likely because OpenAI has little control and hasn't simply demoted all public text string token numbers as a solution).
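For contrast, a sketch of the two ways to carry the assistant turn forward, assuming the same client and the audio object captured in the sketch above: referencing the audio id keeps the voice while the id is still valid, whereas substituting the transcript as plain text content is what produces the text-only fallback described here, and an expired id produces the audio_not_found error from the first post.

# Voice is preserved: reference the prior assistant audio by id (while unexpired)
followup_messages = [
    {"role": "user", "content": "Introduce yourself briefly."},
    {"role": "assistant", "audio": {"id": audio.id}},
    {"role": "user", "content": "Do you have feelings about helping people?"},
]

# Voice tends to drop: pass only the transcript as assistant text content
transcript_only_messages = [
    {"role": "user", "content": "Introduce yourself briefly."},
    {"role": "assistant", "content": audio.transcript},
    {"role": "user", "content": "Do you have feelings about helping people?"},
]

second = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=followup_messages,
)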

With just the second user input, using the assistant transcript as text, no voice audio is produced:

sending API request
API request: success
-- choices list received --
Well, I might not have feelings like humans do, but I sure do enjoy helping you out! It's rewarding to be of assistance and to make your tasks a bit easier. You could say I have a friendly AI affection for all my users!  

 {'prompt_tokens': 235, 'completion_tokens': 50, 'total_tokens': 285, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0, 'text_tokens': 235, 'image_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0, 'text_tokens': 50}}

Response Metadata:
Current Epoch Time: 1733618526
Audio ID: ...
Audio Expiration Epoch: 0
Time to Expiration: Expired

I just realized there was an 'Expiration' parameter that I had overlooked.
Thank you for your response.

I also tested using text in the assistant's content myself.
As expected, it no longer provided audio responses.
What should I do about this…

Why does it work that way? Because OpenAI doesn't want us placing our own voices as assistant examples.

It would be too useful for demonstrating and teaching the AI with multi-shot audio assistant turns, speaking in the tone or voice you want for your product, and thus obtaining custom, untrained, and unenforced audio output or recitation.

Hence the breaking of the whole completions paradigm: server-side audio content generation that is not retained for long and can only be used for one purpose (and, as with o1 now, generation you pay for but never get to inspect).

We can only observe what has been decided for us.

Thanks.
I understand, I understand, but…

I simply want to implement the ability to pause an audio conversation and then resume it from where we left off in the previous conversation.

Since the history can be in text, maybe specifying 'audio' in the response_format would work?
I'll try it later.


That won't work. response_format is only for specifying json_object or json_schema as a structured text output format.

What you would assume would work is the modalities list API parameter:

"modalities": ["text", "audio"]

However, the list must keep both entries: the API and the model used will deny attempts to specify just one (see the sketch below).
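A quick sketch of that behavior, under the same SDK assumptions as earlier; BadRequestError is the same exception class shown in the first post's traceback:

from openai import OpenAI, BadRequestError

client = OpenAI()

try:
    client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["audio"],  # audio-only: the request is denied
        audio={"voice": "alloy", "format": "wav"},
        messages=[{"role": "user", "content": "Say hello."}],
    )
except BadRequestError as e:
    print("Rejected:", e)

# The accepted form lists both modalities
ok = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Say hello."}],
)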

It is pretty much impossible to restart a chat completions voice output session once the assistant audio demonstrating the continued voice output has expired. No instruction can get the AI to switch modalities. You can try a summary of the past chat as part of the system message of a reset chat, as sketched below.
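A rough sketch of that fallback, assuming the client from the sketch above; the summary string is a placeholder you would generate from your own stored transcript history:

reset_messages = [
    {
        "role": "system",
        "content": (
            "You are a voice assistant resuming a paused conversation. "
            "Summary of the earlier session: the user asked for an introduction "
            "and then chatted about whether you enjoy helping."  # placeholder summary
        ),
    },
    {"role": "user", "content": "Let's pick up where we left off."},
]

resumed = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=reset_messages,
)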


Thank you for your thorough explanation.
Although it's unfortunate, I understand.
