Can GPT-4o directly analyze audio without depending on a transcript?

I’d like to use GPT-4o to analyze audio based on my prompt, for example: what’s the duration of the audio, how many speakers are in the audio, is there any background noise in the recording? Does GPT-4o-audio-preview have that ability? Or is it only a two-step process rather than one-shot, where the audio is first turned into a transcript and then the text is analyzed?
If GPT-4o-audio-preview can do this, does GPT-4o-realtime-preview have the same ability?
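To make the one-shot idea concrete, this is roughly what I have in mind with the Chat Completions API and gpt-4o-audio-preview (the file name and questions are just placeholders; whether the model can actually answer them reliably is exactly what I’m asking):

```python
import base64
from openai import OpenAI

client = OpenAI()

# Placeholder clip; any wav/mp3 file would do
with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text"],  # I only need a text answer, no audio reply
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "How long is this audio, how many speakers are there, "
                            "and is there any background noise?",
                },
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

That would be one-shot; the alternative I want to avoid is transcribing first and then analyzing only the text.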


I can only speak for the Realtime API, as I haven’t used GPT-4o-audio.

The Realtime API is NOT able to tell you how long an audio clip is, how many speakers there are, or whether there is background noise.
There is no text involved with the model; it’s true audio-to-audio.

Hope that helps! :hugs:

Thanks. I also tried the realtime model and API. My experience is that I can send both audio and text, and the text responses can surface information from the audio, but the model can’t directly analyze the audio itself. It seems like a two-step process rather than a one-shot solution.
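For context, this is a simplified version of what I tried over the Realtime API websocket (assuming raw PCM16 mono at 24 kHz and the default event shapes; the clip path and question are placeholders):

```python
import asyncio, base64, json, os
import websockets  # header kwarg is extra_headers in older versions, additional_headers in newer ones

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def ask_about_audio(pcm16_bytes: bytes) -> None:
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Turn off server VAD so the input buffer can be committed manually
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"turn_detection": None},
        }))

        # Push the raw PCM16 (24 kHz mono) audio into the input buffer
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm16_bytes).decode("utf-8"),
        }))
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))

        # Ask a text question about the audio and request a text-only reply
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "How many speakers are in the audio I just sent? "
                                "Is there any background noise?",
            },
        }))

        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

# asyncio.run(ask_about_audio(open("clip.pcm", "rb").read()))
```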

I’m running GPT on internal resources that, as of now, only have the realtime-preview model, not the audio-preview model. Most likely, the realtime-preview model can’t do what I want. I’m not sure whether the audio-preview model has the capability.
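If it does come down to two steps, this is roughly the fallback I have in mind: read the duration from the file itself, transcribe with whisper-1, then ask a text model about the transcript. The file path and prompt are placeholders, and speaker count or background noise can only be guessed from the words, which is exactly the limitation I’d like to avoid:

```python
import wave
from openai import OpenAI

client = OpenAI()
path = "sample.wav"  # placeholder

# Duration is easier to get from the file itself than from any model
with wave.open(path, "rb") as w:
    duration_s = w.getnframes() / w.getframerate()

# Step 1: audio -> transcript
with open(path, "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# Step 2: transcript -> text analysis (speaker count / background noise can
# only be inferred from the words, not from the sound itself)
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"The audio is {duration_s:.1f} seconds long. Based on this "
                   f"transcript, how many speakers do there seem to be?\n\n{transcript.text}",
    }],
)
print(answer.choices[0].message.content)
```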
