Can GPT-4o analyze audio like it does with pictures?

talgymusic · July 29, 2024, 11:46pm

I know that GPT-4o can analyze images - when you input a given image, it can describe what it “sees”, turn that image into a color palette, suggest a new color that will match the image, etc. I’m wondering if it can do the same for audio - e.g. can it recognise music genres, give feedback on music, etc.? I tried using it this way and it didn’t seem to work.

thinktank · July 30, 2024, 3:53am

Hi, GPT-4o is “multi-modal”, so yes, in theory it understands images, audio, and video.

That’s what the “o” is for, “omni.”

But as for whether it actually works, I dunno. I think audio might still be turned into a transcript first, so it’s not “true” audio analysis, as in understanding a waveform.

turbolucius · July 30, 2024, 6:47am

The demos posted on OpenAI’s website and socials when the model was revealed appear to actually use audio as input, seeing as the model can recognise tone and differentiate speakers based on voice alone, which wouldn’t be possible with classic transcription methods (like Whisper).

The extent of its capabilities isn’t fully known yet, since the audio input/output modalities still haven’t been released to the public. I’m absolutely looking forward to testing out its musical abilities, although I wouldn’t be surprised if they end up being kinda bad. GPT models have been really bad with anything related to music before, and it’s probably not a priority for OpenAI.
