A simple hack to make LLMs listen to audio rather than only read it

Hey folks :waving_hand:

I recently wrote a blog post on how LLMs can "listen" to audio rather than just read transcripts, by pairing Whisper outputs with pitch, RMS energy, and other acoustic features.
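To make the idea concrete, here is a minimal sketch of extracting the two features mentioned above (frame-level RMS energy and a pitch estimate) using only NumPy. In practice a library like librosa (`librosa.feature.rms`, `librosa.pyin`) would do this more robustly; the naive autocorrelation pitch tracker below is just for illustration.

```python
import numpy as np

def frame_rms(y: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    # Root-mean-square energy per frame: a rough loudness proxy.
    frames = [y[i:i + frame_len] for i in range(0, len(y) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def pitch_autocorr(y: np.ndarray, sr: int, fmin: float = 60, fmax: float = 500) -> float:
    # Naive pitch estimate: pick the autocorrelation peak within the
    # lag range corresponding to plausible speech pitch (fmin..fmax Hz).
    ac = np.correlate(y, y, mode="full")[len(y) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic 220 Hz tone as a stand-in for a voiced speech segment.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220 * t)
print(pitch_autocorr(y, sr))        # roughly 220 Hz
print(float(frame_rms(y).mean()))   # roughly 0.354 (= 0.5 / sqrt(2))
```

Per-segment stats like these (mean pitch, pitch variance, energy contour) can then be serialized into the prompt alongside the Whisper transcript.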

:link: Read it here: LLMs Meet Audio: Teaching AI to Hear Emotion, Not Just Read It

Would love to get feedback from others working with Whisper, embeddings, or sentiment from speech!

What if we used a more capable model to gauge the speaker's mood directly from the original audio sample, then used that signal to refine the baseline annotation?

This could improve accuracy while keeping costs low.
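That two-stage refinement could be sketched as follows. All names here are hypothetical: I'm assuming the baseline annotator and the audio mood model each return a label plus a confidence score, and the rule for reconciling them is one illustrative choice among many.

```python
def refine_annotation(text_label: str, text_conf: float,
                      audio_mood: str, audio_conf: float) -> str:
    # Hypothetical reconciliation rule: trust the audio-based mood model
    # only when it is confident AND disagrees with the text baseline;
    # otherwise keep the cheaper baseline label.
    if audio_mood != text_label and audio_conf > max(text_conf, 0.7):
        return audio_mood
    return text_label

# Text alone reads "neutral", but the audio model confidently hears frustration.
print(refine_annotation("neutral", 0.55, "frustrated", 0.85))  # frustrated
# A confident text baseline is kept when the audio model is unsure.
print(refine_annotation("happy", 0.90, "sad", 0.60))           # happy
```

Since the expensive audio model only overrides the baseline on confident disagreements, most samples stay on the cheap path.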