Hey folks!
I recently wrote a blog post on how LLMs can "listen" to audio, not just read transcripts, by combining Whisper outputs with pitch, RMS, and other acoustic features.
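For anyone curious what that looks like in practice, here's a minimal sketch of the idea using openai-whisper and librosa. The library calls are real, but the feature choices, the `clip.wav` input, and the prompt format are illustrative, not the exact pipeline from the post:

```python
# Sketch: transcribe with Whisper, extract prosody with librosa,
# then serialize both into a text prompt a plain LLM can reason over.
import numpy as np
import librosa
import whisper

AUDIO_PATH = "clip.wav"  # hypothetical input file

# 1. Transcript: what was said
model = whisper.load_model("base")
transcript = model.transcribe(AUDIO_PATH)["text"].strip()

# 2. Acoustic features: how it was said
y, sr = librosa.load(AUDIO_PATH, sr=16000)
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
rms = librosa.feature.rms(y=y)[0]

mean_f0 = float(np.nanmean(f0))   # average pitch in Hz (voiced frames only)
f0_std = float(np.nanstd(f0))     # pitch variability, a rough arousal cue
mean_rms = float(rms.mean())      # average loudness

# 3. Put words + prosody into one prompt so a text-only LLM can "hear" it
prompt = (
    f'Transcript: "{transcript}"\n'
    f"Mean pitch: {mean_f0:.0f} Hz, pitch std: {f0_std:.0f} Hz, "
    f"mean RMS energy: {mean_rms:.4f}\n"
    "Given both the words and these acoustic cues, what emotion is the "
    "speaker most likely expressing, and why?"
)
print(prompt)  # send this to any chat LLM
```

The whole trick is serializing prosody (pitch, energy) into text so the model can reason about delivery, not just content.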
Read it here: LLMs Meet Audio: Teaching AI to Hear Emotion, Not Just Read It
Would love feedback from others working with Whisper, embeddings, or sentiment analysis on speech!