Transcribe via Whisper in real-time / live

josephUL · September 6, 2023, 1:10am

I am aware that it is currently not possible to transcribe in real time; instead, you have to send the m4a, mp3, mp4, mpeg, mpga, wav, or webm file after the recording has completed. While the transcription is fairly fast, live transcription is not possible. Are there any plans to support live transcription?

curt.kennedy · September 6, 2023, 1:18am

I’ve heard of people sending small 10s-30s chunks and stitching the transcribed words together for a quasi-real-time system. The trick is not cutting in the middle of a word, so you need something like pydub to chunk smartly (for example, at pauses).
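To make the "chunk smartly" idea concrete, here is a minimal sketch that cuts a recording at its quietest point inside each 10s-30s window, so a cut is less likely to land mid-word. It works on a plain list of PCM sample values using only the standard library; the function names and parameters (`find_quietest_cut`, `chunk_at_quiet_points`, `window_s`, and so on) are made up for illustration and are not part of pydub or the Whisper API. In practice pydub's silence utilities could do the same job on real audio files.

```python
from typing import List, Sequence


def find_quietest_cut(samples: Sequence[int], start: int, end: int, window: int) -> int:
    """Return a cut index inside [start, end) at the center of the quietest window.

    Energy is approximated as the sum of absolute sample values. This is a naive
    O(n * window) scan, which is fine for a sketch but slow on long recordings.
    """
    best_index = start
    best_energy = None
    for i in range(start, max(start + 1, end - window + 1)):
        energy = sum(abs(s) for s in samples[i:i + window])
        if best_energy is None or energy < best_energy:
            best_energy = energy
            best_index = i
    return best_index + window // 2


def chunk_at_quiet_points(
    samples: List[int],
    sample_rate: int,
    max_chunk_s: float = 30.0,
    min_chunk_s: float = 10.0,
    window_s: float = 0.2,
) -> List[List[int]]:
    """Split samples into chunks of roughly min_chunk_s..max_chunk_s seconds.

    Each cut is placed at the quietest spot between the minimum and maximum
    chunk length, on the assumption that quiet spots are pauses between words.
    """
    if min_chunk_s >= max_chunk_s:
        raise ValueError("min_chunk_s must be smaller than max_chunk_s")
    max_len = int(max_chunk_s * sample_rate)
    min_len = int(min_chunk_s * sample_rate)
    window = max(1, int(window_s * sample_rate))

    chunks: List[List[int]] = []
    pos = 0
    while len(samples) - pos > max_len:
        cut = find_quietest_cut(samples, pos + min_len, pos + max_len, window)
        cut = max(cut, pos + 1)  # always make progress
        chunks.append(samples[pos:cut])
        pos = cut
    chunks.append(samples[pos:])  # whatever is left is the final chunk
    return chunks
```

Each returned chunk would then be encoded to one of the supported formats and sent for transcription separately, with the resulting text concatenated in order.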

But no idea on a live streaming release date.

josephUL · September 6, 2023, 1:23am

That’s an excellent idea, and it seems like a plausible solution. Thanks!

alessandro.saccoia · December 26, 2023, 10:31am

I had the same problem, so I’ve created a working proof of concept for real-time transcription with a websocket server and a demo JS client. It’s on GitHub if you want to check it out: alesaccoia/VoiceStreamAI
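For anyone wondering what the server side of such a setup has to do, here is a rough sketch of one piece: buffering small audio frames as they arrive over a socket and releasing fixed-length chunks that can be sent off for transcription. This is not taken from VoiceStreamAI; the `AudioChunkBuffer` class and its parameters are hypothetical, and it assumes raw mono PCM with a known sample rate and sample width.

```python
from typing import List


class AudioChunkBuffer:
    """Accumulates raw PCM frames and hands back fixed-duration chunks.

    Hypothetical helper for illustration: assumes mono audio where each
    sample is `sample_width` bytes.
    """

    def __init__(self, sample_rate: int = 16000, sample_width: int = 2,
                 chunk_seconds: float = 5.0) -> None:
        # Number of bytes that make up one chunk of the requested duration.
        self.chunk_bytes = int(sample_rate * chunk_seconds) * sample_width
        self._buffer = bytearray()

    def feed(self, frame: bytes) -> List[bytes]:
        """Add an incoming frame and return any chunks that are now complete."""
        self._buffer.extend(frame)
        ready: List[bytes] = []
        while len(self._buffer) >= self.chunk_bytes:
            ready.append(bytes(self._buffer[:self.chunk_bytes]))
            del self._buffer[:self.chunk_bytes]
        return ready

    def flush(self) -> bytes:
        """Return whatever is left, e.g. when the client disconnects."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

A websocket handler would call `feed()` for every binary message it receives, transcribe each chunk that comes back, and call `flush()` once when the stream ends.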
