Seeking Guidance on Whisper API for End of Speech Detection for Transcription

DawidM · May 26, 2023, 5:26pm

Hello Everyone,

I’m currently working with OpenAI’s Whisper API and have been pleased with the results, particularly in terms of the speech recognition quality it provides. My project involves developing an application where the functionality is centered around the user speaking into a microphone and then having a transcription of their speech displayed once they finish speaking.

As I delve deeper into the process, I’ve identified a crucial need for an effective method to detect when the user has finished speaking. This end-of-speech detection would allow the system to trigger the transcription process, providing a streamlined user experience where their speech is transcribed only after they’ve concluded their thoughts.

I have gone through the Whisper API documentation thoroughly, but haven’t been able to find anything specific about end-of-speech detection.

So, my question is, does the Whisper API provide any capabilities or mechanisms to identify when a user stops speaking, and only then initiate the transcription process? I realize that this may not be a straightforward problem and there might be various factors at play. However, I’d appreciate any pointers or directions.

Thank you in advance for your help.

sps · May 26, 2023, 8:20pm

Hi @DawidM

The audio transcription API simply takes an audio file in a supported format (under 25 MB) and a model name, among other params, and returns its transcript in the requested format.
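As a rough illustration of that request shape, here is a minimal Python sketch. It assumes the official `openai` package (v1-style client) and an `OPENAI_API_KEY` environment variable; the helper names `within_upload_limit` and `transcribe_file` are made up for this example, and reading the 25 MB limit as 25 × 1024 × 1024 bytes is my assumption.

```python
import os

# Assumption: the documented 25 MB limit is treated here as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def within_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is strictly smaller than `limit` bytes."""
    return os.path.getsize(path) < limit


def transcribe_file(path: str, model: str = "whisper-1") -> str:
    """Upload one finished recording and return its transcript as plain text."""
    if not within_upload_limit(path):
        raise ValueError(f"{path} is too large to upload; split it first")
    # Imported lazily so the size check works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(
            model=model, file=audio, response_format="text"
        )
```

Note that the call only ever sees a complete file, so the decision about when a recording is finished has to be made before it.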

The feature you described will have to be implemented in your client application.
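One simple way to do that on the client side is an energy threshold: treat the user as finished once a run of quiet frames follows some speech. The sketch below is only an illustration of that idea, not anything the API provides. It assumes 16-bit little-endian mono PCM frames, and the class name, the threshold of 500 and the 25-frame window are placeholder values that would need tuning for a real microphone.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EndOfSpeechDetector:
    """Signals end of speech after a run of quiet frames that follows speech."""

    def __init__(self, threshold: float = 500.0, silence_frames: int = 25):
        self.threshold = threshold            # RMS at or above this counts as speech (assumed value)
        self.silence_frames = silence_frames  # e.g. 25 frames of 20 ms each = 0.5 s
        self.heard_speech = False
        self.quiet_run = 0

    def feed(self, frame: bytes) -> bool:
        """Process one frame; return True once the user appears to have stopped."""
        if frame_rms(frame) >= self.threshold:
            self.heard_speech = True
            self.quiet_run = 0
            return False
        if not self.heard_speech:
            return False  # leading silence: keep waiting for the user to start
        self.quiet_run += 1
        return self.quiet_run >= self.silence_frames
```

In use, you would call `feed()` on each frame coming from whatever audio capture library you use while buffering the frames; when it returns True, write the buffer to a file and send it to the transcription endpoint. A fixed threshold is fragile in noisy rooms, so a dedicated voice activity detector may be a better fit there.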

I recommend reading the docs and API reference for a better understanding.
