Hi,
I’m reaching out to seek assistance with an issue I’m encountering while using the Whisper API for Hindi speech-to-text transcription in my application.
Issue Description:
When transcribing short Hindi phrases consisting of 2-3 words, the Whisper API struggles to accurately capture the intended words. However, longer conversations with multiple sentences are transcribed with high precision. This inconsistency is affecting the reliability of my application, especially in scenarios where concise inputs are common.
Current Setup:
- API Used: OpenAI Whisper API
- Language Configuration: Set to Hindi by default (language='hi')
- Implementation Details:
import openai

openai.api_key = 'YOUR_API_KEY'

def transcribe_audio(file_path):
    with open(file_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file, language='hi')
    return transcript['text']
Steps Taken So Far:
- Language Parameter: Explicitly set the language parameter to Hindi to ensure the model prioritizes Hindi language processing.
- Input Variations: Tested multiple short phrases to determine if the issue persists across different inputs.
- Comparative Analysis: Compared transcriptions of short phrases against longer conversations to confirm the inconsistency in accuracy.
- Audio Quality: Verified that the audio recordings are clear and free from background noise, ruling out audio quality as a potential cause.
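To put a number on the short-versus-long comparison, one option is word error rate (WER) against a hand-written reference transcript. The following is only a minimal sketch: the helper name word_error_rate is hypothetical, it assumes a reference transcript exists for each clip, and it splits on whitespace without any Unicode normalization, so it may overcount errors when the same Hindi word is encoded differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1       # reference word missing from output
            insertion = current[j - 1] + 1   # extra word in output
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref_words)
```

With a helper like this, the text returned by transcribe_audio for each clip can be scored against its reference, and the average WER for 2-3 word clips compared with the average for longer recordings. Note that one wrong word in a three-word phrase already gives a WER of about 0.33, so short clips will look worse per error than long ones.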
My Question:
Are there any recommended strategies or configurations within the Whisper API that can enhance the accuracy of transcribing short Hindi phrases? Specifically, I’m looking for ways to ensure precise word extraction for brief inputs without compromising the performance on longer conversations.
Any insights, suggestions, or best practices would be greatly appreciated.
Thank you for your time and support!
Best regards,
Shashank