Whisper API for Hindi Speech to Text

Hi,
I’m reaching out to seek assistance with an issue I’m encountering while using the Whisper API for Hindi speech-to-text transcription in my application.

Issue Description:

When transcribing short Hindi phrases consisting of 2-3 words, the Whisper API struggles to accurately capture the intended words. However, longer conversations with multiple sentences are transcribed with high precision. This inconsistency is affecting the reliability of my application, especially in scenarios where concise inputs are common.

Current Setup:

  • API Used: OpenAI Whisper API
  • Language Configuration: Set to Hindi by default (language='hi')
  • Implementation Details:
import openai

openai.api_key = 'YOUR_API_KEY'

def transcribe_audio(file_path):
    # Open the recording in binary mode and send it to the Whisper API,
    # forcing Hindi so the model does not auto-detect the language.
    with open(file_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file, language='hi')
        return transcript['text']

Steps Taken So Far:

  1. Language Parameter: Explicitly set the language parameter to Hindi to ensure the model prioritizes Hindi language processing.
  2. Input Variations: Tested multiple short phrases to determine if the issue persists across different inputs.
  3. Comparative Analysis: Compared transcriptions of short phrases against longer conversations to confirm the inconsistency in accuracy.
  4. Audio Quality: Verified that the audio recordings are clear and free from background noise, ruling out audio quality as a potential cause.

My Question:

Are there any recommended strategies or configurations within the Whisper API that can enhance the accuracy of transcribing short Hindi phrases? Specifically, I’m looking for ways to ensure precise word extraction for brief inputs without compromising the performance on longer conversations.

Any insights, suggestions, or best practices would be greatly appreciated.

Thank you for your time and support!

Best regards,
Shashank

Use the prompt parameter.

As the prompt, write a Hindi-language lead-up to what is spoken in the audio (the prompt field is not for instructions).

Something plausible, like the text of someone introducing a speaker from India who is about to give a presentation.
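As a rough sketch, reusing the setup from your post: the Hindi intro sentence below is just an invented placeholder, so swap in lead-up text that matches your own domain.

import openai

openai.api_key = 'YOUR_API_KEY'

# Hypothetical Hindi lead-up, in the style of someone introducing a speaker
# from India before a presentation. Replace with your own context.
HINDI_LEAD_UP = "अब भारत से आए हमारे वक्ता अपनी प्रस्तुति के बारे में कुछ शब्द कहेंगे।"

def transcribe_short_hindi(file_path):
    with open(file_path, "rb") as audio_file:
        # The prompt is prior context, not an instruction; it nudges Whisper
        # toward Hindi vocabulary and style, which helps on very short clips.
        transcript = openai.Audio.transcribe(
            "whisper-1",
            audio_file,
            language='hi',
            prompt=HINDI_LEAD_UP,
        )
    return transcript['text']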

If it turns out that only genuinely lengthy audio improves the output, you could prepend your own five seconds of preliminary speech, something that transcribes reliably and is easy to strip out of the transcript response.
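If you go that route, here is a minimal sketch, assuming pydub (with ffmpeg) is available and that hindi_intro.wav and its known transcription INTRO_TEXT are placeholders you record and verify yourself:

import openai
from pydub import AudioSegment

openai.api_key = 'YOUR_API_KEY'

# Known preliminary clip and the text Whisper reliably returns for it
# (both placeholders; use your own recording and verified transcription).
INTRO_FILE = "hindi_intro.wav"
INTRO_TEXT = "नमस्ते, अब मुख्य वाक्य सुनिए।"

def transcribe_with_intro(file_path):
    # Prepend ~5 seconds of known speech so the clip is no longer too short.
    combined = AudioSegment.from_file(INTRO_FILE) + AudioSegment.from_file(file_path)
    combined.export("combined.wav", format="wav")

    with open("combined.wav", "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file, language='hi')

    text = transcript['text'].strip()
    # Strip the known intro text if it appears at the start of the result.
    if text.startswith(INTRO_TEXT):
        text = text[len(INTRO_TEXT):].strip()
    return text

The intro removal here is a simple exact-prefix match; in practice the returned wording can vary slightly, so you may want a fuzzier comparison.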

Good luck!

Hi, thanks for the input. Unfortunately this only solves part of the issue; problems still arise with different dialects and accents in Hindi speech-to-text. Is there any way to customize and train the model with our own custom data covering different dialects and accents, using Whisper or OpenAI? Thanks!