How to Fine-Tune Pronunciation with OpenAI's Text-to-Speech API?

Taro-oka · January 5, 2025, 12:53am

I am using OpenAI’s text-to-speech API and would like to fine-tune the pronunciation (e.g., speed, intonation, accent). In particular, I am looking for ways to address issues such as mispronunciations or to specify how certain words or phrases should be read. Does anyone know if there are additional parameters or methods to achieve this?
Here’s the current request setup:

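```javascript
// Assumes requestUrl is https://api.openai.com/v1/audio/speech or a proxy in front of it.
// textToSpeechUrl is a hypothetical wrapper name for the original snippet.
async function textToSpeechUrl(requestUrl, text, voice, speakingRate) {
    const requestJson = {
        model: 'tts-1',
        voice: voice,         // Voice name such as 'alloy'; the API expects a voice, not a language code
        input: text,          // Text to be spoken
        speed: speakingRate   // Speaking rate, 0.25 to 4.0 (default 1.0)
    };
    const res = await fetch(requestUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
            // Calling api.openai.com directly also requires 'Authorization': `Bearer ${apiKey}`
        },
        body: JSON.stringify(requestJson)
    });
    if (!res.ok) {
        throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
    }
    const data = await res.arrayBuffer();
    // The default response_format is MP3, whose registered MIME type is audio/mpeg
    const audioBlob = new Blob([data], { type: 'audio/mpeg' });
    const audioUrl = URL.createObjectURL(audioBlob);
    return audioUrl;
}
```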
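For the speed part of the question: the `speed` field is documented to accept values from 0.25 to 4.0, with 1.0 as normal speed. A minimal sketch, assuming you want to keep a user-controlled rate inside that range before it goes into `requestJson`; `clampSpeakingRate` is a hypothetical helper, not part of any SDK:

```javascript
// Hypothetical helper: keep a user-supplied rate inside the documented
// range of the TTS `speed` parameter (0.25 to 4.0, default 1.0).
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

function clampSpeakingRate(rate) {
    // Fall back to normal speed for missing or non-numeric input.
    if (typeof rate !== 'number' || Number.isNaN(rate)) {
        return 1.0;
    }
    return Math.min(MAX_SPEED, Math.max(MIN_SPEED, rate));
}

console.log(clampSpeakingRate(0.1)); // 0.25
console.log(clampSpeakingRate(1.5)); // 1.5
console.log(clampSpeakingRate(9));   // 4
```

This only changes pacing; it does not affect how individual words are pronounced.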

Any advice or insights would be greatly appreciated!

mkondakov83 · March 6, 2025, 8:12pm

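I have run into the same issue with pronunciation in OpenAI's TTS (strictly speaking, Whisper is OpenAI's speech-to-text model; the voices come from TTS models such as tts-1).
As far as I can tell, the TTS endpoint doesn't support SSML prosody tags (correct me if I'm wrong).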
So, you have only two options:

1. Use another TTS service that supports SSML prosody tags.
2. Work around it by respelling mispronounced words. Before sending the text, replace each mispronounced word with a spelling that the voice reads the way you want. You'll need to experiment to find the right substitutions.
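The respelling workaround in option 2 can be automated with a small pronunciation table applied to the text just before it goes into `input`. A minimal sketch, assuming whole-word, case-insensitive matching is what you want; `PRONUNCIATION_OVERRIDES`, `applyPronunciationOverrides`, and the respellings themselves are hypothetical examples you would tune by ear, since no particular spelling is guaranteed to come out the way you expect:

```javascript
// Hypothetical respelling table: key = word as written, value = spelling
// that (found by experiment) the voice pronounces the way you want.
const PRONUNCIATION_OVERRIDES = {
    'nginx': 'engine x',
    'SQL': 'sequel',
    'kubectl': 'cube control',
};

// Escape regex metacharacters so keys like "C#" or "C++" match literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function applyPronunciationOverrides(text, overrides) {
    // Lower-cased lookup so matching is case-insensitive.
    const lookup = new Map(
        Object.entries(overrides).map(([word, spoken]) => [word.toLowerCase(), spoken])
    );
    if (lookup.size === 0) return text;
    // Longest keys first so a multi-word key wins over a shorter one it contains.
    const alternatives = [...lookup.keys()]
        .sort((a, b) => b.length - a.length)
        .map(escapeRegExp)
        .join('|');
    // (?<!\w) and (?!\w) give whole-word matching that also works for keys ending in symbols.
    const pattern = new RegExp(`(?<!\\w)(?:${alternatives})(?!\\w)`, 'gi');
    // Single pass over the text, so a respelling is never substituted again.
    return text.replace(pattern, (match) => lookup.get(match.toLowerCase()));
}

console.log(applyPronunciationOverrides(
    'Restart nginx, then check SQL and run kubectl.',
    PRONUNCIATION_OVERRIDES
));
// Restart engine x, then check sequel and run cube control.
```

Doing all replacements in one regex pass, rather than looping over the table, avoids chained substitutions where one respelling accidentally matches another key.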
