I am using OpenAI’s text-to-speech API and would like finer control over how the speech is rendered (e.g., speed, intonation, accent). In particular, I am looking for ways to correct mispronunciations and to specify how certain words or phrases should be read. Does anyone know of additional parameters or methods for this?
Here’s the current request setup:
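```javascript
const requestJson = {
  model: 'tts-1',
  voice: languageCode, // Voice name, e.g. 'alloy' (the API takes a voice, not a language code)
  input: text,         // Text to be spoken
  speed: speakingRate  // Speaking rate (0.25 to 4.0, default 1.0)
};
const res = await fetch(requestUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(requestJson)
});
// Fail early instead of wrapping an error response in an audio blob
if (!res.ok) {
  throw new Error(`TTS request failed with status ${res.status}`);
}
const data = await res.arrayBuffer();
const audioBlob = new Blob([data], { type: 'audio/mpeg' }); // Standard MIME type for MP3
const audioUrl = URL.createObjectURL(audioBlob);
return audioUrl;
```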
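One workaround I am considering in the meantime is to respell problem words before the text goes into `input`. Below is a minimal sketch of that idea. The `respellings` map, its entries, and the `applyRespellings` helper are all hypothetical names of my own and not part of the OpenAI API, and I have not confirmed that respelling reliably changes how the model reads a word.

```javascript
// Hypothetical map from problem words to phonetic respellings (my own guesses).
const respellings = {
  'SQL': 'sequel',
  'nginx': 'engine x'
};

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each whole-word, case-sensitive occurrence of a key with its respelling.
function applyRespellings(text, map) {
  let out = text;
  for (const [word, spoken] of Object.entries(map)) {
    const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, 'g');
    // A replacer function keeps any "$" in the respelling literal.
    out = out.replace(pattern, () => spoken);
  }
  return out;
}
```

The request would then use `input: applyRespellings(text, respellings)` instead of `input: text`. Matching on word boundaries is meant to keep a key like `SQL` from rewriting part of a longer word such as `MySQL`.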
Any advice or insights would be greatly appreciated!