TTS is unpredictable and often really wrong for non-English requests

giannif · January 27, 2024, 4:08am

If you make the same TTS request a few times in a row, you get a different response each time.

Run this a few times with your own token and listen to the variations:

curl 'https://api.openai.com/v1/audio/speech' \
  -H 'authority: api.openai.com' \
  -H 'accept: */*' \
  -H 'authorization: Bearer your-token' \
  -H 'content-type: application/json' \
  --data-raw '{"model":"tts-1-hd","input":"Me gustaría ir al cine","voice":"alloy"}' \
--output test.mp3

The first one usually seems to be correct. The second one has an odd inflection, and the third is total gibberish.

That sounds like maybe there’s some order to it, but in my app, it’s total chaos. I have no idea. Simple sentences like “Dov’è il bagno” will return nonsense one time and then be perfect the next.

Does anyone have any idea? I’ve seen posts about the speech being wrong for non-English, but for me it can be correct; you just never know what you’re going to get.
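To compare the runs side by side, the same request can be sent several times with each reply saved to its own file. The following is a minimal sketch using only the Python standard library; the function names and the `test_N.mp3` file naming are assumptions for illustration, while the URL, model, voice, and input come from the curl command above.

```python
import hashlib
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"  # endpoint from the curl command


def build_request(token, text, model="tts-1-hd", voice="alloy"):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_variants(token, text, runs=3):
    """Send the identical request `runs` times; save each reply and return its SHA-256 digest."""
    digests = []
    for i in range(1, runs + 1):
        with urllib.request.urlopen(build_request(token, text)) as resp:
            audio = resp.read()
        # Hypothetical file naming: test_1.mp3, test_2.mp3, ...
        with open(f"test_{i}.mp3", "wb") as f:
            f.write(audio)
        digests.append(hashlib.sha256(audio).hexdigest())
    return digests
```

Calling `fetch_variants("your-token", "Me gustaría ir al cine")` would write one file per run. Differing digests only show that the audio bytes differ; judging which run is actually wrong still requires listening to the files.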

_j · January 27, 2024, 4:17am

There’s currently no control over aspects of the model such as seed or sampling (selection of sequences), so variation between runs is expected.

The benefit is that you aren’t locked to a sentence that can never be pronounced correctly.

The runs should be stateless: multiple calls are independent of each other.

You can try throwing in a baseline first sentence that clearly and simply distinguishes the language, even talking about what will follow.
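As a rough sketch of that suggestion, the primer sentence can be prepended to the text and sent in the same `input` field of a single request. The primer wording and the `with_language_primer` helper below are hypothetical, and whether this actually reduces gibberish is untested here.

```python
# Hypothetical primer sentences; the wording is an assumption, not taken from any documentation.
LANGUAGE_PRIMERS = {
    "es": "Lo siguiente está en español.",
    "it": "La frase seguente è in italiano.",
}


def with_language_primer(text, language):
    """Prepend a sentence that makes the language obvious before the real text."""
    primer = LANGUAGE_PRIMERS.get(language)
    if primer is None:
        return text  # no primer known for this language: send the text unchanged
    return f"{primer} {text}"
```

The returned string would replace the original `input` value. The primer is spoken as part of the audio, so an app using this approach would need to trim it from the result.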

giannif · January 27, 2024, 1:42pm

Yes, stateless makes sense. I guess I was imagining a pattern.

But then how would a baseline sentence work? Would it go in the same request?

I understand what you’re saying about the benefit, but getting back gibberish 1 out of 3 times is not viable for my product. Oh, how I wish I could just pass a language code!

mdyildirim · February 6, 2024, 1:11pm

giannif:

curl 'https://api.openai.com/v1/audio/speech' \
  -H 'authority: api.openai.com' \
  -H 'accept: */*' \
  -H 'authorization: Bearer your-token' \
  -H 'content-type: application/json' \
  --data-raw '{"model":"tts-1-hd","input":"Me gustaría ir al cine","voice":"alloy"}' \
--output test.mp3

That’s probably because of characters like '. You should find a way to get rid of them.

giannif · February 6, 2024, 1:36pm

@mdyildirim Thanks for the suggestion, but that’s definitely not it. I’ve debugged for days, and there is just no consistency. It will be perfect one time and completely wrong the next, with the exact same request.
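One way to rule out quoting or encoding problems is to build the request body with a JSON serializer instead of typing it by hand. This is a small sketch under that assumption; `build_body` is a hypothetical helper, and the model and voice defaults are taken from the curl command in this thread.

```python
import json


def build_body(text, model="tts-1-hd", voice="alloy"):
    """Serialize the request body as JSON.

    json.dumps escapes double quotes and, by default, writes non-ASCII
    characters such as è or ’ as \\uXXXX sequences, so the result is pure ASCII.
    """
    return json.dumps({"model": model, "input": text, "voice": voice}).encode("ascii")
```

The body round-trips to the original text when parsed, so apostrophes and accented characters reach the API intact. If the output still varies between identical bodies, the request encoding is not the cause.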

mdyildirim · February 6, 2024, 2:33pm

Hmm. I thought the special characters in your example (Dov’è il bagno) caused the issue. But what you’re saying worries me for the product I’m building.

AlbertWesker · May 16, 2024, 10:16pm

Here to add to the discussion that, as of May 2024, it still randomly produces gibberish, making it completely unreliable for TTS translations in real-life scenarios.

cjapps111 · January 15, 2025, 7:06pm

Any news on this, or on how to fix it? I’m still having the same issue in January 2025. With the sentence “Te gusta el clima soleado?”, everything is spoken fine except “soleado”, which comes out as random audio like “sheww”.
