I’ve been testing the OpenAI TTS capabilities and they are great.
Moving to a more production-ready application, I need to ensure that some words are always enunciated correctly.
Is there any way to improve how I drive the API?
Phonemes seem a reasonable direction, e.g. specifying the pronunciation of a word like “chat”.
Having this available would let us build a dictionary (on the app side, not in the API) that guarantees the word-to-pronunciation mapping.
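For illustration, here is a minimal sketch of such an app-side dictionary, assuming the replacements are applied to the text before it is sent to the TTS endpoint. The entries (“Nginx”, “PostgreSQL”) and their respellings are hypothetical examples, not words the API is known to mispronounce; in practice the list would be built from your own testing.

```python
import re

# Hypothetical app-side pronunciation dictionary: each entry maps a word the
# TTS voice tends to mispronounce to a respelling it reads correctly.
PRONUNCIATION_FIXES = {
    "Nginx": "engine-ex",          # example entry only
    "PostgreSQL": "postgres Q L",  # example entry only
}

def apply_pronunciation_fixes(text: str) -> str:
    """Replace known problem words (whole words, case-insensitive) before TTS."""
    for word, respelling in PRONUNCIATION_FIXES.items():
        text = re.sub(rf"\b{re.escape(word)}\b", respelling, text, flags=re.IGNORECASE)
    return text

print(apply_pronunciation_fixes("Deploy Nginx in front of PostgreSQL."))
# -> "Deploy engine-ex in front of postgres Q L."
```

The returned string is what would be passed as the `input` text to the speech endpoint, so the dictionary lives entirely in the application, as suggested above.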
Any other suggestions are more than welcome.
A wider move to enable SSML would also be great (but that is not the subject here).
Let me know if there are existing “good practices”, or if this could be considered a feature request (and how to raise it).
It’s a fun idea, but one must consider how the AI was trained: on labeled text paired with spoken audio.
“Phonemes” is not a type of corpus that OpenAI would have gone to the expense of developing with their training speakers. The corresponding plain-text corpus could be converted to phonemes to train a whole new model, but one would then have to account even for the British pronunciation of the “Fable” voice.
The results are humorous, but playing them may invoke dark spirits.
(The “Original” and “IPA phonetics” audio examples are not reproduced here.)
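For anyone who wants to reproduce the comparison, a rough sketch along these lines using the OpenAI Python SDK’s speech endpoint; the sentence and its IPA transcription are placeholders, not the clips referenced above.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder sentence and a rough IPA transcription of it; substitute your own.
samples = {
    "original.mp3": "I read the book yesterday.",
    "ipa.mp3": "aɪ rɛd ðə bʊk ˈjɛstərdeɪ",
}

for filename, text in samples.items():
    # The endpoint treats the IPA string as ordinary text, which is why the
    # second clip comes out garbled rather than correctly pronounced.
    response = client.audio.speech.create(model="tts-1", voice="fable", input=text)
    with open(filename, "wb") as f:
        f.write(response.content)
```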
For particular words you might substitute an ambiguous spelling with a homophone that is unambiguous, like read → “reed”. However, pronunciation is guided by understanding of the whole sentence, so a substitution that makes the sentence nonsensical may actually work against the goal.
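A small illustration of that caveat, as a sketch: a blanket respelling of “read” also changes the past tense, whose homophone is “red”, not “reed”, so the substitution is only safe where you control or know the sentence.

```python
import re

def respell_read(text: str) -> str:
    # Naive, context-free substitution, shown here as a cautionary example.
    return re.sub(r"\bread\b", "reed", text)

print(respell_read("Please read the manual."))       # intended to sound like "reed": fine
print(respell_read("I read the manual yesterday."))  # should sound like "red": now wrong
```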