I have been making a TTS-based Discord bot, and changed from Amazon Polly to OpenAI just recently… anyway, the only problem I've run into is that the API seems to have trouble pronouncing certain things like “JKM”. Even when I try one letter specifically, like “JJJJJJJJ”, “KKKKKK” and “MMMMMMM”, it's still not pronouncing any part of the letters.
When you say it’s not pronouncing “JKM”, I am assuming you want it to say “Jay Kay Em”?
My suggestion would be to try some preprocessing on your text and transform any all-caps abbreviations like “JKM” to “J-K-M” or something similar.
But, that’s not always going to be what you want either.
Remember, it’s just a (very advanced) TTS model so it might just be getting confused as to how it should read something like “JKM.” Should it treat it like “NBC” or “FOX?” “UCLA” or “SUNY?”
I’m sure this has come up before (though I’ve not seen it personally).
The only thing I can really suggest, I guess, is to build some scaffolding to help the model out. If you know there are going to be some abbreviations like that, try to preprocess the ones you know should be spelled out.
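To make the preprocessing idea concrete, here is a rough sketch, not a tested fix. It rewrites standalone all-caps tokens as hyphenated letters ("JKM" becomes "J-K-M") and skips anything in an exception list. The names `READ_AS_WORD` and `spell_out_abbreviations` are made up for this example, and the exception list would need to be filled in for your own bot.

```python
import re

# Hypothetical exception list: all-caps tokens that should be read as words.
READ_AS_WORD = {"FOX", "SUNY"}

# A standalone token made of two or more capital letters.
_ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")


def spell_out_abbreviations(text: str) -> str:
    """Rewrite all-caps abbreviations like "JKM" as "J-K-M"."""

    def _replace(match: re.Match) -> str:
        token = match.group(0)
        if token in READ_AS_WORD:
            return token  # leave word-like acronyms untouched
        return "-".join(token)  # "JKM" -> "J-K-M"

    return _ABBREVIATION.sub(_replace, text)


print(spell_out_abbreviations("JKM is on FOX, not NBC"))
```

Whether the hyphenated form is actually read letter by letter is up to the model, so treat this as something to experiment with.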
Each run of audio creation will have different results. There’s some random aspect of sampling behind it.
There is also a tts-1-hd-1106 model that one can specify, with a more limited rate. The differences are quite subtle, but one can try out these edge cases to see if it has benefit.
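For reference, a minimal sketch of how that model name could be specified in a raw request to the speech endpoint, using only the standard library. It only builds the request and does not send it. The helper name `build_speech_request`, the "alloy" voice, and the placeholder key are assumptions for illustration; check the current API reference before relying on the exact fields.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1-hd-1106", voice="alloy"):
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "sk-placeholder" is not a real key; substitute your own.
request = build_speech_request("J-K-M", "sk-placeholder")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it, and the response body would be the audio bytes. Running the same edge case against each model name is an easy way to compare them.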
This is a very good point. You are right, I want it to say “Jay Kay Em”, and I have tried to make a dictionary in the code that substitutes “Jay Kay Em” each time a user types “JKM”. It works… however, it still skips sounds… like only saying the “em”, or nothing at all…
I have tested almost each and every TTS model… they don't seem to help when it comes to this issue. It's also a very specific issue, so I understand why it's pretty complicated.
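For anyone following along, the dictionary approach described here might look roughly like the sketch below. This is a guess at the shape of it, not the actual bot code: `PRONUNCIATIONS` and `apply_pronunciations` are hypothetical names, and matching is case-sensitive and limited to whole words.

```python
import re

# Hypothetical pronunciation table: what users type -> what is sent to the TTS API.
PRONUNCIATIONS = {"JKM": "Jay Kay Em"}


def apply_pronunciations(text, table=PRONUNCIATIONS):
    """Replace whole-word matches of each table key with its spoken form."""
    if not table:
        return text
    # Longest keys first, so a longer entry wins over a shorter one that is its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


print(apply_pronunciations("Tell JKM I said hi"))
```

The substitution itself is deterministic, so if sounds are still skipped, the text reaching the API is probably fine and the dropout is happening in the audio generation.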