Request for Improved Persian Accent Support in Text-to-Speech Service


Hello OpenAI Team,

I’m writing to share feedback on the text-to-speech (TTS) functionality, specifically its performance in Persian. The system reads Persian text fluently, but the output sounds closer to an Afghan accent than to standard Iranian Persian.

I suspect this happens because the TTS system has no explicit input for specifying the desired accent. Written Persian is largely the same in Iran and Afghanistan, so the system may not distinguish between the two and instead defaults to an Afghan style. That style differs noticeably from the Tehrani accent and from other regional accents in Iran.

To steer the output toward a Tehrani accent, I tried prefacing my text with phrases such as “this is a Persian text with a Tehrani accent,” but the result did not change as I had hoped. I would therefore suggest a feature for explicitly specifying the accent or dialect of a given language. Users could then choose between variants such as Tehrani Persian and Afghan Persian and get more accurate, culturally appropriate output.

The ability to fine-tune the accent in TTS is crucial for applications requiring linguistic and cultural precision, such as educational tools, localized content creation, and user interfaces designed for specific Persian-speaking communities.

I appreciate the complexity involved in developing nuanced TTS systems capable of capturing the subtleties of every language and dialect. However, enhancing the Persian TTS to include an option for selecting specific accents would be a significant step forward in making the technology more inclusive and widely applicable.

Thank you for your ongoing efforts to improve AI technology. I am looking forward to future developments that might address these nuances in TTS services.

Best regards,

I hope this gets a response. Being able to fine-tune accents would be amazing.

Hi 🙂

The problem is not an Afghan accent. Rather, the model was trained on data that was synthetically generated by existing TTS systems, and that data contains many mistakes.

To fix this, you can train your model on Mozilla’s Common Voice data or on your own correctly recorded data. Using a good grapheme-to-phoneme (G2P) model will also help significantly.
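Why G2P matters here: Persian script usually leaves short vowels unwritten, so a TTS front end has to infer them before synthesis. The snippet below is a minimal, hypothetical illustration of lexicon-based G2P with a letter-by-letter fallback. The tiny lexicon, the Tehrani-style phoneme symbols (for example ɒː for ا), and the function names are assumptions made for illustration; this is not any particular G2P model or library.

```python
# Hypothetical sketch: lexicon-based grapheme-to-phoneme (G2P) lookup for Persian.
# The lexicon and phoneme symbols are illustrative assumptions, not a real G2P model.

# Naive letter-to-phoneme map (Tehrani-style values). Because short vowels are
# normally not written, this map alone cannot recover them.
LETTER_TO_PHONEME = {
    "ا": "ɒː",
    "ب": "b",
    "ت": "t",
    "س": "s",
    "ک": "k",
    "ل": "l",
    "م": "m",
}

# Hand-written pronunciation lexicon that supplies the missing short vowels.
LEXICON = {
    "کتاب": ["k", "e", "t", "ɒː", "b"],  # ketâb, "book"
    "سلام": ["s", "æ", "l", "ɒː", "m"],  # salâm, "hello"
}


def naive_g2p(word: str) -> list[str]:
    """Map each letter independently; letters missing from the map are skipped."""
    return [LETTER_TO_PHONEME[ch] for ch in word if ch in LETTER_TO_PHONEME]


def g2p(word: str) -> list[str]:
    """Prefer the lexicon entry; otherwise fall back to the naive letter mapping."""
    if word in LEXICON:
        return LEXICON[word]
    return naive_g2p(word)


def transcribe(sentence: str) -> list[list[str]]:
    """Convert a whitespace-separated Persian sentence into phoneme lists."""
    return [g2p(word) for word in sentence.split()]


if __name__ == "__main__":
    print(naive_g2p("کتاب"))  # short vowel /e/ is lost
    print(g2p("کتاب"))        # lexicon restores it
    print(transcribe("سلام کتاب"))
```

A trained G2P model plays the role of the hand-written lexicon here, predicting the unwritten vowels from context so that words outside any fixed list are also pronounced correctly.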

Good luck,