How to prevent TTS mispronunciations in real-time speech responses?

I’m working with a realtime speech API and encountering consistent mispronunciations, even though I provide the correct text input and examples. Does anyone have a strategy or best practice for preventing these errors in a real-time environment?