TTS and STT for multilingual audio

Hello, I am trying to set up a Chinese learning assistant with speech interaction.

The assistant needs to recognize speech in more than one language (my native language and Chinese) and to speak its text responses aloud.

Is there still a problem with recognizing multilingual audio?

Is there a similar problem with speech synthesis (TTS)?