Hi,
It’s great to see this. I had a similar idea, but I am still researching the tech stack. I found out that many platforms have some sort of built-in text-to-speech API for accessibility, like speechSynthesis in the Web Speech API, but the voice quality is worse.
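For reference, here is a rough sketch of what using the browser's built-in speechSynthesis looks like. The `speak` helper and its `lang` parameter are my own names for illustration, and the availability check is only there so the function fails quietly where the API doesn't exist (for example, on a server).

```typescript
// Minimal sketch: speak a string with the browser's built-in Web Speech API.
// `speak` is a hypothetical helper name, not part of any library.
// Returns false when the API is unavailable (e.g. outside a browser).
function speak(text: string, lang: string = "en-US"): boolean {
  // Cast to `any` so this compiles even without DOM type definitions.
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return false;
  }
  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // BCP 47 language tag, e.g. "en-US"
  utterance.rate = 1.0;  // normal speaking rate
  g.speechSynthesis.speak(utterance); // queues the utterance for playback
  return true;
}

// Usage, in a browser, ideally from a click handler:
// speak("Hello there");
```

The available voices depend on the browser and operating system, which is probably a big part of why the quality varies so much.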
I am also curious whether there is any way for us to call a sequence of OpenAI APIs in one go, but it seems like there isn’t. I guess the closest we can get is to deploy your server on Azure.
Real-time apps are super sensitive to lag, so we should find a way to manage that properly. If you have any good solutions, please keep us updated.
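One way to see where the lag comes from is to time each step of the chain separately. This is only a sketch: `transcribe`, `reply`, and `synthesize` are hypothetical stand-ins I made up, not real OpenAI calls, and you would swap in the actual requests.

```typescript
// A stage is any async step that turns an input into an output.
type Stage<I, O> = (input: I) => Promise<O>;

interface Timed<T> {
  value: T;
  ms: number; // wall-clock time the stage took, in milliseconds
}

// Run one stage and record how long it took.
async function timed<I, O>(stage: Stage<I, O>, input: I): Promise<Timed<O>> {
  const start = Date.now();
  const value = await stage(input);
  return { value, ms: Date.now() - start };
}

// Hypothetical stand-ins for the real API calls (assumptions, not real endpoints).
const transcribe: Stage<string, string> = async (audio) => `transcript of ${audio}`;
const reply: Stage<string, string> = async (text) => `reply to ${text}`;
const synthesize: Stage<string, string> = async (text) => `audio for ${text}`;

// Call the stages one after another, feeding each output into the next stage.
async function runPipeline(
  audio: string
): Promise<{ output: string; timings: Record<string, number> }> {
  const t1 = await timed(transcribe, audio);
  const t2 = await timed(reply, t1.value);
  const t3 = await timed(synthesize, t2.value);
  return {
    output: t3.value,
    timings: { transcribe: t1.ms, reply: t2.ms, synthesize: t3.ms },
  };
}
```

Since the steps run one after another, the total delay is roughly the sum of the three timings, so the slowest stage is the first place to look.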