I’m playing around with the Realtime API; it’s really cool and a lot of fun. But I ran into a question about production scenarios.
Each WebSocket session in the backend corresponds to one chat session. This makes sense, but it raises the question of what such a system looks like at scale.
For example, GCP Cloud Run supports WebSockets, but its session affinity for connections is not 100% guaranteed, so there is still a chance of being routed to a different instance.
Are there any recommendations for the system design of a backend used by a web app?
Or perhaps a paper on how OpenAI scales the Realtime API?
My view is that if no service provider can keep a WebSocket open right now, one will be able to in a few weeks, once hosting services realise they can make money serving Realtime API clients.
What I tend to do is build apps on the assumption that money is a great and rapid motivator, then look into the scaling issue once I have an application that I think has a chance to scale.
Also, I’m pretty sure Azure has services for this. I haven’t looked, but as all of this is sitting on Azure right now, it would make sense that there is suitable hosting for it.
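In the meantime, one generic way to live with imperfect session affinity is to keep the chat state outside the instance, so that whichever instance receives a reconnect can pick the session up again. The snippet below is only a minimal sketch of that idea and not anything taken from OpenAI or Cloud Run documentation: `SharedStore` is a hypothetical in-memory stand-in for an external store such as Redis, and `Instance`, `on_message`, and the `chat-123` session ID are made-up names for illustration.

```python
import json
from typing import Dict, List, Optional


class SharedStore:
    """Hypothetical stand-in for an external store (e.g. Redis); in-memory only."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class Instance:
    """One backend instance that keeps no chat state of its own."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def on_message(self, session_id: str, text: str) -> List[str]:
        # Load the history for this chat session from the shared store.
        raw = self.store.get(session_id)
        history: List[str] = json.loads(raw) if raw else []
        history.append(text)
        # Persist the updated history so any other instance can continue the session.
        self.store.set(session_id, json.dumps(history))
        return history


# Two instances share one store; the client lands on "a" first, then on "b".
store = SharedStore()
a = Instance("a", store)
b = Instance("b", store)
a.on_message("chat-123", "hello")
history = b.on_message("chat-123", "still there?")
print(history)  # ['hello', 'still there?']
```

This only covers the state on your side. Whichever instance picks up the client would still have to open its own upstream connection again, so treat it as a starting point rather than a full design.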