While the current system is great, its biggest flaw is that the connection has to be proxied through a backend.
This adds infrastructure costs on top of the model usage itself, and the extra hop adds latency, which is counterproductive for a realtime API.
I propose a client token / server token setup: a backend endpoint, authenticated with the normal server API key, that mints a short-lived client token (expiring after about a minute), which the client can then use to open a WebSocket directly to OpenAI.
This would lower complexity for devs, cut costs and latency, and should take very little work to implement.
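A minimal sketch of what this could look like, in TypeScript. The `/v1/realtime/client_tokens` endpoint, its request/response shape, and the subprotocol-based auth on the WebSocket are all assumptions for illustration, not a confirmed OpenAI API:

```typescript
// server.ts: backend endpoint that mints a short-lived client token.
// The server API key never leaves the backend.
import express from "express";

const app = express();
const OPENAI_API_KEY = process.env.OPENAI_API_KEY!; // server token, kept secret

app.post("/client-token", async (_req, res) => {
  // Hypothetical OpenAI endpoint that exchanges the server key for a
  // client token expiring in ~60s. URL and body shape are assumed.
  const r = await fetch("https://api.openai.com/v1/realtime/client_tokens", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ expires_in: 60 }),
  });
  const { token } = await r.json();
  res.json({ token });
});

app.listen(3000);
```

The client then trades one round trip to the backend for a direct socket to OpenAI, keeping the realtime path proxy-free:

```typescript
// client.ts (browser): fetch the ephemeral token from our backend, then
// connect the WebSocket straight to OpenAI.
const { token } = await (await fetch("/client-token", { method: "POST" })).json();

// Browsers can't set an Authorization header on a WebSocket, so the token
// would have to ride in a subprotocol or query param (also an assumption).
const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  ["realtime", `bearer.${token}`]
);
ws.onopen = () => console.log("connected directly to OpenAI");
```

Since the client token expires after a minute, leaking it is low-risk, and the backend endpoint can still enforce its own auth and rate limits before minting one.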