- Do we expect o1 and the realtime API to eventually converge? Or, if we want an audio model that's smart, will we need to call o1 as a tool from the realtime API (a pattern sketched below)?
- Related: is video input for the realtime API on the roadmap? Or audio or video input for the o1 API (similar to Gemini 2.0)?
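
For concreteness, here is a minimal sketch of the "o1 as a tool" pattern the first question describes, assuming the realtime API's beta WebSocket events (`session.update`, `response.function_call_arguments.done`, `conversation.item.create`). The tool name `ask_reasoning_model` and the handler wiring are hypothetical, not a documented integration:

```python
import asyncio
import json
import os

import websockets          # pip install websockets
from openai import OpenAI  # pip install openai

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_o1(question: str) -> str:
    """Delegate a hard question to o1 over the standard chat completions API."""
    resp = client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content


async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # On websockets < 14 the keyword is `extra_headers` instead.
    async with websockets.connect(REALTIME_URL, additional_headers=headers) as ws:
        # Expose a function tool the realtime model can call whenever it
        # decides a question needs more reasoning than it can do itself.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "tools": [{
                    "type": "function",
                    "name": "ask_reasoning_model",  # hypothetical tool name
                    "description": "Forward a hard question to a slower, "
                                   "smarter reasoning model and return its answer.",
                    "parameters": {
                        "type": "object",
                        "properties": {"question": {"type": "string"}},
                        "required": ["question"],
                    },
                }],
                "tool_choice": "auto",
            },
        }))

        async for raw in ws:
            event = json.loads(raw)
            # Fired once the model has finished emitting a tool call.
            if event.get("type") == "response.function_call_arguments.done":
                args = json.loads(event["arguments"])
                answer = ask_o1(args["question"])
                # Hand the result back, then ask for a spoken response.
                await ws.send(json.dumps({
                    "type": "conversation.item.create",
                    "item": {
                        "type": "function_call_output",
                        "call_id": event["call_id"],
                        "output": answer,
                    },
                }))
                await ws.send(json.dumps({"type": "response.create"}))


asyncio.run(main())
```

The trade-off this pattern implies is latency: the voice loop stays responsive, but any turn routed through the tool pays a full o1 round trip, which is presumably what convergence of the two models would eliminate.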