AMA on the 17th of December with OpenAI's API Team: Post Your Questions Here

  1. Do you expect o1 and the Realtime API to eventually converge? Or, if we want an audio model that’s smart, will we need to call o1 as a tool from the Realtime API (a rough sketch of that pattern is included below the questions)?

  2. Related: is it on the roadmap for the Realtime API to take video input, or for the o1 API to take audio or video input (similar to Gemini 2.0)?
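
For context on question 1, here is a minimal sketch of the "o1 as a tool" pattern being asked about: a Realtime API session advertises a hypothetical `ask_o1` function tool, and when the realtime model invokes it, the client forwards the question to o1 over Chat Completions and returns the answer as the tool output. The tool name and schema are made up for illustration, model identifiers and event handling are simplified, and the `websockets` header parameter name varies by library version, so treat this as an assumption-laden sketch rather than a reference implementation.

```python
# Sketch: bridging the Realtime API to o1 via a function tool.
# Assumptions: tool name `ask_o1` is hypothetical; model names
# ("gpt-4o-realtime-preview", "o1-preview") may have changed since this AMA.
import asyncio
import json
import os

import websockets          # pip install websockets
from openai import OpenAI  # pip install openai

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

client = OpenAI()  # HTTP client used for the o1 call

# Function tool advertised to the realtime model (name/schema are illustrative).
ASK_O1_TOOL = {
    "type": "function",
    "name": "ask_o1",
    "description": "Delegate a hard reasoning question to the o1 model.",
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}


def ask_o1(question: str) -> str:
    """Call o1 via Chat Completions and return its text answer."""
    completion = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content


async def run_session() -> None:
    # Note: newer `websockets` releases use `additional_headers` instead of
    # `extra_headers`; adjust for the version you have installed.
    async with websockets.connect(REALTIME_URL, extra_headers=HEADERS) as ws:
        # Register the tool with the realtime session.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"tools": [ASK_O1_TOOL], "tool_choice": "auto"},
        }))

        async for raw in ws:
            event = json.loads(raw)

            # When the realtime model calls ask_o1, run o1 in a worker thread
            # and hand the answer back as a function_call_output item.
            if (event.get("type") == "response.output_item.done"
                    and event["item"].get("type") == "function_call"
                    and event["item"].get("name") == "ask_o1"):
                args = json.loads(event["item"]["arguments"])
                answer = await asyncio.to_thread(ask_o1, args["question"])
                await ws.send(json.dumps({
                    "type": "conversation.item.create",
                    "item": {
                        "type": "function_call_output",
                        "call_id": event["item"]["call_id"],
                        "output": answer,
                    },
                }))
                # Ask the realtime model to speak the result back to the user.
                await ws.send(json.dumps({"type": "response.create"}))


if __name__ == "__main__":
    asyncio.run(run_session())
```

The obvious drawback, and part of why the question matters, is latency: every delegated turn adds a full o1 round trip before the realtime model can respond with audio, which is what convergence of the two APIs would presumably avoid.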