Today, the best way is a tool call that triggers o1 (probably using the new out-of-band conversation feature). We’ll keep investing in making it easier to use more intelligence within the Realtime API.
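For anyone who wants to try this, here is a rough sketch of the client events involved, not an official recipe. The tool name `ask_o1`, its schema, and the `purpose` metadata key are made up for illustration. The event shapes (`session.update`, `conversation.item.create` with a `function_call_output` item, and `response.create` with `conversation: "none"` for an out-of-band response) follow my understanding of the Realtime API and should be checked against the current reference. The code only builds the JSON payloads; sending them over the WebSocket and actually calling o1 are left out.

```python
import json

# Hypothetical tool definition: the name and schema are assumptions,
# the reply above only says "a tool call that you use to trigger o1".
ASK_O1_TOOL = {
    "type": "function",
    "name": "ask_o1",
    "description": "Hand a hard question to a slower reasoning model.",
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}


def session_update_event():
    """Register the tool on the Realtime session."""
    return {
        "type": "session.update",
        "session": {"tools": [ASK_O1_TOOL], "tool_choice": "auto"},
    }


def out_of_band_response_event(instructions):
    """Ask for a response that is not written into the default conversation."""
    return {
        "type": "response.create",
        "response": {
            "conversation": "none",  # out-of-band: keep it out of the main history
            "metadata": {"purpose": "o1_handoff"},  # assumed tag to match the reply later
            "instructions": instructions,
        },
    }


def function_output_event(call_id, answer):
    """Return the o1 answer to the Realtime model as the tool result."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": answer,
        },
    }


if __name__ == "__main__":
    # Print the payloads in the order a client would roughly send them.
    for event in (
        session_update_event(),
        out_of_band_response_event("Restate the user's question for a reasoning model."),
        function_output_event("call_123", "answer text from o1"),
    ):
        print(json.dumps(event))
```

The idea is that the Realtime model decides when to call `ask_o1`, your server forwards the question to o1 through the regular API, and the answer goes back as the function output. After that you would normally send another `response.create` so the model speaks the result.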
Re video: stay tuned, it’ll come next year. The model has been trained for this capability, so in the meantime you can experiment with it in ChatGPT.