I’ve reviewed all the Agents SDK documentation and I’m not clear about model API compatibility, specifically:
Will it work with the OpenAI Realtime preview model?
Will it work with Azure OpenAI model endpoints? If so, is that limited to specific models?
I am about to start testing these things myself, but I would appreciate any time-saving directions anyone can recommend among the multiple approaches listed here: https://openai.github.io/openai-agents-python/models/
… or save me the time by shutting down my tests entirely if these models are just not supported now
Specifically around the Realtime model, which does not support structured outputs out of the box: would that lead to the bad request error mentioned in the link above, as with other models that don't support structured outputs?
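For reference, this is roughly what I was planning to test for the Azure case, routing the agent through Chat Completions on an Azure client. It's only a sketch; the endpoint, key, API version, and deployment name are placeholders from my own setup, not anything the docs promise will work:

```python
from openai import AsyncAzureOpenAI
from agents import Agent, OpenAIChatCompletionsModel, Runner, set_tracing_disabled

# Placeholder Azure details; substitute your own resource, key, and deployment.
azure_client = AsyncAzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="<azure-key>",
    api_version="2024-10-21",
)

# Tracing uploads would need an OpenAI platform key, which I won't have set here.
set_tracing_disabled(True)

agent = Agent(
    name="Azure test agent",
    instructions="You are a helpful assistant.",
    # Route the agent through Chat Completions on the Azure client rather than
    # the default Responses API.
    model=OpenAIChatCompletionsModel(
        model="gpt-4o",  # the Azure *deployment* name in my setup
        openai_client=azure_client,
    ),
    # No output_type set, so the SDK shouldn't request structured outputs,
    # which I assume sidesteps the bad request error for models lacking them.
)

result = Runner.run_sync(agent, "Say hello.")
print(result.final_output)
```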
gpt-4o-realtime-preview is only compatible with the realtime endpoint, over a WebRTC or a WebSocket interface.
It requires an audio modality as input or output.
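For context, talking to that model looks roughly like this over the WebSocket interface, using the openai Python client's beta realtime helper. This is only a sketch from memory of the beta; event names and session fields may differ in your client version:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main() -> None:
    # The realtime preview models are only reachable through this endpoint,
    # not via Chat Completions or Responses.
    async with client.beta.realtime.connect(
        model="gpt-4o-realtime-preview",
    ) as conn:
        await conn.session.update(session={"modalities": ["audio", "text"]})

        await conn.conversation.item.create(
            item={
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello."}],
            }
        )
        await conn.response.create()

        # Print the transcript of the spoken reply as it streams in.
        async for event in conn:
            if event.type == "response.audio_transcript.delta":
                print(event.delta, end="", flush=True)
            elif event.type == "response.done":
                break

asyncio.run(main())
```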
The Agents SDK is written specifically for the Responses endpoint, which currently doesn't support audio. Even gpt-4o-audio-preview, which you can use on Chat Completions, is not allowed on Responses, as the model details page shows.
You would have to use an external transcription tool or TTS, depending on what you are trying to do.
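Something along these lines would work, for example. This is just a sketch assuming Whisper for the transcription step and tts-1 for speech on the way back out; swap in whatever STT/TTS you prefer:

```python
import asyncio
from openai import AsyncOpenAI
from agents import Agent, Runner

client = AsyncOpenAI()
agent = Agent(name="Assistant", instructions="Answer briefly.")

async def handle_turn(audio_path: str) -> bytes:
    # 1) Transcribe the user's audio with a speech-to-text model.
    with open(audio_path, "rb") as audio_file:
        transcript = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # 2) Run the agent on the transcribed text (Responses under the hood).
    result = await Runner.run(agent, transcript.text)

    # 3) Turn the agent's text reply back into speech.
    speech = await client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=result.final_output,
    )
    return speech.content  # raw audio bytes (mp3 by default)

# asyncio.run(handle_turn("question.wav"))
```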
While I understand all your points, to be fair, they're not touting the SDK as being only for Responses; it sounds like they intend it to be a more universal framework, so I'd be interested to hear whether these previews will come closer together at some point.
I understand how to switch to Responses with external transcription, but I choose not to because I've achieved a more fluid voice-first experience with the native multimodal audio or text input <> output model, so I do hope the team continues to push it.