Are gpt-realtime and gpt-4o the same thing?

I scanned the OpenAI platform docs and noticed the Realtime API page.

api-reference/realtime

I am developing an agent and want to use the Realtime API service in my application, so I am confused by these words on that page:

Here is how to create a realtime audio connection between user and AI

curl -X POST https://api.openai.com/v1/realtime/calls \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "sdp=<offer.sdp;type=application/sdp" \
  -F 'session={"type":"realtime","model":"gpt-realtime"};type=application/json'

It shows the model name is gpt-realtime.

What confuses me is that the gpt-4o model can also be used in realtime conversation. So what is the relationship and difference between gpt-4o and gpt-realtime?

Is gpt-realtime just a name for a group of realtime models?

Short answer: no, they’re not the same thing.

gpt-4o is a model family (multimodal, general-purpose).
gpt-realtime is a deployment / interface optimized for low-latency streaming, mainly for audio + interactive use cases.

Think of it this way:

  • gpt-4o: what the model can do

  • gpt-realtime: how the model is exposed for real-time interaction

Realtime APIs prioritize:

  • Persistent connections (WebRTC / WebSocket)

  • Token-by-token streaming

  • Audio I/O with very low latency
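A minimal sketch of what the connection side of this looks like, assuming the WebSocket endpoint and bearer-token header described in the platform docs (the model name is taken from the curl example in the question; treat the details as illustrative, not a definitive client):

```python
from urllib.parse import urlencode

# Sketch: a Realtime WebSocket connection is addressed by endpoint +
# model query parameter, authenticated with the same bearer token as
# the REST endpoints. No network call is made here; this only builds
# the URL and headers you would hand to a WebSocket client library.

def realtime_ws_url(model: str = "gpt-realtime") -> str:
    """Build the WebSocket URL; the model is selected via a query param."""
    return "wss://api.openai.com/v1/realtime?" + urlencode({"model": model})

def auth_headers(api_key: str) -> dict:
    """Standard bearer-token header, same as for standard API calls."""
    return {"Authorization": f"Bearer {api_key}"}

print(realtime_ws_url())
print(auth_headers("sk-...")["Authorization"])
```

The point is that the persistent connection, not a per-request model call, is the unit of work: you open it once and stream audio/events over it.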

Under the hood, realtime endpoints may run variants of 4o-class models, but you don’t select them the same way you do in standard Responses calls.

So gpt-realtime isn’t a “group of models” — it’s a realtime-optimized serving layer designed for conversational agents, voice, and live interactions.
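Once connected, the session is driven by JSON events rather than one-shot requests. Here is a hedged sketch of two common event shapes (a session configuration update and a request to start generating a response); the field names follow the Realtime API reference, but treat them as illustrative rather than exhaustive:

```python
import json

# Sketch: events sent over an open Realtime WebSocket are JSON objects
# with a "type" field. These helpers only build the payload strings.

def session_update(voice: str = "alloy") -> str:
    """Build a session.update event configuring the session's audio output."""
    event = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "model": "gpt-realtime",
            "audio": {"output": {"voice": voice}},
        },
    }
    return json.dumps(event)

def response_create(instructions: str) -> str:
    """Ask the model to start generating a streamed response."""
    return json.dumps({
        "type": "response.create",
        "response": {"instructions": instructions},
    })

print(session_update())
print(response_create("Say hello."))
```

Responses then arrive as a stream of server events over the same connection, which is what makes the low-latency audio use case work.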


Thank you!

As a multimodal model, gpt-4o can be plugged into the gpt-realtime interface.

I will try gpt-realtime in my AI software.

TL;DR

Realtime is an API focused on speech-to-speech interaction in low-latency streaming setups.

Models that include “realtime” in the name are compatible with that endpoint.

