I don't understand the pricing for the realtime API

That’s what my suspicion was… You’ll need to use tools to keep cost down. Give the assistant a query() tool and make it always call that tool. That function can then call a much cheaper model and feed the result back. If you use a fast & cheap model like gpt-4o-mini, the added cost should be negligible and the added latency minimal.
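For illustration, a sketch of that pattern: a single tool schema plus the plumbing that feeds the cheap model’s answer back into the realtime conversation. The `query` tool name, the event shapes, and the delegate model are assumptions based on the Realtime API’s function-calling flow, not a definitive implementation.

```python
import json

# Hypothetical "query" tool the realtime model is instructed to always call.
# Realtime tool definitions are flat (no nested "function" wrapper).
QUERY_TOOL = {
    "type": "function",
    "name": "query",
    "description": "Look up the information needed to answer the user.",
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}

def handle_function_call(event, lookup):
    """Given a finished function-call event, run the lookup (e.g. a
    gpt-4o-mini completion) and build the event that feeds the answer
    back into the realtime conversation."""
    args = json.loads(event["arguments"])
    answer = lookup(args["question"])  # delegate to the cheaper model here
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": answer,
        },
    }
```

After sending that event over the socket you’d request a new response so the model can speak the retrieved answer aloud.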

Well there’s your issue.

This is not intended for a $15/month GPT wrapper service :rofl:

You can definitely integrate a vector database into the conversational flow with the Realtime API. It (the API) is a small piece of a greater pie.

Yup. This is my biggest concern out of all of them.

I am thinking that, for now, embeddings may be helpful to really trim down the conversation size.

But, this is how it’s always been on a new release. It comes out, it’s expensive, we find silly ways to greatly reduce cost with the trade-off of silly edge cases. Then OpenAI says, “btw, it’s now cheaper.”

We’re basically now orchestrating information flowing asynchronously, and I’m excited.


I’m still in the middle of my implementation but I’m hoping to have some best practices identified shortly… I was hoping I wouldn’t need a push-to-talk mechanism, but that seems like a pretty good safety mechanism, so I’m adding that to my experience.


I haven’t tried but it looks like you’d have to kill off and reinitiate the conversation at each turn for this. Can you affect past state through the realtime API?

The RPM limit seems really low, so under the current configuration that might be tough.

And if you do use it like that, you’re essentially just using the realtime API as TTS :thinking:

Shouldn’t have to kill and re-initiate; instead, perform background tasks on the (now outdated) flow of information.

You can send events to modify the conversation as it continues, I believe.

So if the topic changes, and you somehow know where the topic change happened, you should be able to snip all that context off without interrupting the flow. (I may be wrong here)
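Assuming item IDs are tracked client-side as they stream in, the snip could be sketched like this (the `conversation.item.delete` event is in the client-events reference; the topic-change detection itself is assumed and out of scope):

```python
def snip_before(item_ids, topic_change_id):
    """Build conversation.item.delete events for every item that precedes
    the detected topic change, leaving the live flow untouched."""
    cut = item_ids.index(topic_change_id)
    return [
        {"type": "conversation.item.delete", "item_id": item_id}
        for item_id in item_ids[:cut]
    ]
```

Each returned event would be sent over the open websocket; the session keeps running while the old context is dropped.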

It may be that they are working on a solution already:

At this point [when the context limit is reached], the Realtime API automatically truncates the conversation based on a heuristic-based algorithm that preserves the most important parts of the context (system instructions, most recent messages, and so on.) This allows the conversation to continue uninterrupted.

In the future, we plan to allow more control over this truncation behavior.

One current situation where this would be nice is staged progress using function calling. On completion of a function call, the conversation could be snipped and replaced(?) with a summarized version.
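A hedged sketch of that replace step, using `conversation.item.create` to splice a summary in where the snipped turns were. The role choice and the `previous_item_id` anchoring are assumptions:

```python
def summary_item(summary_text, previous_item_id=None):
    """Build the event that inserts a summarized stand-in for snipped turns."""
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",  # assumption; "system" may fit better
            "content": [{"type": "input_text", "text": summary_text}],
        },
    }
    if previous_item_id is not None:
        # Anchor the summary where the removed items used to be.
        event["previous_item_id"] = previous_item_id
    return event
```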

hmm :thinking:

you’re right, it looks like you can arbitrarily insert and delete messages:

https://platform.openai.com/docs/api-reference/realtime-client-events/conversation-item-create

that’s cool, I was afraid it was gonna be locked into the assistant ecosystem.

Now add more transparent pricing, and we’d have an amazing product!

I was considering using the Realtime API for prompted role-play scenarios. I did some testing today in the Playground with various prompts and got a $26 bill. I can’t imagine I had more than 45 minutes of output audio. For my project, this is cost prohibitive.


The beta client API has a deleteItem() method which will remove an item from the conversation history. What’s not clear is whether it just removes that one item or prunes the history from that point. I’ve asked the dev on X.


Firstly, calling an enterprise dev solution with millions of users a “wrapper” service is a bit insulting ;-). Secondly, an example implementation of integrating the Realtime API with a huge dataset that it can “converse” about would be great.


It’s even more confusing when you consider that audio in is not advanced audio, but it’s a transcription by Whisper (so cheap). I think what we are really paying for is the fact that this conversation is cached to make it so fast.

That’s not true… I know you see Whisper in your connection options, but that’s simply for the text transcripts. They run every utterance through two model calls: gpt-4o-realtime for the audio response and Whisper for the text transcription.


But why do you need a small model at all for the retrieval? Just reduce your system prompt, then give tools to the Realtime API to retrieve data only when/if needed. No need for additional models; you just process the retrieval automatically in code. Or are you considering summarization?

lol I don’t know what I’m doing here, getting trapped :smiley:

It’s confusing, I know, but once you understand all of the pieces it makes sense that things work the way they do. You need the text back so that you can a) log what was said for debugging purposes, b) do safety checks by running the text through moderation, and c) rebuild the conversation history for long-running sessions.

The realtime model can’t do everything in one pass, so they have to make a second pass over the audio to get the transcript. They just use a small, cheap Whisper model for that second pass.
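That transcript pass is configured per session; a minimal sketch of the `session.update` event that enables it (field names per the Realtime session docs, hedged as an assumption):

```python
def enable_transcripts(model="whisper-1"):
    """Build the session.update event that turns on the separate
    Whisper transcript pass for input audio."""
    return {
        "type": "session.update",
        "session": {"input_audio_transcription": {"model": model}},
    }
```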

It’s now August 12, and I think this subject is still extremely relevant. I still do not know what the pricing is for this streaming audio API. Is it $100–$200 for input/output, or is it $20–$40, which I’ve seen somewhere else? I also have no idea how many tokens are used per minute. It looks like it’s more like 10,000 tokens per minute. So is the cost $18 per hour, $3 per hour, less, more? I have no idea. Can anybody help me here, and help a simple Amsterdam software developer with some clear answers?

thanks!


All the realtime models listed by version have been stripped out of the pricing guide, and you get only an alias. Thus: misinformation, and the chance of using a model that bills much more than listed.

Only the new model announced here is cheaper:

What follows is from a meandering article produced by Agent mode:

Observations & conclusion

  • Per‑minute cost remains moderate. After the price cut, using GPT‑4o Realtime costs about 4 ¢ per turn and around $9 per hour when both parties speak; the mini version costs roughly 1 ¢ per turn and $2–3 per hour. These costs increase if conversation history and tool calls produce many text tokens.
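To sanity-check figures like these yourself, here is a back-of-envelope calculator. The rates and token throughput in the example are placeholder assumptions, not quoted prices; read the real numbers off the current pricing page, and note that conversation history re-billed on each turn pushes the true cost above this lower bound.

```python
def hourly_cost(in_rate, out_rate, in_tokens_per_min, out_tokens_per_min):
    """Dollars per hour, given $/1M-token rates and token throughput.
    Ignores history re-billing each turn, so treat it as a lower bound."""
    per_min = (in_tokens_per_min * in_rate +
               out_tokens_per_min * out_rate) / 1_000_000
    return per_min * 60

# Example with assumed rates of $40/1M audio input and $80/1M audio output,
# each side speaking half the time (~300 audio tokens/min per direction):
print(round(hourly_cost(40, 80, 300, 300), 2))  # 2.16 dollars/hour, lower bound
```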