Do you know someone who knows how to use the OpenAI Realtime API at minimal cost? I want to integrate it into an application I have, but I’m facing a problem: the cost is too high.
I’m sorry, but I don’t know anybody. I can only answer myself.
For voice, you are billed $100 per million input tokens and $200 per million output tokens. That is billed every time the AI model responds.
The input tokens are everything spoken or said earlier in the realtime session, all the way back to when you last started a new one. The output tokens are the length of the AI response, whether voice or text.
The server maintains the conversation history, not you, and it keeps growing. OpenAI must enforce some limit so that the input does not exceed the model’s context window each time a response is triggered, but you are given no setting to control that limit.
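To put rough numbers on that growth, here is a minimal sketch of the billing arithmetic. It assumes the prices quoted above, a fixed number of audio tokens per user turn and per reply, and that the whole history is re-billed as input on every response. The function name and the token counts are made up for illustration; real token counts depend on audio length.

```python
# Hypothetical cost model; prices are the ones quoted in this thread.
INPUT_USD_PER_M = 100.0   # dollars per million input tokens
OUTPUT_USD_PER_M = 200.0  # dollars per million output tokens


def session_cost(turns, user_tokens, reply_tokens):
    """Estimate the cost of one session where history is never trimmed."""
    history = 0   # tokens the server has accumulated so far
    total = 0.0
    for _ in range(turns):
        history += user_tokens                        # new user audio joins the history
        total += history * INPUT_USD_PER_M / 1e6      # the whole history is billed as input
        total += reply_tokens * OUTPUT_USD_PER_M / 1e6
        history += reply_tokens                       # the reply becomes history too
    return total


if __name__ == "__main__":
    print(f"1 turn:   ${session_cost(1, 1000, 500):.2f}")
    print(f"10 turns: ${session_cost(10, 1000, 500):.2f}")
```

With these made-up numbers a single turn costs about $0.20, but ten turns in one session cost about $8.75 rather than $2.00, because each response pays again for everything before it.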
So, to minimize:
- You decide when to generate an answer after the user speaks, rather than leaving it to an automatic voice detector;
- You trim the audio you send down to just the part with detected voice;
- You get the AI to say less;
- You keep the total conversation short, hanging up and restarting;
- You don’t use voice, but text;
- You don’t trigger the AI to ever generate a response;
- You don’t try to compete with ChatGPT and OpenAI’s market dominance, where they can keep needed features to themselves, can out-price anyone else for longer, and all you can offer is a wrapper around their services.
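As a rough illustration of the "hang up and restart" point, the sketch below compares one long session with the same number of turns split across fresh sessions. It assumes the prices quoted earlier, fixed per-turn token counts, and that a restart resets the billed history to zero. The function name and the `restart_every` parameter are hypothetical, and the saving comes at the price of the model forgetting the earlier conversation.

```python
def cost_with_restarts(turns, user_tokens, reply_tokens, restart_every):
    """Estimate total cost when a new session is started every `restart_every` turns."""
    input_price = 100.0 / 1e6    # dollars per input token, as quoted in the thread
    output_price = 200.0 / 1e6   # dollars per output token, as quoted in the thread
    history = 0
    total = 0.0
    for turn in range(turns):
        if turn % restart_every == 0:
            history = 0                      # fresh session: no accumulated context
        history += user_tokens
        total += history * input_price + reply_tokens * output_price
        history += reply_tokens
    return total


if __name__ == "__main__":
    one_long = cost_with_restarts(20, 1000, 500, restart_every=20)
    restarted = cost_with_restarts(20, 1000, 500, restart_every=5)
    print(f"one 20-turn session:  ${one_long:.2f}")
    print(f"four 5-turn sessions: ${restarted:.2f}")
```

The same arithmetic applies to getting the AI to say less: shorter replies cut the output charge once and then cut the input charge on every later turn.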