Realtime API - Not ready for prime time

I know it’s still in beta, but OpenAI’s new Realtime API feels more like a sandbox for tinkering than something ready for serious applications.

Issues:

  • Voices: We’re still stuck with the robotic “Alloy” TTS, not the more natural voices available in advanced mode.
  • Turn Detection: It’s not as smooth as advanced mode; sometimes the AI responds while you’re still talking.
  • Premature Responses: The API seems to generate tokens too quickly, jumping to respond before I’ve finished my sentence.
  • Mishearing Issues: In one test, I was talking about Roman history, and the AI mistook “Caesar” for “Caesar salad” and went on a tangent.
  • High Costs: It’s advertised at $0.24/minute, but I was charged $16.40 for 11 minutes. That works out to about $1.49 per minute, or an eye-watering ~$89.45 per hour, even with minimal interruptions.
  • Cost Control: On the bright side, the model followed my system prompt and kept responses to 2-3 sentences for better dialogue, but at roughly $90/hour it’s still too steep for regular use.
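The per-minute and per-hour figures in the High Costs item can be checked with a few lines of Python. This is a minimal sketch: the helper name `effective_rates` is hypothetical, and the only inputs are the numbers quoted in the post ($16.40 for 11 minutes, against an advertised $0.24/minute).

```python
def effective_rates(total_charge_usd: float, minutes: float) -> tuple[float, float]:
    """Return (cost per minute, cost per hour) for a session billed as one total."""
    per_minute = total_charge_usd / minutes
    return per_minute, per_minute * 60


# Rate quoted in the post; actual billing depends on input/output audio mix.
ADVERTISED_PER_MINUTE = 0.24

per_min, per_hour = effective_rates(16.40, 11)
print(f"${per_min:.2f}/min, ${per_hour:.2f}/hour")
print(f"{per_min / ADVERTISED_PER_MINUTE:.1f}x the advertised rate")
```

Run as-is, it prints `$1.49/min, $89.45/hour` and `6.2x the advertised rate`. Computing the hourly figure from the unrounded per-minute rate gives $89.45 rather than $1.49 × 60 = $89.40; the difference is only rounding.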

For now, the API isn’t quite ready for prime time, unless you’re just testing the waters.


I haven’t tested it yet. I heard the costs were high and that it was still based on tts-1, so I don’t have a use case for it. You might be right that it isn’t ready for commercial use, but it could work well for personal use if cost isn’t an issue.

I do have a question: does the gpt-4o-realtime-preview model still use tts-1?

I built my own “conversation feature” way back when Whisper and tts-1 were released, and the conversation performance was fine. I’m much more interested in tts-2, if that is a thing.

Definitely pricey, but as you say, it’s good for tinkering, on the bet that it will improve, get more efficient, and cost less.

Magic from the sky is what it is. At the end of the day, it’s worth every penny.
