I know it’s still in beta, but OpenAI’s new Realtime API feels more like a sandbox for tinkering than something ready for serious applications.
Issues:
- Voices: We’re still stuck with the robotic-sounding “Alloy” TTS voice, not the more natural voices available in ChatGPT’s Advanced Voice Mode.
- Turn Detection: It’s not as smooth as in Advanced Voice Mode; sometimes the AI responds while you’re still talking.
- Premature Responses: The API seems to start generating a reply too early, jumping in before I’ve finished my sentence.
- Mishearing Issues: In one test, I was talking about Roman history, and the AI mistook “Caesar” for “Caesar salad” and went on a tangent.
- High Costs: It’s advertised at $0.24/minute, but I was charged $16.40 for 11 minutes. That works out to about $1.49 per minute (roughly six times the advertised rate), or an eye-watering $89.45 per hour, even with minimal interruptions.
- Cost Control: On the bright side, the model did follow my system prompt to keep responses to 2-3 sentences for better back-and-forth dialogue, but at roughly $90/hour it’s still too steep for regular use.
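For anyone who wants to sanity-check the cost numbers, here is a quick back-of-the-envelope sketch. It only uses the figures quoted above ($16.40 for 11 minutes, against an advertised $0.24/minute taken at face value); `cost_breakdown` is a hypothetical helper for illustration, not part of any OpenAI SDK.

```python
# Back-of-the-envelope check of the Realtime API cost figures quoted above.
# Assumption: the advertised $0.24/minute is taken at face value as a flat rate.

def cost_breakdown(total_charge: float, minutes: float, advertised_per_min: float) -> dict:
    """Return the effective per-minute and per-hour cost, plus the ratio to the advertised rate."""
    per_minute = total_charge / minutes          # 16.40 / 11 is about 1.4909
    per_hour = per_minute * 60                   # use the unrounded rate to avoid drift
    return {
        "per_minute": round(per_minute, 2),
        "per_hour": round(per_hour, 2),
        "times_advertised": round(per_minute / advertised_per_min, 1),
    }

if __name__ == "__main__":
    result = cost_breakdown(total_charge=16.40, minutes=11, advertised_per_min=0.24)
    print(result)  # {'per_minute': 1.49, 'per_hour': 89.45, 'times_advertised': 6.2}
```

Computing the hourly figure from the unrounded per-minute rate gives $89.45; multiplying the already-rounded $1.49 by 60 gives $89.40, which is where small discrepancies like that come from.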
For now, the API isn’t quite ready for prime time, unless you’re just testing the waters.