The current Realtime API has a context window of 128,000 tokens.
Model: gpt-4o-realtime-preview-2024-12-17
However, with function calling enabled, the API often fails even when the input is under 10,000 tokens.
Initially, I suspected the accumulated conversation-history tokens were to blame, but even on a completely fresh call I get the response: “I’m sorry, I couldn’t process this.”
Has anyone found an efficient way to handle this? Are there any strategies for preserving longer context while staying within the API limits?
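For concreteness, this is the kind of pruning I had in mind: track item ids as the server confirms them, then use the `conversation.item.delete` client event to drop the oldest items once reported usage climbs. The 100k budget and the batch size are placeholders, and I haven’t confirmed that deleting items mid-call is safe:

```python
import json
from collections import deque

# Placeholder values, not tuned thresholds.
MAX_CONTEXT_TOKENS = 100_000
PRUNE_BATCH = 10

item_ids: deque[str] = deque()  # conversation items, oldest first


async def handle_server_event(event: dict, openai_ws) -> None:
    """Track conversation items and prune the oldest when usage nears the budget."""
    if event["type"] == "conversation.item.created":
        item_ids.append(event["item"]["id"])
    elif event["type"] == "response.done":
        usage = event["response"].get("usage") or {}
        if usage.get("total_tokens", 0) > MAX_CONTEXT_TOKENS:
            # Delete the oldest items until there is some headroom again.
            for _ in range(min(PRUNE_BATCH, len(item_ids))):
                await openai_ws.send(json.dumps({
                    "type": "conversation.item.delete",
                    "item_id": item_ids.popleft(),
                }))
```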
I’m following Twilio’s official example, with function calling implemented on top:
https://github.com/twilio-samples/speech-assistant-openai-realtime-api-python
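For reference, here is roughly how I extended the sample’s session setup with a `tools` array. `openai_ws` is the open WebSocket to the Realtime API exactly as in the sample, and `get_weather` is just a placeholder for my actual functions:

```python
import json


async def initialize_session(openai_ws) -> None:
    """Send the sample's session.update, extended with a tools array."""
    session_update = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": "alloy",
            "instructions": "You are a helpful voice assistant.",
            "modalities": ["text", "audio"],
            "temperature": 0.8,
            # Realtime tool definitions are flat (no nested "function"
            # object, unlike Chat Completions).
            "tools": [
                {
                    "type": "function",
                    "name": "get_weather",
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }
            ],
            "tool_choice": "auto",
        },
    }
    await openai_ws.send(json.dumps(session_update))
```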