Context limit smaller than documented

The OpenAI API documentation states that the model “gpt-4o-mini-realtime-preview” has a context window of 128,000 tokens. However, if I provide a system message consisting of 80,000 tokens, the model answers: “I’m unable to assist with that.”
Has anyone managed to give the Realtime model a larger context than that? Is there a way to use it with the token limit stated in the documentation?

EDIT:
We have given the same input to a regular “gpt-4o-mini” model, which was able to produce a correct answer. We have tested this with multiple datasets, and in each case, above 80,000 tokens the regular “gpt-4o-mini” model provided a correct answer, while the “gpt-4o-mini-realtime-preview” model responded with “I’m unable to assist with that.” Below 80,000 tokens, both models provided a correct answer.
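For reference, the comparison can be sketched roughly like this (the file name and question are placeholders; token counts are measured with tiktoken’s o200k_base encoding, which the gpt-4o family uses):

```python
import tiktoken            # pip install tiktoken
from openai import OpenAI  # pip install openai

# Measure the size of the system message with the gpt-4o family encoding.
encoding = tiktoken.get_encoding("o200k_base")
system_message = open("dataset.txt", encoding="utf-8").read()
print(len(encoding.encode(system_message)))  # ~80,000 in the failing cases

# Send the same input to the regular chat model for comparison.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "…"},  # the question asked of both models
    ],
)
print(response.choices[0].message.content)  # answered correctly in our tests
```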

You obtained a refusal response from the AI model: it simply declined to respond to your question and input.

Note that Realtime takes session-level “instructions”, not a system message directly.

If the context you sent could not actually be loaded into the AI model, you would get an API error rather than expensive bot talk. You can verify this by appending your instructions four times and observing the API error for sending too many input tokens (sketched below).
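A minimal sketch of both points, assuming the `websockets` Python package and the session.update event from the Realtime API documentation (the file name is a placeholder):

```python
import json
import os
from websockets.sync.client import connect  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-mini-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

large_instructions = open("dataset.txt", encoding="utf-8").read()

with connect(URL, additional_headers=HEADERS) as ws:
    # Realtime takes per-session "instructions", not a "system" chat message.
    ws.send(json.dumps({
        "type": "session.update",
        "session": {"instructions": large_instructions},
    }))
    print(ws.recv())  # expect a session.updated event if the input fit

    # Overflow test: repeat the instructions 4x to exceed the 128k window.
    # A true context overflow should come back as an error event about too
    # many input tokens, not as a spoken refusal.
    ws.send(json.dumps({
        "type": "session.update",
        "session": {"instructions": large_instructions * 4},
    }))
    print(ws.recv())  # expect an error event
```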

The voice models are particularly sensitive to the inputs provided and readily shut down and refuse to respond, whether the instructions say “you like to sing songs” or “you give the boyfriend experience”. The large instruction input on its own may simply be confusing enough if it isn’t framed with “permission-giving” text about the AI’s identity and purpose.
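For example, a hypothetical framing wrapper along these lines:

```python
# Hypothetical "permission-giving" framing: give the model an identity and a
# purpose, and mark the large payload as sanctioned reference material.
PREAMBLE = (
    "You are a data assistant. You are permitted and expected to read the "
    "reference material below and to answer questions about it.\n\n"
    "=== BEGIN REFERENCE MATERIAL ===\n"
)
POSTAMBLE = "\n=== END REFERENCE MATERIAL ==="

def frame_instructions(raw_data: str) -> str:
    """Wrap the raw data in identity/purpose text before use as instructions."""
    return PREAMBLE + raw_data + POSTAMBLE
```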

Thank you for the answer!

However, as described in the edit above, we have given the same input to a regular “gpt-4o-mini” model, which was able to produce a correct answer. Across multiple datasets, above 80,000 tokens the regular “gpt-4o-mini” model answered correctly, while the “gpt-4o-mini-realtime-preview” model responded with “I’m unable to assist with that.” Below 80,000 tokens, both models provided a correct answer.
Is it possible that the “gpt-4o-mini-realtime-preview” model interprets the same prompt differently than the “gpt-4o-mini” model?