OpenAI web search token limit issue

I have the following API call:

```python
response = openai_client.responses.create(
    model="gpt-4o-mini",
    input="Your input prompt here",
    tools=[
        {
            "type": "web_search_preview",
            "search_context_size": "high"
        }
    ]
)
```

It mostly works, but on occasion I get the following (error) response:

```json
{
  "id": "resp_67e158973b208191bc42b115727c0aa20e1a648aff9c28ee",
  "created_at": 1742821527.0,
  "error": null,
  "incomplete_details": {"reason": "max_output_tokens"},
  "instructions": null,
  "metadata": {},
  "model": "gpt-4o-mini-2024-07-18",
  "object": "response",
  "output": [
    {
      "id": "ws_67e15897c464819194bcb16b1a31cbdb0e1a648aff9c28ee",
      "status": "completed",
      "type": "web_search_call"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [
    {
      "type": "web_search_preview",
      "search_context_size": "high",
      "user_location": {"type": "approximate", "city": null, "country": "US", "region": null, "timezone": null}
    }
  ],
  "top_p": 1.0,
  "max_output_tokens": null,
  "previous_response_id": null,
  "reasoning": {"effort": null, "generate_summary": null},
  "status": "incomplete",
  "text": {"format": {"type": "text"}},
  "truncation": "auto",
  "usage": {
    "input_tokens": 372,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens": 16384,
    "output_tokens_details": {"reasoning_tokens": 0},
    "total_tokens": 16756
  },
  "user": null,
  "_request_id": "req_1905918e6fdb561e37fcc310e6cbe5b4"
}
```

The response comes back with status "incomplete" and reason "max_output_tokens", even though I never set max_output_tokens (it shows as null), so there seems to be no way to limit or control the output. How should I resolve this, or is this a bug?
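For reference, the truncation can be detected programmatically from the response's `status` and `incomplete_details` fields. A minimal sketch, assuming the response has been converted to a plain dict (abbreviated here from the JSON above):

```python
import json

# Abbreviated version of the incomplete response shown above.
raw = """
{
  "id": "resp_67e158973b208191bc42b115727c0aa20e1a648aff9c28ee",
  "status": "incomplete",
  "incomplete_details": {"reason": "max_output_tokens"},
  "usage": {"input_tokens": 372, "output_tokens": 16384, "total_tokens": 16756}
}
"""

def truncation_reason(response: dict):
    """Return the incompleteness reason, or None if the response completed."""
    if response.get("status") != "incomplete":
        return None
    details = response.get("incomplete_details") or {}
    return details.get("reason")

resp = json.loads(raw)
print(truncation_reason(resp))  # max_output_tokens
```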


You are not asking for JSON with structured output, so the usual failure mode — the AI filling a JSON object with garbage and never closing it before max_output_tokens is reached — is not the issue here.

However, you are asking for an internal tool to be used. Once the AI has switched to addressing a tool recipient, the generated language is not shown to you, so you can't see that, while writing the tool-call arguments, the model has gone off the rails, filling the search query with nonsense up to the maximum length.

A mini model, a high search_context_size full of distracting results, and unrestrained top_p: that is a formula for bad sequences and repetitive patterns.

The Responses endpoint also offers no frequency_penalty parameter that could break up the repetition.

OK, so what should I tell it to do then? Something simple like: "Give me a summary of what you find on the internet with the following query…"?

You can make the model more "reliable": pass "top_p": 0.5 as a parameter. That excludes low-certainty tokens from being sampled.
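Applied to the original call, that advice would look roughly like this. A sketch, not a confirmed fix: the max_output_tokens value and the lowered search_context_size are my own additions to cap spend and reduce distraction, not something established in this thread.

```python
# Same request with sampling constrained; model and prompt are placeholders.
params = {
    "model": "gpt-4o-mini",
    "input": "Your input prompt here",
    "top_p": 0.5,               # exclude low-certainty tokens from sampling
    "max_output_tokens": 2048,  # explicit cap on tokens you are willing to spend
    "tools": [
        {
            "type": "web_search_preview",
            "search_context_size": "medium",  # "high" pulls in more distracting context
        }
    ],
}

# response = openai_client.responses.create(**params)
```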

The problem may be that the tool call is specified internally using JSON mode or strict enforcement, and once the model enters it and starts emitting tabs or newlines, as it is known to do, you get an unbounded repeating pattern.

There also seems to be an internal tool-call iterator — otherwise the AI would not be able to follow links — so there are multiple ways the AI could exceed max_output_tokens, which is the maximum amount you are willing to spend. gpt-4o-mini, for example, has been observed to call developer tools over and over without end.
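Since the runaway generation happens inside the tool call where you can't intervene, one defensive pattern (my own sketch, not an official remedy) is to check the response status and retry once with a lower top_p to suppress the repetition:

```python
def create_with_fallback(client, **params):
    """Call responses.create; if the output was truncated (status
    "incomplete"), retry once with top_p lowered to 0.5 — a heuristic
    to break runaway repetitive sequences, not a guaranteed fix."""
    response = client.responses.create(**params)
    if response.status == "incomplete":
        response = client.responses.create(**{**params, "top_p": 0.5})
    return response
```

Usage is the same as a plain `responses.create` call, e.g. `create_with_fallback(openai_client, model="gpt-4o-mini", input="...", tools=[...])`.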

I set the temperature to 0 and was also more specific with my prompt, and now it does not seem to use so many tokens. Thanks.