Responses API seems to get stuck in a loop with vector stores

I have a fairly lengthy system message and output json_schema, and when I start a Responses conversation it works great until the model (gpt-4o-mini) decides it needs to search for something. If nothing is found, it just slightly re-words the search query over and over until it hits some limit (context window, token cap, or something else). Running through that loop also takes what feels like forever, so the user experience suffers.
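For reference, the call looks roughly like this. This is a minimal sketch, not my actual code: the system message, schema, and vector store ID are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Placeholders standing in for the real (lengthy) values.
LONG_SYSTEM_MESSAGE = "..."   # fairly lengthy system message
OUTPUT_SCHEMA = {...}         # the output json_schema

response = client.responses.create(
    model="gpt-4o-mini",
    instructions=LONG_SYSTEM_MESSAGE,
    input="short user message (~16-20 words)",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_..."],  # placeholder vector store ID
    }],
    text={
        "format": {
            "type": "json_schema",
            "name": "my_schema",     # placeholder name
            "schema": OUTPUT_SCHEMA,
            "strict": True,
        }
    },
)
print(response.output_text)
```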

This managed to eat through 2 million tokens across a handful of user messages that were each only around 16-20 words. Without file search, the same system message and user message consume about 2,000 tokens; when it gets stuck in the loop, a single request consumes around 360,000 tokens.

Any ideas what could be causing the looping, or whether there's a way to let the model ingest the content up front and use that "knowledge" rather than searching the files?
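Two workarounds I'm considering, sketched below. I haven't confirmed either actually stops the re-wording loop, and I'm assuming `max_tool_calls` and `max_num_results` behave the way the docs suggest; `document_text` and `user_message` are placeholders.

```python
from openai import OpenAI

client = OpenAI()
LONG_SYSTEM_MESSAGE = "..."  # placeholder
user_message = "..."         # placeholder
document_text = "..."        # placeholder: raw contents of the file(s)

# Option 1: keep file_search but cap how much searching it can do.
response = client.responses.create(
    model="gpt-4o-mini",
    instructions=LONG_SYSTEM_MESSAGE,
    input=user_message,
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_..."],
        "max_num_results": 5,  # fewer chunks returned per search
    }],
    max_tool_calls=3,          # hard cap on built-in tool invocations
)

# Option 2: skip file_search entirely and inline the content as
# "knowledge" in the instructions (viable only if it fits in context).
response = client.responses.create(
    model="gpt-4o-mini",
    instructions=LONG_SYSTEM_MESSAGE + "\n\nReference material:\n" + document_text,
    input=user_message,
)
```

Option 2 would at least make the token cost predictable per request, at the price of paying for the full document on every turn.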
