Responses API seems to get stuck in a loop with vector stores

I have a fairly lengthy system message and output json_schema, and when I start a Responses conversation it works great until the model (gpt-4o-mini) decides it needs to search for something. If nothing is found, it just slightly re-words the search query over and over until it hits some limit (context window, token cap, or something else). Running through that loop also takes what feels like forever, so the user experience suffers.
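For reference, the call looks roughly like this. This is a minimal sketch, not my actual code: the system message, schema, and vector store ID are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Placeholders standing in for the real (lengthy) values.
LONG_SYSTEM_MESSAGE = "..."   # fairly lengthy system message
OUTPUT_SCHEMA = {...}         # the output json_schema

response = client.responses.create(
    model="gpt-4o-mini",
    instructions=LONG_SYSTEM_MESSAGE,
    input="short user message (~16-20 words)",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_..."],  # placeholder vector store ID
    }],
    text={
        "format": {
            "type": "json_schema",
            "name": "my_schema",     # placeholder name
            "schema": OUTPUT_SCHEMA,
            "strict": True,
        }
    },
)
print(response.output_text)
```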

This managed to eat through 2 million tokens across a handful of user messages that were each only around 16-20 words. Without file search, the same system message and user message consume about 2,000 tokens; when it gets stuck in the loop, a single request consumes around 360,000 tokens.

Any ideas what could be causing the looping, or whether there's a way to let the model ingest the content up front and use that "knowledge" rather than searching the files?
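Two workarounds I'm considering, sketched below. I haven't confirmed either actually stops the re-wording loop, and I'm assuming `max_tool_calls` and `max_num_results` behave the way the docs suggest; `document_text` and `user_message` are placeholders.

```python
from openai import OpenAI

client = OpenAI()
LONG_SYSTEM_MESSAGE = "..."  # placeholder
user_message = "..."         # placeholder
document_text = "..."        # placeholder: raw contents of the file(s)

# Option 1: keep file_search but cap how much searching it can do.
response = client.responses.create(
    model="gpt-4o-mini",
    instructions=LONG_SYSTEM_MESSAGE,
    input=user_message,
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_..."],
        "max_num_results": 5,  # fewer chunks returned per search
    }],
    max_tool_calls=3,          # hard cap on built-in tool invocations
)

# Option 2: skip file_search entirely and inline the content as
# "knowledge" in the instructions (viable only if it fits in context).
response = client.responses.create(
    model="gpt-4o-mini",
    instructions=LONG_SYSTEM_MESSAGE + "\n\nReference material:\n" + document_text,
    input=user_message,
)
```

Option 2 would at least make the token cost predictable per request, at the price of paying for the full document on every turn.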
