Hey folks, I’m encountering a frustrating issue with an app I’m building that integrates OpenAI’s Assistant API (using gpt-4o-mini), and I haven’t been able to find a solution.
App Setup:
Python Flask app running on Google Cloud Run (also tested locally with the same issue)
Uses the same assistant_id across sessions
Includes custom-built frontend, where each browser refresh starts a new thread
Chat doesn’t need to continue old conversations (no need for persistent memory)
Retry mechanism: 30 retries with 5-second intervals for each API call
Features used: file search, code interpreter, vision API
The Problem:
After ~20 messages in a thread (e.g., 10 user messages + 10 assistant responses), the assistant starts looping the exact same response, regardless of the input. For example:
Prompt: “What is the color of the sky?”
Response: A completely unrelated, repeated response from an earlier valid answer in the thread.
The assistant remains stuck in this loop no matter what the user inputs.
What I’ve Tried:
Checked the thread in the Playground: The assistant receives my new prompts and responds correctly on OpenAI’s end.
Built a simplified Flask app to rule out complexity in the main app. The issue persists.
Cleared cache and cookies, restarted the server, and loaded the thread in an incognito browser session. The assistant works fine until the old thread is loaded—then it immediately gets stuck again.
What I Suspect:
Initially, I thought it was a token limit issue since some threads exceed 200-300k tokens when the loop starts. However, I’ve also tested with shorter, plain conversations, and the issue still occurs after around 20 messages.
Temporary Workaround:
I’m currently considering limiting the session to ~10 messages to avoid hitting this bug. However, I’d like to understand the root cause to avoid just patching the problem.
Questions:
Has anyone encountered a similar issue?
Could this be an API-side limitation or bug?
Is there something I’m missing in managing session state?
Any insights or suggestions are highly appreciated.
As a quick and simple test to identify the source of the issue, you can visit the playground on platform.openai.com and try using your assistant there.
If the conversation exceeds the 20-message limit without encountering the same issue, it’s likely a problem with your code. Otherwise, please let us know.
Thanks will try that, but what I notice so far is that when via the API I receive same response, If I open the same thread in the Playground I see my prompts are received and the answers are correct not the looped one I get via the API.
So I assume (but will try) that if I use the Assistant via the Playground interface it will behave, as not using the API.
Yeah I thought the same that is why I tried with absolutely separate simple app to load this thread and it sends the same looped answer, tried with curl and the same. And this confuses me.
I would be happy if the problem is in the code and I can pay someone to fix it, just it feels like it’s not in the code.
I am not entirely sure I understand your solution (the 20 limit for requesting a list of Assistants) and how that relates to your described problem about an Assistant thread that starts failing after more than 20 messages ?
The API method described is for requesting a list of messages in a thread.
Were one to continue to retrieve the thread messages with default parameters, you could just be reading the assistant response at #20 and no further, despite adding more messages and running them.