Looped responses after 20~ messages

Hey folks, I’m encountering a frustrating issue with an app I’m building that integrates OpenAI’s Assistant API (using gpt-4o-mini), and I haven’t been able to find a solution.

App Setup:

  • Python Flask app running on Google Cloud Run (also tested locally with the same issue)
  • Uses the same assistant_id across sessions
  • Includes custom-built frontend, where each browser refresh starts a new thread
  • Chat doesn’t need to continue old conversations (no need for persistent memory)
  • Retry mechanism: 30 retries with 5-second intervals for each API call
  • Features used: file search, code interpreter, vision API

The Problem:

After ~20 messages in a thread (e.g., 10 user messages + 10 assistant responses), the assistant starts looping the exact same response, regardless of the input. For example:

  • Prompt: “What is the color of the sky?”
  • Response: A completely unrelated, repeated response from an earlier valid answer in the thread.

The assistant remains stuck in this loop no matter what the user inputs.

What I’ve Tried:

  1. Checked the thread in the Playground: The assistant receives my new prompts and responds correctly on OpenAI’s end.
  2. Built a simplified Flask app to rule out complexity in the main app. The issue persists.
  3. Cleared cache and cookies, restarted the server, and loaded the thread in an incognito browser session. The assistant works fine until the old thread is loaded—then it immediately gets stuck again.

What I Suspect:

Initially, I thought it was a token limit issue since some threads exceed 200-300k tokens when the loop starts. However, I’ve also tested with shorter, plain conversations, and the issue still occurs after around 20 messages.


Temporary Workaround:

I’m currently considering limiting the session to ~10 messages to avoid hitting this bug. However, I’d like to understand the root cause to avoid just patching the problem.


Questions:

  1. Has anyone encountered a similar issue?
  2. Could this be an API-side limitation or bug?
  3. Is there something I’m missing in managing session state?

Any insights or suggestions are highly appreciated.

Hi and welcome to the community!

As a quick and simple test to identify the source of the issue, you can visit the playground on platform.openai.com and try using your assistant there.

If the conversation exceeds the 20-message limit without encountering the same issue, it’s likely a problem with your code. Otherwise, please let us know.

Hope this helps!

Thanks will try that, but what I notice so far is that when via the API I receive same response, If I open the same thread in the Playground I see my prompts are received and the answers are correct not the looped one I get via the API.

So I assume (but will try) that if I use the Assistant via the Playground interface it will behave, as not using the API.

I mean, isn’t that already proof that there is something in your code that returns the wrong reply to the user, starting from message 20?

Yeah I thought the same that is why I tried with absolutely separate simple app to load this thread and it sends the same looped answer, tried with curl and the same. And this confuses me.

I would be happy if the problem is in the code and I can pay someone to fix it, just it feels like it’s not in the code.

Yes, I need to correct myself.
The correct answer is to adjust the limit in the request.

from the docs:
https://platform.openai.com/docs/api-reference/messages/listMessages

The default value is 20 and that’s what is being observed.

2 Likes

Also note that these are not parameters to place in the body of the request, they are part of the URL:

Query parameters

limit

A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.

order -Defaults to desc

Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.


If you set the order, you can just retrieve one message and ensure it is “assistant”.

2 Likes

I am not entirely sure I understand your solution (the 20 limit for requesting a list of Assistants) and how that relates to your described problem about an Assistant thread that starts failing after more than 20 messages ?

1 Like

The API method described is for requesting a list of messages in a thread.

Were one to continue to retrieve the thread messages with default parameters, you could just be reading the assistant response at #20 and no further, despite adding more messages and running them.

You are correct. The link is now pointing to the list of messages.

Ah that makes sense. I never ran into it I guess because using default DESC and than [0] is your latest message. :slight_smile: