Assistant Keeps Running in Loop, Exceeding Expected Token Usage

Hello OpenAI Community,

I’ve encountered an issue with the Assistants API using the GPT-3.5 Turbo model where the assistant unexpectedly entered a loop during execution, far exceeding the expected token usage. Although the run was intended as a single execution of a straightforward prompt, it consumed a total of 55878 input tokens, well beyond what a single interaction should use.

  • Assistant Thread ID: thread_2HWNotG3WRWgIlKAwIyUivIQ
  • Assistant Run ID: run_VY6ZywFuR76Xu4nwXL9X1UtQ

Upon reviewing the run steps, it appears that the assistant kept executing in a loop until it reached the token limit for my account. This behavior was unexpected as the model was supposed to run only once and then stop, especially considering that our assistant was not configured to use any recursive function calls or similar mechanisms—it was a simple prompt.

Key Details:

  • The run used a total of 55878 input tokens.
  • There were no explicit recursive or looped function calls in our prompt or configuration.
  • The expected behavior was a single execution in response to the user’s prompt.

I’m looking for guidance on the following:

  1. How can we prevent the assistant from entering such loops, especially when the prompt and use case do not inherently require or suggest multiple executions?
  2. Is there a specific parameter or configuration option that can be passed to ensure the assistant runs only once and stops, avoiding unintended token consumption?

Any insights, suggestions, or guidance on mitigating this issue and controlling token usage more effectively in similar scenarios would be greatly appreciated. Efficient, predictable token usage is crucial for us, so understanding the root cause of this loop behavior is essential.
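Regarding question 2: as far as I can tell there is no run-level token cap, but a rough client-side approximation is to poll the run and cancel it once it exceeds a budget. A minimal sketch, assuming the openai Python SDK v1.x (`runs.retrieve`, `runs.steps.list`, and `runs.cancel` as they exist in that SDK's Assistants beta; the budget numbers are placeholders you would tune):

```python
import time

def over_budget(elapsed_s, steps_done, max_seconds=60, max_steps=5):
    """Pure budget check: True once either limit is exceeded."""
    return elapsed_s > max_seconds or steps_done > max_steps

def watch_run(client, thread_id, run_id, max_seconds=60, max_steps=5):
    """Poll a run and cancel it once it blows its budget.

    `client` is assumed to be an openai.OpenAI() instance; the
    runs.retrieve / runs.steps.list / runs.cancel calls below match
    the v1.x Python SDK Assistants beta at the time of writing.
    """
    start = time.monotonic()
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run_id)
        if run.status not in ("queued", "in_progress"):
            return run  # run finished (or failed) on its own
        steps = client.beta.threads.runs.steps.list(
            thread_id=thread_id, run_id=run_id)
        if over_budget(time.monotonic() - start, len(steps.data),
                       max_seconds, max_steps):
            client.beta.threads.runs.cancel(
                thread_id=thread_id, run_id=run_id)
            raise RuntimeError("run cancelled: budget exceeded")
        time.sleep(2)
```

This doesn't stop a loop from starting, but it bounds how much a runaway run can cost before cancellation.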

Thank you for your assistance.


Assistants have their own functions injected.

gpt-3.5-turbo-0613 is not compatible with retrieval or parallel tool calls.

The quality of the Assistants framework is out of your control: the AI may, for example, fail to understand the errors it has already made, or become confused when its context is filled with injected functions and irrelevant file knowledge.

Language models are often loopy and repetitive by nature; they find patterns and repeat them. With Assistants, you have no sampling parameters (temperature, penalties, logit bias, etc.) available to control that behavior.


Similar behaviour has been reported before. I hope it’s on their radar:


Did you check out the thread in the backend? My guess would be that the function call either fails or returns unexpected results and then gets recalled. You should be able to see the details (including data in and out) in the thread. Enable threads in Settings → Organization and then check out the thread in the main side menu.
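If you'd rather inspect from code than the dashboard, the run steps are also retrievable via the API. A sketch assuming the openai Python SDK v1.x (where steps come back from `client.beta.threads.runs.steps.list(thread_id=..., run_id=...)` with `id`, `type`, and `status` fields):

```python
def summarize_steps(steps):
    """Return one summary line per run step.

    `steps` is any iterable of objects with .id, .type, and .status
    attributes, e.g. the .data items returned by
    client.beta.threads.runs.steps.list(...). A long run of repeated
    tool_calls steps usually points at a function that keeps failing
    and being retried.
    """
    return [f"{s.id}: {s.type} ({s.status})" for s in steps]
```

Printing this summary for the looping run should make it obvious whether the same tool call is being re-issued over and over.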

To fix this, isn’t it possible to use max_tokens and set a limit so the output doesn’t exceed it?

That part is correct: it isn’t possible to use a max_tokens parameter to set a limit, because the Assistants API provides no such limitation mechanism.

The largest concern is input token consumption, because the conversation history and documents added to the input are unchecked and filled to the maximum.
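Since you can't cap the input server-side, you can at least estimate your exposure before starting a run. A rough sketch (using the common ~4-characters-per-token heuristic for English text; for exact counts you would use OpenAI's tiktoken library instead):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # For exact counts, tokenize with OpenAI's tiktoken library.
    return max(1, len(text) // 4)

def estimate_thread_tokens(messages) -> int:
    """Sum a rough token estimate over an iterable of message strings.

    `messages` would be the text content pulled from a thread's
    messages (plus any attached document text) before creating a run.
    """
    return sum(estimate_tokens(m) for m in messages)
```

If the estimate is already large before the run starts, every looped iteration will re-consume that input, which is how a single run balloons to tens of thousands of input tokens.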

+1 on this, seeing it today as well: I get it both via API calls in my app and in the OpenAI Playground.