Hello OpenAI Community,
I’ve encountered an issue with the Assistants API using the GPT-3.5 Turbo model: the assistant unexpectedly entered a loop during execution and consumed far more tokens than expected. The prompt was straightforward and intended to trigger a single execution, yet the run used a total of 55,878 input tokens, far beyond what a single interaction should require.
- Assistant Thread ID: thread_2HWNotG3WRWgIlKAwIyUivIQ
- Assistant Run ID: run_VY6ZywFuR76Xu4nwXL9X1UtQ
Upon reviewing the run steps, it appears the assistant kept executing in a loop until it hit my account’s token limit. This was unexpected: the model was supposed to run once and stop, and our assistant was not configured with any recursive function calls or similar mechanisms; it was a simple prompt.
- The run used a total of 55,878 input tokens.
- There were no explicit recursive or looped function calls in our prompt or configuration.
- The expected behavior was a single execution in response to the user’s prompt.
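In case it helps others debug similar runs, here is the small helper I used to confirm the looping by summing usage across the run steps. The step shape (a "usage" dict with "prompt_tokens" and "completion_tokens" keys) is my assumption based on what the list-run-steps endpoint returns; adjust the field names to whatever your SDK version actually gives you.

```python
def summarize_step_usage(steps):
    """Sum token usage across a run's steps to spot unexpected looping.

    steps: iterable of step dicts, each optionally carrying a "usage" dict
    with "prompt_tokens" and "completion_tokens" counts.
    Returns (total_prompt_tokens, total_completion_tokens, step_count).
    """
    prompt = completion = count = 0
    for step in steps:
        usage = step.get("usage") or {}
        prompt += usage.get("prompt_tokens", 0)
        completion += usage.get("completion_tokens", 0)
        count += 1
    return prompt, completion, count
```

If the step count is much higher than one and the prompt tokens keep repeating, the run was looping rather than failing on a single oversized request.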
I’m looking for guidance on the following:
- How can we prevent the assistant from entering such loops, especially when the prompt and use case do not inherently require or suggest multiple executions?
- Is there a specific parameter or configuration option that can be passed to ensure the assistant runs only once and stops, avoiding unintended token consumption?
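For what it’s worth, here is the client-side guard I’m using as a workaround while waiting for guidance: poll the run’s status with a hard budget and cancel it if it never reaches a terminal state. The OpenAI-specific calls (e.g. retrieving and cancelling a run through your SDK) are passed in as callables so the sketch stays generic; wire in whatever your SDK version provides. Newer Assistants API versions also document `max_prompt_tokens` and `max_completion_tokens` on run creation, which may be the kind of parameter I’m asking about, but please check the current docs.

```python
import time

def poll_run_with_budget(fetch_status, cancel, max_polls=60, interval=1.0):
    """Poll a run until it finishes, cancelling it if the budget runs out.

    fetch_status: callable returning the run's current status string.
    cancel: callable that cancels the run (e.g. the SDK's run-cancel call).
    Returns the final status, or "cancelled" if the poll budget was exhausted.
    """
    terminal = {"completed", "failed", "cancelled", "expired"}
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(interval)
    # Budget exhausted: stop the run so it cannot keep consuming tokens.
    cancel()
    return "cancelled"
```

This does not prevent the loop itself, but it bounds the damage: a runaway run gets cancelled after `max_polls * interval` seconds instead of burning tokens until the account limit is hit.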
Any insights or suggestions on mitigating this issue and controlling token usage in similar scenarios would be greatly appreciated. Predictable token consumption is important for us, and I’d like to understand the root cause of this loop behavior.
Thank you for your assistance.