I’m using the Assistants API and have done the following:
- Create an assistant, which carries my prompt
- Create a thread
- Add a message to the thread with the content from the user
- Run the assistant on that thread
- Poll the run’s status
- When the run is complete, retrieve the assistant’s response
This is all done via API calls
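For context, the loop looks roughly like the sketch below. This assumes the openai Python SDK v1.x; the client is passed in, and `ask` is my own hypothetical wrapper name, not an SDK function:

```python
import time

# Run statuses that mean the run is finished and will produce no more steps.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

def is_terminal(status: str) -> bool:
    """True once a run has reached a final state."""
    return status in TERMINAL_STATUSES

def ask(client, assistant_id: str, user_text: str) -> str:
    """Create a thread, add the user message, run the assistant, poll, return the reply."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_text
    )
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    while not is_terminal(run.status):
        time.sleep(1)  # poll until the run settles
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    # Messages are listed newest-first by default, so the reply is data[0].
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value
```

The polling interval and the newest-first message ordering are the defaults I rely on; nothing else is special.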
The weird thing I’m noticing is that when the run is kicked off on the thread, it creates a new run step every ~3 seconds, repeatedly (same run, many steps). This continues until, I suspect, it eats up my entire TPM limit. When I pull up all the messages in the thread, I see the same content in ~20 messages, even though it had the right answer after the first run step!
Why does this happen? I’m contemplating sending a cancel message as soon as I am able to retrieve one “completed” response which might fix my issue, but I’m curious what’s causing this.
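For reference, the cancel workaround I have in mind would be a single extra call once a completed response is retrieved — a sketch assuming the openai Python SDK v1.x, with `thread_id`/`run_id` as placeholders:

```python
def cancel_run(client, thread_id: str, run_id: str):
    """Ask the API to stop an in-flight run; its status moves to 'cancelling', then 'cancelled'."""
    return client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run_id)
```

I realize this treats the symptom rather than the cause, which is why I’d rather understand what’s actually happening.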
The answer is: something is wrong. The best explanation I can give without more information would come from shaking a Magic 8-Ball.
Can you give us some details? Logs, preferably? Code? What is your prompt? Instructions? Tools used? Anything?
This is like shutting off your water supply because your toilet keeps running after each flush.
Hi and welcome @apotheonlabs
This is likely an issue with the implementation, but I can’t say for sure until the code is posted.
This is likely a problem with the model or the biases in the API. When tools are present (as they always are when using any functions, internal or provided), the latest models appear to have been modified, or damaged, in the last week so that they emit function calls even when the function description has no answer to offer. This behavior is especially triggered by asking multiple questions at once, which invokes the multi-tool wrapper.
The only workaround at present is to use a model such as gpt-3.5-turbo-0613, which is not trained to use parallel tool calls and doesn’t have this issue.
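If you want to try that without rebuilding the assistant, a run can override the assistant’s default model for that run only — a sketch assuming the openai Python SDK v1.x, with the IDs as placeholders:

```python
def run_with_model_override(client, thread_id: str, assistant_id: str, model: str):
    """Start a run that uses `model` for this run only, ignoring the assistant's default."""
    return client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id, model=model
    )
```

That keeps the assistant itself unchanged, so switching back is just a matter of dropping the override.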