I'm using the Assistants API, and I noticed that the run status becomes `incomplete` with `incomplete_reason: max_prompt_tokens`, even though I never set `max_prompt_tokens`. Looking at my run object, `max_prompt_tokens` doesn't even have a default value, so I wonder what the root cause is.
You are likely using gpt-3.5 and overwhelming it with results returned from manually-attached vector stores; the same can happen with gpt-4-0613, which may not have proper context management in an Assistants environment tuned for 128k-context models. Either way, the tools keep iterating until the context is filled.
OpenAI may have added rate-limit awareness, so instead of the run failing outright against the 30,000 tokens-per-minute limit for gpt-4o at tier 1, you get this `incomplete` status instead.
You can retrieve the run steps and see how many internal iterations were performed and how the token counts grow with each one.
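A minimal sketch of that inspection, assuming the openai Python SDK v1.x (the thread and run IDs are placeholders); each run step carries a `usage` object once it completes:

```python
def step_token_usage(thread_id: str, run_id: str) -> list[dict]:
    """List each internal step of a run with its prompt-token count, oldest first."""
    from openai import OpenAI  # deferred so the pure helper below runs standalone
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    steps = client.beta.threads.runs.steps.list(
        thread_id=thread_id, run_id=run_id, order="asc"
    )
    return [
        {
            "step": s.id,
            "type": s.type,
            # usage is None while a step is still in progress
            "prompt_tokens": s.usage.prompt_tokens if s.usage else None,
        }
        for s in steps.data
    ]


def cumulative_prompt_tokens(usages: list[dict]) -> int:
    """Total prompt tokens re-sent across iterations; this is what counts against TPM."""
    return sum(u["prompt_tokens"] or 0 for u in usages)
```

For example, `cumulative_prompt_tokens(step_token_usage("thread_...", "run_..."))`; if that total approaches your tier's per-minute token limit, you have found the culprit.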
Check whether you are tier 1 (meaning you have paid OpenAI less than the required $50, across multiple payments, to advance beyond it).
Then note the model's token rate limit of 30,000 tokens per minute, which was reduced ten-fold a month ago.
Attempting to send more tokens than that, whether in a single request or cumulatively per minute, will result in an API error.
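As a rough illustration (the per-call numbers are hypothetical), just two tool iterations that each re-send a long conversation plus file-search results already blow a 30,000 TPM budget:

```python
TPM_LIMIT = 30_000  # example tier-1 tokens-per-minute limit for the model


def exceeds_tpm(prompt_tokens_per_call: list[int], limit: int = TPM_LIMIT) -> bool:
    """True if back-to-back calls within one minute exceed the token rate limit."""
    return sum(prompt_tokens_per_call) > limit


# Two iterations: conversation (~12k tokens) + file-search chunks (~8k tokens) each
calls = [12_000 + 8_000, 12_000 + 8_000]  # 40,000 tokens in under a minute
```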
Assistants is unaware of your tier. It maintains and re-sends long conversations to a model that can accept 128k tokens, and it makes multiple iterative calls to the AI model without waiting, some of them carrying tens of thousands of tokens returned by a file search on documents. A single user question can fail, and an entire thread can be rendered non-functional.
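One mitigation sketch, using the `truncation_strategy` and `max_prompt_tokens` parameters that `runs.create` does accept (the specific values here are assumptions, not recommendations): trim the thread to recent messages and cap prompt tokens yourself, so each internal call stays under a tier-1 budget instead of growing unchecked.

```python
def make_run_kwargs(assistant_id: str,
                    last_messages: int = 4,
                    max_prompt_tokens: int = 20_000) -> dict:
    """Build runs.create() kwargs that cap context growth.

    truncation_strategy trims the thread to the most recent messages before
    each model call; max_prompt_tokens caps prompt tokens over the course of
    the run (exceeding it ends the run as `incomplete`, not as an API error).
    """
    return {
        "assistant_id": assistant_id,
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": last_messages,
        },
        "max_prompt_tokens": max_prompt_tokens,
    }


# Usage (client and thread come from your existing Assistants setup):
# run = client.beta.threads.runs.create(thread_id=thread.id,
#                                       **make_run_kwargs("asst_..."))
```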