Assistants API v2: max_prompt_tokens gets exceeded, barely, consistently

Today I have had failed runs that end as incomplete with "max_prompt_tokens", where the limit gets exceeded, just barely, but consistently, independent of where I set the limit (I tried values between 25000 and 28000 for max_prompt_tokens). See below. Any clues as to what's going on? It feels like there is a miscalculation of tokens on the OpenAI end.

{
  "id": "run_REDACTED",
  "object": "thread.run",
  "created_at": 1716564198,
  "assistant_id": "asst_REDACTED",
  "thread_id": "thread_REDACTED",
  "status": "incomplete",
  "started_at": 1716564199,
  "expires_at": null,
  "cancelled_at": null,
  "failed_at": null,
  "completed_at": 1716564204,
  "required_action": null,
  "last_error": null,
  "model": "gpt-4o",
  "instructions": "REDACTED",
  "tools": [
    {
      "type": "file_search"
    }
  ],
  "tool_resources": {},
  "metadata": {},
  "temperature": 1.0,
  "top_p": 1.0,
  "max_completion_tokens": 3000,
  "max_prompt_tokens": 26000,
  "truncation_strategy": {
    "type": "auto",
    "last_messages": null
  },
  "incomplete_details": {
    "reason": "max_prompt_tokens"
  },
  "usage": {
    "prompt_tokens": 25964,
    "completion_tokens": 88,
    "total_tokens": 26052
  },
  "response_format": "auto",
  "tool_choice": "auto"
}

The max_prompt_tokens setting doesn't tell the run how to operate. It tells the run when to produce an error.

You provided no limit on the length of the thread conversation via truncation_strategy, which offers only limited control. Then you have file_search enabled, where the AI will call a tool and get back up to 20 chunks of 800 tokens each, plus overlap. So the error you asked for blocks the run, leaving it incomplete.
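
If the retrieval results are what push you over, one option is to cap how many chunks file_search may return. A minimal sketch with the Python SDK; the assistant ID is a placeholder from your run above, and the particular max_num_results value is just an illustration:

from openai import OpenAI

client = OpenAI()

# Cap how many retrieved chunks file_search can inject into the prompt.
# With the defaults (up to 20 chunks of roughly 800 tokens each, plus overlap),
# retrieval alone can eat most of a ~26k max_prompt_tokens budget.
assistant = client.beta.assistants.update(
    "asst_REDACTED",  # placeholder assistant ID
    tools=[
        {
            "type": "file_search",
            "file_search": {"max_num_results": 5},  # fewer chunks, smaller prompt
        }
    ],
)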

From the API docs on truncation_strategy:

"The truncation strategy to use for the thread. The default is auto. […] When set to auto, messages in the middle of the thread will be dropped to fit the context length of the model, max_prompt_tokens. "

I have not set truncation strategy, so it defaults to auto and should honor max_prompt_tokens. Am I misunderstanding?

When I change max_prompt_tokens, the tokens used for the prompt do change with it. It just barely overshoots the limit every time.

The problem persists. Any ideas are welcome.

The max_prompt_tokens setting is only there to produce an error and block the run when the input exceeds your limit.

If you want to actually control the length of past chat, which is what drives the usage up, you must use truncation_strategy.
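
For example, something like this (a rough sketch with the Python SDK; the thread and assistant IDs are placeholders and the message count is arbitrary):

from openai import OpenAI

client = OpenAI()

# Explicitly cap how much past conversation is carried into each run,
# instead of letting max_prompt_tokens cut the run off as "incomplete".
run = client.beta.threads.runs.create(
    thread_id="thread_REDACTED",    # placeholder
    assistant_id="asst_REDACTED",   # placeholder
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 6,         # only the 6 most recent messages are sent
    },
    max_completion_tokens=3000,
)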

OpenAI basically broke Assistants by making the tier-1 per-minute rate limit even lower than what an assistant with documents can use in a single run. Rather than rewarding this crude behavior, which does not even let you send 1/3 of the model context, by paying $50+ to raise your tier, I would suggest building on Chat Completions. That gives you full control over the amount sent to the model on every call.
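
With Chat Completions you decide exactly which messages go out on each call, for example by trimming old turns yourself before sending. A rough sketch; the trim_history helper and the 20000-token budget are made up for illustration:

from openai import OpenAI
import tiktoken

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")  # tokenizer used by gpt-4o

def trim_history(messages, budget=20000):
    # Drop the oldest non-system messages until a rough token count fits the budget.
    while len(messages) > 2 and sum(len(enc.encode(m["content"])) for m in messages) > budget:
        messages.pop(1)  # index 0 is the system message; keep it
    return messages

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    # ... earlier user/assistant turns go here ...
    {"role": "user", "content": "Latest question here."},
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=trim_history(history),
    max_tokens=3000,
)
print(response.choices[0].message.content)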

I must have read the documentation a billion times and never realized that was the case. Thought for sure it was used as a control parameter, not as a means to trigger an error.