Hi, I’ve got a File Search assistant connected to a 3 MB vector store containing ~94 PDFs. The assistant works fine in the playground, where a typical query uses ~16k input tokens and ~500 output tokens. Tier 1 says it has a limit of 30,000 tokens/minute. When I try to run a query against the assistant via the API, the run comes back with status `incomplete` and reason "max_tokens", no matter what I set the max_tokens parameter to when creating the run. Any tips would be greatly appreciated. I assume it’s something dumb and crucial I’m missing.
You aren’t allowed to directly set the actual max_tokens that each internal model call uses as the assistant runs iteratively. Instead, there is a supervisory token budget that aborts the run once it is exceeded.
Find out what it is now: when you invoke a run with the values you are using, inspect the run object you immediately get back. Its metadata includes the token parameters being used as the cutoff point for that run (which I moved up to the top of this example return):
```json
{
  "temperature": 1.0,
  "top_p": 1.0,
  "max_prompt_tokens": 1000,
  "max_completion_tokens": 1000,
  "truncation_strategy": {
    "type": "auto",
    "last_messages": null
  },
  "id": "run_abc123",
  "object": "thread.run",
  "created_at": 1698107661,
  "assistant_id": "asst_abc123",
  "thread_id": "thread_abc123",
  "status": "completed",
  "started_at": 1699073476,
  "expires_at": null,
  "cancelled_at": null,
  "failed_at": null,
  "completed_at": 1699073498,
  "last_error": null,
  "model": "gpt-4o",
  "instructions": null,
  "tools": [{"type": "file_search"}, {"type": "code_interpreter"}],
  "metadata": {},
  "incomplete_details": null,
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  },
  "response_format": "auto",
  "tool_choice": "auto",
  "parallel_tool_calls": true
}
```
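As a minimal sketch (treating a run object shaped like the example above as a plain dict), here is how you might check whether a run was cut off by its token budget rather than finishing normally. The helper function is my own, not part of the SDK:

```python
def explain_stop(run: dict) -> str:
    """Return a short note on why a run stopped, based on its status
    and incomplete_details fields."""
    if run["status"] != "incomplete":
        return "run finished normally"
    details = run.get("incomplete_details") or {}
    reason = details.get("reason", "unknown")
    if reason in ("max_prompt_tokens", "max_completion_tokens"):
        # The supervisory budget aborted the run mid-flight.
        return (f"aborted by the {reason} budget "
                f"(prompt={run['max_prompt_tokens']}, "
                f"completion={run['max_completion_tokens']})")
    return f"incomplete for another reason: {reason}"

# Example: a run cut off by its prompt-token budget.
aborted = {
    "status": "incomplete",
    "incomplete_details": {"reason": "max_prompt_tokens"},
    "max_prompt_tokens": 1000,
    "max_completion_tokens": 1000,
}
print(explain_stop(aborted))
```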
These are the ones you can crank way up to prevent an abort:

    "max_prompt_tokens": 1000,
    "max_completion_tokens": 1000,

Max prompt and max completion shut the assistant off if they are exceeded, which prevents the response from being returned. Set them high enough that only a runaway model would ever trigger them.
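A sketch of raising those budgets when creating a run. The parameter names match the Assistants API run-creation endpoint, but the helper, its defaults, and the placeholder IDs are my own:

```python
def run_params(assistant_id: str,
               max_prompt_tokens: int = 50_000,
               max_completion_tokens: int = 4_000) -> dict:
    """Build kwargs for client.beta.threads.runs.create with generous
    supervisory budgets so the run isn't aborted mid-flight."""
    return {
        "assistant_id": assistant_id,
        "max_prompt_tokens": max_prompt_tokens,
        "max_completion_tokens": max_completion_tokens,
        # "auto" lets the API drop old thread messages rather than
        # failing when the conversation outgrows the context window.
        "truncation_strategy": {"type": "auto"},
    }

params = run_params("asst_abc123")
# run = client.beta.threads.runs.create(thread_id=thread.id, **params)
```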
Tier 1 has quite low token limits. Just two internal calls before a response, each carrying that ~15k of information from the vector store, can overrun your per-minute limit, forcing you to prepay OpenAI more than you’d actually use just to get a functional assistant.
To counter that, besides lowering the message history carried along via truncation_strategy, you can use the newer file_search options to limit how much is injected each time the tool is used: a cap on the number of chunks returned, and a relevance score threshold for keeping non-related results out.
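For example, a file_search tool definition that caps the injected chunks and filters by relevance might look like this. The field names follow the Assistants v2 file_search tool spec, but the helper, its defaults, and the exact threshold are assumptions you should tune for your own corpus:

```python
def file_search_tool(max_results: int = 8,
                     score_threshold: float = 0.5) -> dict:
    """A file_search tool entry that limits how many chunks are
    injected per search and drops low-relevance matches."""
    return {
        "type": "file_search",
        "file_search": {
            "max_num_results": max_results,  # fewer chunks -> fewer prompt tokens
            "ranking_options": {
                "ranker": "auto",
                "score_threshold": score_threshold,  # 0.0-1.0; higher = stricter
            },
        },
    }

# Pass in tools=[...] when creating or updating the assistant:
# client.beta.assistants.update(assistant_id, tools=[file_search_tool()])
```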
Thanks for taking the time and for your detailed response. I will try again with the insight you’ve provided. I really appreciate it.
Gary
Gary Leydon
Director of Educational Technology
Center for Medical Education
203-737-6408
Great post. Would rate it FAQ worthy.