Will it be possible with the assistants API to specify a maximum input context size?
Since messages are automatically truncated to fit the maximum context size, once a thread grows long enough, every request sent to the GPT model will be at or just below the maximum context size.
I worry that if we can't specify something smaller than the maximum context size, each interaction with the assistant could end up being very expensive.
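For a rough illustration (assuming a 128K-token context window and gpt-4-turbo-era input pricing of $0.01 per 1K tokens, both of which may differ for your model): a fully packed prompt alone would cost about 128 × $0.01 ≈ $1.28 per request, before any output tokens.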
Hey all! Steve from the OpenAI dev team here. We’re working on designing usage controls for thread runs in the assistants API, and I want to provide a preview of the proposed change and get your feedback.
What we’re proposing is to add two new parameters to endpoints that create a run:
POST /v1/threads/{thread_id}/runs
POST /v1/threads/runs
To each of these, we would add an optional token_control field to the payload that looks like this:
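(A minimal sketch of the proposed payload, assuming the two parameters are per-run caps on prompt and completion tokens; the field names here are illustrative, not final.)

```json
{
  "assistant_id": "asst_abc123",
  "token_control": {
    "max_prompt_tokens": 2000,
    "max_completion_tokens": 500
  }
}
```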
The idea is to internally limit the number of tokens used on each step of the run and make a best effort to keep overall token usage within the limits specified.
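As a concrete usage sketch, here is how a run might be created with this field over plain HTTP from Python; everything about token_control, including its field names and values, is an assumption carried over from the proposal above rather than a shipped API.

```python
import requests

# Hypothetical sketch of the proposed usage controls -- the token_control
# field names are taken from the proposal above and may not match what ships.
API_KEY = "sk-..."        # your OpenAI API key
THREAD_ID = "thread_abc"  # an existing thread

resp = requests.post(
    f"https://api.openai.com/v1/threads/{THREAD_ID}/runs",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "OpenAI-Beta": "assistants=v1",  # beta header used by the assistants API
    },
    json={
        "assistant_id": "asst_abc123",
        # Proposed caps: limit tokens per step, best effort on overall usage.
        "token_control": {
            "max_prompt_tokens": 2000,
            "max_completion_tokens": 500,
        },
    },
)
resp.raise_for_status()
print(resp.json()["id"])  # the created run's ID
```

One apparent advantage of nesting both caps under a single token_control object is that the top level of the run-creation payload stays unchanged.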
Let us know what you think of this idea and whether it will work for your use cases!
Hi @stevecoffey, this sounds like a promising addition for better managing token usage in the assistants API.
Could you share the current implementation status of this feature? Knowing how far along development is, plus any anticipated timelines for testing or release, would help us plan integration efforts on our end.