Will it be possible with the assistants API to specify a maximum input context size?
Since messages are automatically truncated to fit the maximum context size, once a thread grows long enough, every request sent to the GPT model will be at or just below the maximum context size.
I worry that if we can't specify something smaller than the maximum context size, each interaction with the assistant could end up being very expensive.
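For a rough illustration (assuming a 128K-token context window and gpt-4-turbo-era input pricing of $0.01 per 1K tokens, both of which may differ for your model): a fully packed prompt alone would cost about 128 × $0.01 ≈ $1.28 per request, before any output tokens.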
Hey all! Steve from the OpenAI dev team here. We’re working on designing usage controls for thread runs in the assistants API, and I want to provide a preview of the proposed change and get your feedback.
What we’re proposing is to add two new parameters to endpoints that create a run:
POST /v1/threads/{thread_id}/runs
POST /v1/threads/runs
To each of these, we would add an optional token_control field to the payload that looks like this:
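(A minimal sketch of the proposed payload, assuming the two parameters are per-run caps on prompt and completion tokens; the field names here are illustrative, not final.)

```json
{
  "assistant_id": "asst_abc123",
  "token_control": {
    "max_prompt_tokens": 2000,
    "max_completion_tokens": 500
  }
}
```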
The idea is to internally limit the number of tokens used on each step of the run and make a best effort to keep overall token usage within the limits specified.
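As a concrete usage sketch, here is how a run might be created with this field over plain HTTP from Python; everything about token_control, including its field names and values, is an assumption carried over from the proposal above rather than a shipped API.

```python
import requests

# Hypothetical sketch of the proposed usage controls -- the token_control
# field names are taken from the proposal above and may not match what ships.
API_KEY = "sk-..."        # your OpenAI API key
THREAD_ID = "thread_abc"  # an existing thread

resp = requests.post(
    f"https://api.openai.com/v1/threads/{THREAD_ID}/runs",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "OpenAI-Beta": "assistants=v1",  # beta header used by the assistants API
    },
    json={
        "assistant_id": "asst_abc123",
        # Proposed caps: limit tokens per step, best effort on overall usage.
        "token_control": {
            "max_prompt_tokens": 2000,
            "max_completion_tokens": 500,
        },
    },
)
resp.raise_for_status()
print(resp.json()["id"])  # the created run's ID
```

One apparent advantage of nesting both caps under a single token_control object is that the top level of the run-creation payload stays unchanged.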
Let us know what you think of this idea and whether it will work for your use cases!
Hi @stevecoffey, this sounds like a promising addition for better managing token usage in the assistants API.
Could you share the current implementation status of this feature? Knowing how far along development is, plus any anticipated timelines for testing or release, would help us plan integration efforts on our end.