Feels a bit off to post a technical question given what’s just happened but here we are.
As of today, I keep getting an error when using the Assistants API saying that no more than 32768 characters are supported in a single message/request body in a thread. The specific error is shown below, and it occurs both in my app and in the playground.
1 validation error for Request body → content ensure this value has at most 32768 characters (type=value_error.any_str.max_length; limit_value=32768)
This seems rather odd, as GPT-4 Turbo has a much larger context window, and this would be a significant constraint on the use of the Assistants API.
Does anyone have any insights into this? Thanks!
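One possible stop-gap while this limit is in place: split an oversized message into chunks that each fit under the reported 32768-character limit, and add each chunk to the thread as its own message. A minimal sketch below; the chunking is plain Python, while the commented client calls assume the openai Python SDK's beta Assistants endpoints (names may differ by SDK version).

```python
# Hypothetical workaround: chunk a long message so each piece stays
# under the 32768-character limit reported by the validation error.
MAX_CHARS = 32768  # limit_value from the error message

def chunk_message(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces no longer than `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]

# Usage sketch (assumes the openai SDK's beta Assistants API):
# for part in chunk_message(long_document):
#     client.beta.threads.messages.create(
#         thread_id=thread.id, role="user", content=part
#     )
```

Whether the model treats several consecutive user messages the same as one long message is a separate question, so this is a sketch rather than a guaranteed fix.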
There’s an issue with the API itself where 32768 characters is enforced as a transmission limit. That would seem to make direct interactions to obtain the full “128k” impossible via chat completion. (Edit: the chat API works fine; this limit comes from “assistants”.)
One would think this is overcome by server-side resources such as uploaded files, or by calling upon threads (server-side chat history) with just a new user question. But finding an API limitation that is lifted only for those willing to lose control of their spending seems not just disingenuous but like malicious promotion.
Like rate limits, this could just be one of those “oh, we should have increased that spec too” things that will be lifted, though it may require a new API spec.
Wait, does that mean that the 120k tokens announced is… fake? What other way do we have to use the 120k token limitation with their API then?
It just feels like we spent a few hours building an entire use case around their API, only to discover that the specs are not the ones announced.
I think it is just a temporary constraint as part of the Assistants beta phase. 128k tokens works fine otherwise for GPT-4 Turbo.
Do the tokens from the retrieval step, which are added to the prompt, count toward the 128k token limit?
Yes, everything the AI must know in order to provide a final answer must fit within the model’s context length.
I wouldn’t call it “counting toward”; it’s simply part of the input to the AI model, along with instructions, function definitions, chat history, current messages, past results of functions like code interpreter, and the undocumented methods used internally to fill the AI context (as full as possible) with uploaded and attached files.
The character limit reported in this topic applies only to messages you place yourself, while the Assistants backend has lots of other internal ways to make sure the model context is filled to the brim.
It is more informative to work directly with the models, where you are the one making these decisions in your code about how much chat is necessary for “memory”, and how external knowledge will be added, or how features external to the model will be called with internal turns.
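A minimal sketch of what “making these decisions in your code” can look like when calling the models directly: keep the system prompt, then include as many recent messages as fit a token budget. The messages, budget, and the characters/4 token estimate here are illustrative assumptions; a real implementation would use an actual tokenizer such as tiktoken.

```python
# Sketch of client-side context management for a chat-style API:
# keep the system prompt, then add the newest history messages that
# still fit an estimated token budget.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(system: dict, history: list[dict], budget: int) -> list[dict]:
    """Return [system] + the newest messages whose estimated tokens fit `budget`."""
    kept: list[dict] = []
    used = estimate_tokens(system["content"])
    for msg in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))  # restore chronological order
```

The trimmed list is what you would then pass as `messages` to a chat completion call, alongside whatever retrieved knowledge you choose to inject yourself.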
Yeah, so it sounds like when using the Assistants API, the chosen model’s context window can be decomposed into:
- Token length of the input prompt
- Token length of retrieved context
- Token length of the output
And if so, perhaps the 32768-character limit on the input is a stop-gap limiting #1 so that there are enough tokens left for #2 and #3.
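A quick back-of-the-envelope version of that decomposition, assuming a 128k-token context window for GPT-4 Turbo and the common ~4 characters-per-token estimate (both the ratio and the split are illustrative assumptions, not documented behavior):

```python
# Rough arithmetic for the stop-gap theory: cap #1 (user input) at
# 32768 characters so that most of the context window remains for
# #2 (retrieved context) and #3 (output).
CONTEXT_WINDOW = 128_000   # GPT-4 Turbo context window, in tokens
MAX_INPUT_CHARS = 32_768   # per-message limit from the validation error
CHARS_PER_TOKEN = 4        # rough estimate for English text

max_input_tokens = MAX_INPUT_CHARS // CHARS_PER_TOKEN   # 8192 tokens for #1
remaining_tokens = CONTEXT_WINDOW - max_input_tokens    # 119808 left for #2 and #3
```

Under those assumptions, a 32768-character message caps the typed input at roughly 8k tokens, leaving nearly 120k for retrieval and output, which lines up with the decomposition above.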
So it’s not that the 120k token limit is fake : )