Feels a bit off to post a technical question given what’s just happened, but here we are.
As of today, I keep getting an error when using the Assistants API: no more than 32768 characters are supported in a single message/request body in a thread. The specific error is shown below, and it occurs both in my app and in the Playground.
1 validation error for Request
body → content
  ensure this value has at most 32768 characters (type=value_error.any_str.max_length; limit_value=32768)
This seems rather odd, as GPT-4 Turbo has a much larger context window, and such a limit would be a significant constraint on the use of the Assistants API.
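For reference, here is a minimal sketch that reproduces the error (assuming the openai Python SDK v1 beta thread endpoints; the payload is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

thread = client.beta.threads.create()

# Any single message body over 32768 characters is rejected
long_text = "x" * 40_000

client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=long_text,
)
# -> raises the validation error quoted above
```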
There’s a limit in the API itself where a single transmission is capped at 32768 characters. That would seem to make direct interactions that use the full “128k” impossible via chat completion. (Edit: the Chat Completions API works fine; this limit comes from “assistants”.)
One would think this is meant to be overcome by server-side resources, such as uploaded files, or by calling on threads (server-side chat history) with only a new user question. But imposing an API limitation and lifting it only for those willing to give up control of their spending seems not just disingenuous but like malicious promotion.
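For what it’s worth, the per-message cap can apparently be sidestepped by splitting long input across several thread messages, since the thread itself holds history server-side. A sketch, assuming the cap applies per message rather than per thread; `add_long_message` and the chunk size are my own illustrative choices:

```python
from openai import OpenAI

client = OpenAI()

def add_long_message(thread_id: str, text: str, chunk_size: int = 32_000) -> None:
    """Split text that exceeds the 32768-character cap across
    several thread messages, each safely under the limit."""
    for start in range(0, len(text), chunk_size):
        client.beta.threads.messages.create(
            thread_id=thread_id,
            role="user",
            content=text[start:start + chunk_size],
        )

thread = client.beta.threads.create()
add_long_message(thread.id, open("big_document.txt").read())
```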
Like rate limits, this could just be one of those “oh, we should have increased that spec too” oversights that will eventually be lifted, though lifting it would need a new API spec.
Yes, everything the AI must know in order to provide a final answer must be placed within the model’s context window.
I wouldn’t call it “counting toward” the context; it is simply part of the input to the AI model, along with instructions, function definitions, chat history, current messages, past results of tools such as the code interpreter, and the undocumented methods used internally to fill the AI context (as full as possible) with uploaded and attached files.
The character limit reported in this topic applies only to messages you place yourself; the Assistants backend has plenty of other internal ways to make sure the model context is filled to the brim.
It is more instructive to work directly with the models, where your own code makes these decisions: how much chat history is needed for “memory”, how external knowledge is added, and how features external to the model are called across internal turns.
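As a contrast, here is a minimal sketch of that direct approach with Chat Completions, where your own code decides how much history to keep (the model name and truncation policy are illustrative):

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str, max_turns: int = 20) -> str:
    """Append the user turn, trim old turns yourself, and call the model.
    You, not a backend, decide how much chat to keep as 'memory'."""
    history.append({"role": "user", "content": user_text})
    # Keep the system message plus only the most recent turns
    del history[1:max(1, len(history) - max_turns)]
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # illustrative model choice
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("How long can a single user message be here?"))
```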