I’m using the playground to test an assistant that’s hooked up to a vector store containing some JSON files. Whenever I send a message to the assistant, the input and output tokens almost always sum to 19,000 - 20,000. This is incredibly consistent.
Test 1: 19,738
Test 2: 19,129
Test 3: 19,357
Test 4: 19,572
And so on and so forth. Is there a 20k max token limit for input/output tokens?
My input tokens are usually 18,000+ and my output tokens are usually under 1,000.
I don’t see any mention of a total max token limit for these models. Is this a playground setting? The logs don’t show a max token count being set anywhere.
This happens for both gpt-4o and the experimental model gpt-4o-2024-08-06 (I was hoping to take advantage of the increased output tokens).
See below for a likely explanation of what you are experiencing.
Essentially, this comes down to how file search is configured: it operates under a token budget that defaults to 16k tokens for gpt-4o.
A given search will typically exhaust this budget. Add to that your system instructions, which are also consistent in size, plus the variable size of the user message, and you have an explanation for the relatively stable input token figures: roughly 16,000 tokens of retrieved file content leaves about 2,000-3,000 tokens for instructions and message, which lines up with your 18,000+ input tokens. A sketch of how you might shrink that budget follows below.
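The budget is the product of the number of chunks retrieved and the chunk size; with the defaults (20 results at roughly 800 tokens each) you land at about 16,000 tokens. If you don’t need that much retrieved context, you can lower `max_num_results` on the file_search tool. A minimal sketch using the OpenAI Python SDK; the vector store ID is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Sketch: cap file search at 5 chunks instead of the default 20.
# At the default chunk size of ~800 tokens, retrieved content is then
# capped at roughly 4,000 input tokens rather than ~16,000.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="Answer questions using the attached JSON files.",
    tools=[{
        "type": "file_search",
        "file_search": {"max_num_results": 5},
    }],
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_..."]},  # placeholder ID
    },
)
```

The trade-off is recall: fewer chunks means cheaper requests, but the model sees less of your files per query.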
Your output tokens depend heavily on the design of your instructions and user prompt. While 1k is well below the 4k max output tokens, it falls within the normal range; you’d have to re-engineer your instructions and prompt to obtain significantly longer outputs.
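To rule out a cap being applied somewhere, you can also set the output ceiling explicitly when creating the run and inspect the reported usage. A sketch assuming the Python SDK, with existing `thread` and `assistant` objects as placeholders:

```python
# Sketch: run with an explicit output-token ceiling and check usage.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,        # placeholder: your existing thread
    assistant_id=assistant.id,  # placeholder: your assistant
    max_completion_tokens=4096, # explicit ceiling; omit to leave uncapped
)
# usage reports prompt_tokens, completion_tokens, and total_tokens,
# so you can see exactly where the ~19k figure comes from.
print(run.usage)
```

If `completion_tokens` stays under 1k even with the ceiling raised, the limit isn’t a setting; it’s the prompt design.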