According to the docs, gpt-4o has a context window of 128,000 tokens. I passed `max_tokens=64000` as a parameter. Why am I getting the error `openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 64000. This model supports at most 4096 completion tokens, whereas you provided 64000.', 'type': None, 'param': 'max_tokens', 'code': None}}`? Where does the number 4096 come from? I didn't find other related posts to be very helpful.
I think you might be confused by what the max_tokens parameter actually does.
It isn't used to set the context window of the model; it sets a limit on how many tokens the model will output in a single response.
For gpt-4o that limit is 4096 completion tokens, which is where the number in your error comes from. I believe that's the case for just about every model out there right now, not just OpenAI's.
There is no parameter to adjust the context window of the model itself. That's fixed at 128k, and it's up to you to manage what goes into it on your own, unless you use Assistants.
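For reference, here's a minimal sketch of a call that stays within the completion-token limit, using the openai Python SDK (the prompt is just a placeholder):

```python
# Minimal sketch: max_tokens caps only the *output* of a single response.
# The 128k context window is the combined budget for input + output.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this report: ..."}],
    max_tokens=4096,  # at or below the model's completion-token limit
)
print(response.choices[0].message.content)
```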
To add some colour here: the idea is that a long-context model is good at processing ~100k tokens of input, but only generates up to about 4k output tokens at a time.
For example, a long-context model can read a short story (say, 50k tokens) and answer a question about it in under 4k tokens.
Although I'm sure this will change in the long run, right now a 100k-token output would probably devolve into chaos, and there are fewer use cases for it.
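If it helps, here's a rough sketch of how that budget works out in practice. It assumes the tiktoken package and the o200k_base encoding used by gpt-4o; the file name and numbers are just placeholders:

```python
# Rough sketch: check that a long prompt still leaves room for a
# full-length completion inside the 128k context window.
import tiktoken

CONTEXT_WINDOW = 128_000   # total tokens the model can see (input + output)
MAX_OUTPUT = 4_096         # completion-token cap per response

enc = tiktoken.get_encoding("o200k_base")

def fits_in_context(prompt: str) -> bool:
    input_tokens = len(enc.encode(prompt))
    return input_tokens + MAX_OUTPUT <= CONTEXT_WINDOW

story = open("short_story.txt").read()   # e.g. ~50k tokens
question = "Who is the narrator, and how do they change by the end?"
print(fits_in_context(story + "\n\n" + question))
```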
Do Assistants provide a longer context window than 128k tokens, or do they perform some sort of pre-processing of larger files in the backend and "reduce" uploaded files and instructions to <= 128k tokens? I'm confused because the official docs still say that gpt-4o has a 128k context window. Am I missing anything where they officially talk about longer context windows?
Hi, the context window of the underlying model is 128k. RAG pipelines and other methods, like Microsoft's AI search system, are used to choose which data is included in the prompt, up to that 128k maximum.
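To illustrate the idea (this is a toy sketch, not the actual Assistants or Azure AI Search implementation): the retrieval step ranks chunks of your files by relevance, then packs as many as fit into the prompt while leaving room for the output.

```python
# Toy illustration of prompt packing in a RAG pipeline. Chunk ranking
# happens elsewhere; here we only enforce the 128k token budget.
import tiktoken

CONTEXT_WINDOW = 128_000
RESERVED_FOR_OUTPUT = 4_096

enc = tiktoken.get_encoding("o200k_base")

def build_prompt(question: str, ranked_chunks: list[str]) -> str:
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT - len(enc.encode(question))
    selected = []
    for chunk in ranked_chunks:          # most relevant first
        cost = len(enc.encode(chunk))
        if cost > budget:
            break
        selected.append(chunk)
        budget -= cost
    return "\n\n".join(selected) + "\n\nQuestion: " + question
```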
I see, yes that makes sense. Thanks!