I cannot understand why I can only set max_tokens to 4095 when the documentation says that gpt-4o and many of the other models have much larger context windows?
Welcome to the dev forum.
There’s a difference between input and output tokens: the large context window applies to everything you send in, while max_tokens only caps what the model generates back. If you’re lucky, you can get 4095 out on applicable models.
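For example, a minimal sketch with the Python SDK (the model name and output cap are from the docs; the prompt itself is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# The 128k context window covers the prompt plus the completion,
# but max_tokens only limits the completion the model generates.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this very long document: ..."}],
    max_tokens=4095,  # output cap; asking for more than the model's output limit raises an error
)
print(response.choices[0].message.content)
```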
Hope this helps.
But shouldn’t I be able to have a context window of 128k tokens? If I do the exact same query inside ChatGPT it works, but not with the API. It stops because of max_tokens.
What model are you using?
Are you getting an error? Just not as much content?
What are you trying to accomplish?
If you do the exact same query in ChatGPT, you are getting a max_tokens of 1536 or 2048.
The fact that you’re satisfied with what ChatGPT returns shows you don’t need to set it that high.
We can guess the output limit was set lower on new models to reduce platform costs, to limit the damage if the AI goes bonkers and wants to write $4.00 of nonsense output, or simply because the response devolves at that length.
You can simply omit this parameter and get the maximum available after sending your input.
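Roughly like this, assuming the Python SDK (again, the prompt is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# No max_tokens here: the model can generate up to its own output limit,
# within whatever room remains in the context window after the input.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a detailed outline for ..."}],
)
print(response.choices[0].message.content)
```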