I thought the “context window” meant (input tokens + output tokens).
So I manually calculated the input tokens before each API request and truncated the input so that (input + output) wouldn’t exceed the model’s context window.
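For reference, here is a minimal sketch of the kind of manual truncation I was doing. It assumes a rough ~4 characters per token heuristic for English text (a real tokenizer such as tiktoken would give exact counts); the function names are just my own.

```python
def approx_tokens(text: str) -> int:
    # Rough estimate: ~4 characters per token for English text.
    # This is an approximation, not an exact tokenizer count.
    return max(1, len(text) // 4)

def truncate_to_budget(text: str, budget_tokens: int) -> str:
    # Cut the prompt so approx_tokens(result) stays within the budget.
    return text[: budget_tokens * 4]

prompt = "some long prompt " * 1000
budget = 100
trimmed = truncate_to_budget(prompt, budget)
assert approx_tokens(trimmed) <= budget
```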
But looking at the documentation, the maximum output tokens are specified separately for the recent models.
So I guess the “context window” is the maximum input tokens (128,000 tokens), and the maximum output tokens is 4,096 for the new models, as the documentation says.
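To make the two readings concrete, here is the budget arithmetic under each interpretation, using the 128,000 / 4,096 numbers from the documentation (the variable names are just my own):

```python
CONTEXT_WINDOW = 128_000  # from the model documentation
MAX_OUTPUT = 4_096        # separate max output tokens for new models

# Reading 1: context window covers (input + output),
# so the input budget shrinks by the reserved output tokens.
input_budget_shared = CONTEXT_WINDOW - MAX_OUTPUT   # 123,904

# Reading 2: context window bounds input only,
# so the full window is available for the prompt.
input_budget_separate = CONTEXT_WINDOW              # 128,000

print(input_budget_shared, input_budget_separate)
```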
If the “context window” doesn’t mean maximum (input tokens + output tokens) but only maximum input tokens, is there anywhere I can check the maximum output tokens for the other models?
I keep looking for it, but I only see “context window” in the documentation.