What does "CONTEXT WINDOW" mean in the documentation?

I thought the “CONTEXT WINDOW” meant (input tokens + output tokens).

So I manually calculated the input tokens before each API request and truncated the input so that the total (input + output) wouldn’t exceed the “CONTEXT WINDOW” for the model.
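The truncation step described above can be sketched roughly like this. Everything here is hypothetical: `count_tokens` is a crude stand-in (about 4 characters per token for English text), whereas a real implementation would use the model's actual tokenizer (e.g. the tiktoken library):

```python
# Hypothetical sketch of truncating input to fit a token budget.
# count_tokens is a rough heuristic stand-in for a real tokenizer.

def count_tokens(text: str) -> int:
    # Crude estimate: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def truncate_to_budget(text: str, max_input_tokens: int) -> str:
    # Drop characters from the end until the estimate fits the budget.
    while count_tokens(text) > max_input_tokens and text:
        text = text[:-4]
    return text

prompt = "some long prompt " * 1000
trimmed = truncate_to_budget(prompt, 100)
assert count_tokens(trimmed) <= 100
```

With a real tokenizer you would truncate by tokens, not characters, so you never cut a token in half.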

But in the documentation, the “output tokens” are specified separately for the recent models.

So I guess the “CONTEXT WINDOW” means the maximum input tokens (128,000 tokens), and the maximum output tokens is 4,096 for the new models, as the documentation says.

If the “context window” doesn’t mean maximum (input tokens + output tokens) and only means maximum input tokens, is there anywhere I can check the maximum output tokens for other models?

I keep looking for it, but I only see “context window” in the documentation.


The context window includes all input, output, and control tokens.


Oh, thanks.
Then the maximum (input + control) tokens for gpt-4-1106-preview would be
128,000 - 4,096 = 123,904
tokens, as I understand it.
Is this correct?

Give or take.

Each message also has some miscellaneous tokens associated with the start and end of the message and who the message is from and whatnot.

It’s on the order of 4 tokens per message.

Typically, you’re going to have a bad time when you start trying to edge right up to the line on the total context.

Worst case scenario, assume you have room for about 123k tokens for input and you should be fine.
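The budget arithmetic above can be sketched as follows. The figures come from this thread (128,000-token context window and 4,096 max output tokens for gpt-4-1106-preview, roughly 4 overhead tokens per message); the exact per-message overhead varies by model and message format:

```python
# Rough input-budget calculation using the numbers discussed above.
CONTEXT_WINDOW = 128_000      # gpt-4-1106-preview context window
MAX_OUTPUT_TOKENS = 4_096     # its documented max output tokens
PER_MESSAGE_OVERHEAD = 4      # rough estimate: start/end and role tokens

def input_budget(num_messages: int) -> int:
    # Tokens left for actual input content after reserving room
    # for the output and per-message framing tokens.
    return CONTEXT_WINDOW - MAX_OUTPUT_TOKENS - PER_MESSAGE_OVERHEAD * num_messages

print(input_budget(0))   # 123904, before any per-message overhead
print(input_budget(10))  # 123864, with ten messages' overhead
```

Staying comfortably below this (e.g. the ~123k suggested above) leaves slack for tokenizer-count discrepancies.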

You can see it here,
