Question about context window and truncation

While reading the official documentation on reasoning models, I came across the following diagram:

(Source: https://platform.openai.com/docs/guides/reasoning?api-mode=responses#how-reasoning-works)

As far as I know, the context window in an LLM works like this: when generating each next token, the model attends to the last N tokens, where N is the window size. In other words, the window slides continuously during generation, keeping only the most recent tokens within the limit. But if that were the case, the diagram should show truncated input at the beginning rather than truncated output at the end.
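
To make concrete what I mean by a sliding window, here is a toy Python sketch of the behavior I'm describing (the `dummy_model` function is just a placeholder I made up, not the real API):

```python
from itertools import count

_counter = count()

def dummy_model(visible_tokens):
    # Stand-in for a real model: ignores context, emits a fresh token.
    return f"gen{next(_counter)}"

CONTEXT_WINDOW = 8  # toy window size (N tokens)

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Under a sliding window, only the last N tokens are visible,
        # so the *earliest* tokens (the input) fall out of context
        # first -- hence my expectation of "truncated input at the
        # beginning".
        visible = tokens[-CONTEXT_WINDOW:]
        tokens.append(dummy_model(visible))
    return tokens

print(generate([f"in{i}" for i in range(6)], max_new_tokens=5))
```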

Is this actually an error in the diagram from the documentation, or am I misunderstanding something?
