Question about context window and truncation

While reading the official documentation on reasoning models, I came across the following diagram:

(Source: https://platform.openai.com/docs/guides/reasoning?api-mode=responses#how-reasoning-works)

As far as I know, the context window in an LLM works like this: when generating each next token, the model attends to the last N tokens, where N is the window size. In other words, the window slides continuously during generation, keeping only the most recent tokens within the limit. But if that were the case, the diagram should show truncated input at the beginning rather than truncated output at the end.
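
To make concrete what I mean by a sliding window, here is a toy Python sketch of the behavior I'm describing (the `dummy_model` function is just a placeholder I made up, not the real API):

```python
from itertools import count

_counter = count()

def dummy_model(visible_tokens):
    # Stand-in for a real model: ignores context, emits a fresh token.
    return f"gen{next(_counter)}"

CONTEXT_WINDOW = 8  # toy window size (N tokens)

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Under a sliding window, only the last N tokens are visible,
        # so the *earliest* tokens (the input) fall out of context
        # first -- hence my expectation of "truncated input at the
        # beginning".
        visible = tokens[-CONTEXT_WINDOW:]
        tokens.append(dummy_model(visible))
    return tokens

print(generate([f"in{i}" for i in range(6)], max_new_tokens=5))
```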

Is this actually an error in the diagram from the documentation, or am I misunderstanding something?
