How is the context window of GPT models increased from 1k to 4k to 32k?

Hi All,

I have a technical question: how is the context window of GPT models increased?
I want to know what differentiates gpt-3.5-turbo with a 4k context window from gpt-3.5-turbo with a 16k context window.
Is it a different model altogether? A different training methodology? Different positional embeddings? Or some other change to the model? (I've put a sketch below of what I mean by a positional-embedding change.)
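For reference, one technique I've come across is position interpolation for RoPE-style embeddings (Chen et al. 2023, "Extending Context Window of Large Language Models via Positional Interpolation", arXiv:2306.15595), where positions are rescaled back into the original training range so a longer sequence reuses the frequency range the model was trained on. This is a minimal sketch of my possibly wrong understanding; the function name and the 4k-to-16k numbers are purely illustrative:

```python
import torch

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Standard RoPE frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    # Position interpolation: rescale positions into the original
    # training range, e.g. scale = 4096 / 16384 = 0.25 for a 4x extension
    return torch.outer(positions.float() * scale, inv_freq)

# Hypothetical model trained with a 4k context, extended to 16k by interpolation
train_len, extended_len = 4096, 16384
pos = torch.arange(extended_len)
angles = rope_angles(pos, dim=128, scale=train_len / extended_len)
# The angles at position 16383 now match those of position ~4096 at training time
print(angles.shape)  # torch.Size([16384, 64])
```

Is this (plus some fine-tuning at the longer length) roughly what happens, or is it something else entirely?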

How is the model able to process more context? It would be great if someone could shed some light on this, ideally with an explanation and paper citations if available.

Thanks