Read GPT-2 source code.
I have.
The code documents itself on how the next token is sampled from the logits.
Temperature rescales the logits before the softmax: each logit is divided by the temperature (equivalently, multiplied by its reciprocal). Dividing by a temperature below 1 widens the gaps between logits and sharpens the distribution; a temperature above 1 flattens it.
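For illustration, here is a minimal numpy sketch of that scaling step. The real GPT-2 sample.py performs the same division in TensorFlow (`logits / tf.to_float(temperature)`); the function name `sample_next_token` is a hypothetical stand-in, not from the source.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token id from raw logits with temperature scaling.

    Mirrors the scaling logic in GPT-2's sample.py: logits are
    divided by the temperature before the softmax.
    """
    rng = rng or np.random.default_rng()
    scaled = logits / temperature      # T < 1 widens logit gaps, T > 1 flattens them
    scaled -= scaled.max()             # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```

With `temperature=0.5` the same logits produce a much peakier distribution than with `temperature=1.5`, which is the whole effect in one line of code.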
GPT-3 was an incremental advance that followed shortly after, the main change being roughly 100x the parameter count (1.5B to 175B). GPT-4's temperature behavior is harder to reason about, obscured by the mixture-of-experts architecture it is rumored to use.