Sampling randomly selects the next token from the set of candidate tokens, weighted by the model's confidence in each one.
Always selecting the top choice is called “greedy sampling”; it is more reliable, but the language it produces seems less human.
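To make the contrast concrete, here is a minimal sketch of greedy selection next to weighted random sampling. The token list and probabilities are invented for illustration, and the function names `greedy_pick` and `sample_pick` are hypothetical, not part of any particular API.

```python
import random

# Hypothetical next-token probabilities; tokens and numbers are made up.
next_token_probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

def greedy_pick(probs):
    """Always return the single most likely token."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so runs are repeatable

# Greedy gives the same token every time; sampling can vary between draws.
greedy_draws = [greedy_pick(next_token_probs) for _ in range(5)]
sampled_draws = [sample_pick(next_token_probs, rng) for _ in range(5)]

print("greedy: ", greedy_draws)
print("sampled:", sampled_draws)
```

Greedy selection repeats the same token for the same distribution, which is where its reliability comes from. Weighted sampling will sometimes pick a less likely token, which is the source of the variety.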
API parameters can change how tokens are sampled.
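The text does not name specific parameters. As an assumption, the sketch below uses two that many LLM APIs expose, usually called temperature and top-p, and shows roughly how each reshapes the distribution before a token is drawn. The logits and the helper names `apply_temperature` and `top_p_filter` are hypothetical; real services may implement these steps differently.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept tokens to sum to 1."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Hypothetical raw scores (logits) for four candidate tokens.
logits = {"cat": 2.0, "dog": 1.5, "fish": 0.5, "rock": -1.0}

cool = apply_temperature(logits, 0.5)   # lower temperature sharpens the distribution
warm = apply_temperature(logits, 1.5)   # higher temperature flattens it
nucleus = top_p_filter(apply_temperature(logits, 1.0), 0.8)

print("temperature 0.5:", cool)
print("temperature 1.5:", warm)
print("top-p 0.8:      ", nucleus)
```

In this sketch a low temperature pushes the result toward greedy behavior, while a high temperature gives unlikely tokens more of a chance. The top-p filter removes the unlikely tail entirely before sampling.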