I understand how each of top_k, top_p and temperature works individually, but I don’t know how they affect one another. Some sources suggest that top_k and top_p can work in conjunction (How to generate text: using different decoding methods for language generation with Transformers), but what does that mean? For example, if top_k = 50, top_p = 0.5 and the 50th token falls outside the first 0.5 of cumulative probability, is it possible for it to be sampled? If top_k = 50, top_p = 0.5 and the 51st token falls inside the first 0.5 of cumulative probability, is it possible for it to be sampled?
Furthermore, in regard to temperature: does the softmax operation happen before or after the top_k = 50 and/or top_p = 0.5 selection?
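For concreteness, here is a minimal NumPy sketch of how these settings are commonly combined (this mirrors the order used in the Transformers library: temperature scaling first, then the top_k filter, then the top_p filter on the already-filtered distribution; `sample_filtered` is a made-up helper name, not an actual library function). Under this ordering a token must survive *both* filters, so the answer to both questions above would be no.

```python
import numpy as np

def sample_filtered(logits, top_k=50, top_p=0.5, temperature=1.0, rng=None):
    """Sketch: temperature -> top_k -> top_p -> sample (Transformers-style order)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64) / temperature  # temperature first

    # top_k: mask out everything below the k-th highest logit
    if top_k > 0:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)

    # softmax over the surviving (non -inf) logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top_p (nucleus): keep the smallest prefix, in descending-probability
    # order, whose cumulative probability reaches top_p (at least one token)
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]

    # renormalize over the intersection of both filters and sample
    final = np.zeros_like(probs)
    final[keep] = probs[keep]
    final /= final.sum()
    return rng.choice(len(probs), p=final)
```

With top_k = 2 the 3rd-ranked token can never be drawn regardless of top_p, and with top_p = 0.5 a token inside the top k but outside the nucleus is likewise excluded.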
Top-k is not a setting exposed by the OpenAI API, so it is not under consideration there. The effective “k” is likely “all”, or very high, considering how crazy temperature = 2 can be.