How does `n` parameter work in chat completions?

You only “send” once, the only change to the API call is the value of the N number, and you are only charged for the extra tokens of the extra output cases.

While technical details aren’t made available, the most logical way is to load the input into the context of the model, and simply capture, wipe, and repeat the output generation that occurs at the generation start point in the context length area where the response is formed.

3 Likes