What is the difference between setting the parameter n and sending the same request n times? E.g., if I set n=5 and get 5 choices, how does this differ from sending the request 5 times?
Will max_tokens affect the length of the output, beyond cutting it off when it exceeds the limit? E.g., if I set max_tokens to a small number versus a large number, will the length of the response I get differ significantly?
I think if you use the n parameter, you only pay for the input tokens once. If you make 5 calls, you pay 5 times for both input and output. But I could be wrong; documentation on that is becoming spotty. The utility of the n parameter is pretty limited.
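To make the cost difference concrete, here is a rough sketch of the arithmetic. It assumes billing works the way I described above (input billed once per request, output billed per choice), which I have not verified. The function names, token counts, and prices are all made up for illustration.

```python
def cost_single_call_with_n(prompt_tokens, tokens_per_choice, n, input_price, output_price):
    """One request with n choices: input billed once, output billed per choice.

    ASSUMPTION: this reflects the billing model guessed at in the post above.
    """
    return prompt_tokens * input_price + n * tokens_per_choice * output_price


def cost_separate_calls(prompt_tokens, tokens_per_choice, n, input_price, output_price):
    """n independent requests: input and output are both billed n times."""
    return n * (prompt_tokens * input_price + tokens_per_choice * output_price)


# Hypothetical numbers: 1000 prompt tokens, 200 output tokens per choice,
# prices in arbitrary units per token (not real pricing).
with_n = cost_single_call_with_n(1000, 200, 5, input_price=3, output_price=6)
separate = cost_separate_calls(1000, 200, 5, input_price=3, output_price=6)
print(with_n, separate)  # 9000 21000
```

Under that assumption, the saving is just the prompt cost times (n - 1), so it only matters when the prompt is long relative to the output.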
Nope. max_tokens will just cut the output off. It has absolutely no bearing on the quality of the generation.
Thank you! I get the cost part of setting n. But will it impact the quality / similarity of the responses I get, assuming all the other parameters are the same?
Your first question has been answered and explored here:
This is helpful when you want to check whether the model replies according to your expectations. You can send the same request 10 to 10,000 times and evaluate the replies, assessing whether the deviation from the required reply quality is 1% or 5%.
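A minimal sketch of that kind of check might look like the following. The names deviation_rate, generate, and is_acceptable are hypothetical, and the model call is replaced by a fake generator so the example runs without any API access; in practice generate would wrap your real request.

```python
from itertools import cycle
from typing import Callable


def deviation_rate(
    generate: Callable[[], str],
    is_acceptable: Callable[[str], bool],
    trials: int,
) -> float:
    """Call generate() `trials` times and return the fraction of unacceptable replies."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = sum(1 for _ in range(trials) if not is_acceptable(generate()))
    return failures / trials


# Stand-in for a real model call (an assumption for this sketch):
# every 20th reply is "bad", the rest are "ok".
fake_replies = cycle(["ok"] * 19 + ["bad"])

rate = deviation_rate(
    generate=lambda: next(fake_replies),
    is_acceptable=lambda reply: reply == "ok",
    trials=100,
)
print(f"{rate:.0%}")  # 5%
```

The acceptance check is where the real work is; a simple string comparison like the one here is only a placeholder for whatever quality criterion you actually care about.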