Is there any difference between generating completions one chunk at a time vs. all at once?

I don’t fully understand transformer/generative models, so I’m wondering about the following: is there a difference between generating a few tokens at a time over several calls in a row (without changing the output in between) vs. generating one large chunk at once? In other words, does Codex only predict one token at a time and do so iteratively (in which case there should be no difference), or is there some advantage to requesting a larger number of tokens at once?


The model is autoregressive either way, so the generated text is the same; the difference is cost. With multiple calls you pay for `k * in_tokens + sum_out_tokens` tokens (the prompt is re-sent on each of the `k` calls), while with one call you only pay for `in_tokens + sum_out_tokens`.
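A minimal sketch of that token arithmetic, with made-up numbers (the function names and example values are illustrative, not an API). Note the multi-call formula is really a lower bound, since each follow-up call would also re-send the output generated so far as part of its prompt:

```python
def tokens_multiple_calls(in_tokens: int, out_chunks: list[int]) -> int:
    # Each of the k calls re-sends the original prompt, so the prompt
    # is billed k times. (Lower bound: prior output re-sent as context
    # on later calls is not counted here.)
    k = len(out_chunks)
    return k * in_tokens + sum(out_chunks)

def tokens_single_call(in_tokens: int, out_chunks: list[int]) -> int:
    # One call: the prompt is billed once, plus all output tokens.
    return in_tokens + sum(out_chunks)

# Example: a 1000-token prompt, generating 150 output tokens total.
print(tokens_multiple_calls(1000, [50, 50, 50]))  # 3150
print(tokens_single_call(1000, [50, 50, 50]))     # 1150
```

So splitting a 150-token completion into three 50-token calls here costs roughly three times as many prompt tokens for the same output.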
