I don’t fully understand transformer/generative models, so I’m wondering about the following: is there a difference between generating a few tokens at a time over several calls in a row (without modifying the output between calls) vs. generating one large chunk at once? In other words, does Codex only predict one token ahead at a time and do so iteratively (in which case there should be no difference), or is there some advantage to generating a larger number of tokens at once?
With multiple calls you process `k * in_tokens + sum_out_tokens` tokens (the prompt is re-sent on each of the `k` calls), while with one call you only process `in_tokens + sum_out_tokens`.
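A quick sketch of that accounting (the function names here are just illustrative, and this assumes each of the `k` calls re-sends only the original prompt, not the accumulated output):

```python
def multi_call_tokens(in_tokens: int, out_chunks: list[int]) -> int:
    """Total tokens processed when the output is generated in k
    separate calls: the prompt is counted once per call."""
    k = len(out_chunks)
    return k * in_tokens + sum(out_chunks)

def single_call_tokens(in_tokens: int, out_chunks: list[int]) -> int:
    """Total tokens processed when everything is generated in one
    call: the prompt is counted only once."""
    return in_tokens + sum(out_chunks)

# Example: a 1000-token prompt, 150 output tokens split into 3 chunks.
print(multi_call_tokens(1000, [50, 50, 50]))   # 3 * 1000 + 150 = 3150
print(single_call_tokens(1000, [50, 50, 50]))  # 1000 + 150 = 1150
```

In practice the multi-call case can be even more expensive than this, since each follow-up call usually also has to include the output generated so far as part of its input.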