Is there any difference between generating completions one chunk at a time vs. all at once?

I don’t fully understand transformer/generative models, so I’m wondering about the following: is there a difference between generating a few tokens at a time over several calls in a row (without changing the output in between) vs. generating one large chunk at once? In other words, does Codex only predict one token at a time and do so iteratively (in which case there should be no difference), or is there some advantage to requesting a larger number of tokens at once?


The model is autoregressive either way, so the generated text is the same; the difference is cost. With multiple calls you pay for `k * in_tokens + sum_out_tokens` tokens (the prompt is re-sent on each of the `k` calls), while with one call you only pay for `in_tokens + sum_out_tokens`.
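A minimal sketch of that token arithmetic, with made-up numbers (the function names and example values are illustrative, not an API). Note the multi-call formula is really a lower bound, since each follow-up call would also re-send the output generated so far as part of its prompt:

```python
def tokens_multiple_calls(in_tokens: int, out_chunks: list[int]) -> int:
    # Each of the k calls re-sends the original prompt, so the prompt
    # is billed k times. (Lower bound: prior output re-sent as context
    # on later calls is not counted here.)
    k = len(out_chunks)
    return k * in_tokens + sum(out_chunks)

def tokens_single_call(in_tokens: int, out_chunks: list[int]) -> int:
    # One call: the prompt is billed once, plus all output tokens.
    return in_tokens + sum(out_chunks)

# Example: a 1000-token prompt, generating 150 output tokens total.
print(tokens_multiple_calls(1000, [50, 50, 50]))  # 3150
print(tokens_single_call(1000, [50, 50, 50]))     # 1150
```

So splitting a 150-token completion into three 50-token calls here costs roughly three times as many prompt tokens for the same output.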
