N responses vs multiple generations

mark_humphries · December 9, 2023, 3:19am

Can anyone tell me how setting n produces two different responses to the same prompt? Is it the same generation process you would get by just running the query twice? I understand there are token savings, but does it also favour diverse responses? I ask because I want to compare responses to the same prompt for a validation task. I have also tried running multiple generations at different temperatures, but I like the input-token savings that n responses give. I am just worried that the n responses will lack diversity by comparison.

TonyAIChamp · December 9, 2023, 3:27am

That's an interesting point about token savings. But does it really save tokens?

To your question: as I understand it, it is the same as running the same prompt N times.

mark_humphries · December 9, 2023, 3:37am

It saves input tokens, because you are only charged for the prompt once. So if your prompt is 2k tokens, the savings can add up.

But I do wonder whether the seed or other internal parameters are the same for each of the n responses.
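A quick back-of-the-envelope sketch of the input-token saving, assuming (as described above) that one request with n>1 bills the prompt once, while n separate requests bill it n times. Completion tokens are still billed for each of the n responses in both cases, so they are left out here. The helper name is made up for illustration.

```python
def billed_input_tokens(prompt_tokens: int, n: int, separate_calls: bool) -> int:
    """Hypothetical helper: input tokens billed to get n responses to one prompt.

    Assumes a single request with n>1 bills the prompt once, while
    n separate requests each resend (and re-bill) the full prompt.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return prompt_tokens * n if separate_calls else prompt_tokens


prompt_tokens = 2000  # the "2k" prompt from the post above
n = 5

separate = billed_input_tokens(prompt_tokens, n, separate_calls=True)
single = billed_input_tokens(prompt_tokens, n, separate_calls=False)
print(f"{n} separate calls: {separate} input tokens")
print(f"1 call with n={n}: {single} input tokens")
print(f"saved: {separate - single} input tokens")
```

With a 2k-token prompt and n=5, that works out to 10,000 input tokens for separate calls versus 2,000 for a single call, a saving of 8,000.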

TonyAIChamp · December 9, 2023, 3:43am

Cool, never thought of that, thank you.

_j · December 9, 2023, 3:55am

The only thing that might make the output of N>1 differ from multiple separate calls is the unknown source of non-determinism in GPT-3.5+ models. For example, you might land on an instance that has more unchecked GPU calculation errors.

GPT-3 models like text-davinci-003 do not show this symptom; their only source of randomness is the intentional randomness of sampling.
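The thread does not pin down where that non-determinism comes from. One commonly cited contributor, offered here only as an illustration and not confirmed by anyone above, is that floating-point addition is not associative: if a GPU sums the same numbers in a different order (for example because of different batching or parallel reduction), the result can differ in the last bits. A minimal sketch:

```python
def sum_in_order(values):
    """Add floats strictly left to right, mimicking one particular reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1

print(forward, backward, forward == backward)  # 0.6000000000000001 0.6 False
```

The difference is tiny, so it only matters when the top candidate tokens are almost tied; but once a different token is chosen, the rest of the generation can go a different way.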

mark_humphries · December 9, 2023, 1:18pm

Cool, thanks. I thought so. But I assume that in that case each of the n responses would “suffer” from the same issue.

moonlockwood · December 9, 2023, 2:37pm

If I understand correctly: the raw output of an LLM is a probability distribution over tokens, and n responses let you see a larger sample drawn from these probabilities (instead of just one, as we are used to).
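A toy sketch of that idea, using made-up scores (logits) rather than a real model: at each step the model produces a probability distribution over possible next tokens, temperature reshapes that distribution, and asking for n responses means drawing n separate samples from it. The token list, scores, and function names are hypothetical. In a real LLM this draw is repeated for every token, not just the first one.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_tokens(tokens, logits, n, temperature=1.0, seed=None):
    """Draw n independent samples of the next token from the same distribution."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=n)


tokens = ["Sure", "Sorry", "I"]
logits = [2.0, 1.0, 0.5]  # made-up scores for illustration

print(softmax(logits))                        # probabilities at temperature 1.0
print(softmax(logits, temperature=0.1))       # nearly all mass on "Sure"
print(sample_next_tokens(tokens, logits, n=5, seed=0))
```

With a fixed seed the draws are fully reproducible, which matches the case where intentional sampling is the only source of randomness; raising the temperature flattens the distribution and makes the n samples more varied.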

_j · December 9, 2023, 3:24pm

They are separate response generations. Consider two trials that start this way:

Sure
Sorry

The path of inference will follow two completely different directions. Each token is generated based on all that came before.

(Screenshots omitted: actual logprobs, for the curious.)
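A toy sketch of that branching, assuming a hand-written next-token table instead of a real model. To keep it short, the table conditions only on the previous token, whereas a real LLM conditions on the entire prefix; the branching effect is the same. Once a completion starts with "Sorry", it can never reach continuations that only follow "Sure". The table and function names are hypothetical.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the previous token.
NEXT = {
    "<start>": {"Sure": 0.6, "Sorry": 0.4},
    "Sure": {"here": 0.7, "I": 0.3},
    "Sorry": {"I": 0.8, "but": 0.2},
    "here": {"it": 1.0},
    "it": {"is": 1.0},
    "I": {"can": 0.5, "cannot": 0.5},
    "but": {"no": 1.0},
    "is": {"<end>": 1.0},
    "can": {"<end>": 1.0},
    "cannot": {"<end>": 1.0},
    "no": {"<end>": 1.0},
}


def generate(rng, max_tokens=10):
    """Sample one completion token by token; each choice depends on what came before."""
    out = []
    prev = "<start>"
    for _ in range(max_tokens):
        dist = NEXT[prev]
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "<end>":
            break
        out.append(tok)
        prev = tok
    return out


def generate_n(n, seed=None):
    """n separate generations of the same prompt, each following its own path."""
    rng = random.Random(seed)
    return [" ".join(generate(rng)) for _ in range(n)]


for completion in generate_n(4, seed=1):
    print(completion)
```

Each completion is its own walk through the table, so two completions that differ at the first token cannot share any later tokens that only one branch can reach.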

gyanveda · February 28, 2024, 7:55pm

@mark_humphries I'm curious about the exact same thing. What did you end up doing, and why? If you ended up using n>1 with a single input prompt, did the responses give you the variability you were looking for?
