Likelihood of a set of completions

Is there a way to find the most likely completion from a set?

For example, I want to know: given a set of possible completions ({“eating pizza”, “going fishing”, “playing poker”}) and a prompt (“I like”), which is the most likely completion to follow the prompt.

As of now, I use the completions endpoint with echo=true and extract the logprobs of the final tokens corresponding to the possible completion (e.g. of “eating pizza”). I do this for each possible completion and select the one with the highest total logprob. Unfortunately, this method requires N API calls, where N is the number of possible completions, each of which takes |p| + |c| tokens from my balance, where |p| is the length in tokens of the prompt and |c| is the length in tokens of the completion. In my case, N and |p| are large, therefore this becomes infeasible at a larger scale.
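The scoring step described above can be sketched as follows. This is only an illustration of the bookkeeping, not a full client: the actual API call (with echo=true and logprobs=1) is left as a comment, and the helper names `completion_logprob` and `score_candidates` are hypothetical, not part of the OpenAI library.

```python
# Sketch of the N-call scoring approach: for each candidate completion,
# request logprobs for prompt + completion (echo=True, logprobs=1), then
# sum the logprobs of only the completion tokens and pick the max.
#
# The API request itself would look roughly like (legacy Completions API):
#   openai.Completion.create(model=..., prompt=prompt + " " + completion,
#                            max_tokens=0, echo=True, logprobs=1)

def completion_logprob(token_logprobs, n_prompt_tokens):
    """Sum the logprobs of the completion tokens only, i.e. every entry
    after the first n_prompt_tokens. The very first token's logprob is
    returned as None by the API, so None values are skipped."""
    return sum(lp for lp in token_logprobs[n_prompt_tokens:] if lp is not None)

def score_candidates(candidates):
    """candidates maps completion text -> (token_logprobs, n_prompt_tokens).
    Returns the candidate with the highest total completion logprob."""
    return max(candidates, key=lambda c: completion_logprob(*candidates[c]))
```

With invented logprob values for the running example, `score_candidates` would return whichever of "eating pizza", "going fishing", etc. has the largest summed logprob; the cost problem is that filling in `candidates` still takes one API call of |p| + |c| tokens per candidate.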

Is there any better way of doing it?

Not really, no.
Efficient search for the most likely sequence is an unsolved problem.
You can sample the most likely token at every time step, but this does not guarantee that the resulting sequence is the most likely sequence.


By “most likely” and “most proper” I mean from the perspective of the logprob of the total sequence.

That is, if the generated sequence has a large logprob, then the GPT-3 model “thinks” this sequence is more “likely” or “proper” to appear after the prompt.

Sampling the most likely token (i.e., the token with the largest logprob) at every time step may not necessarily yield the sequence with the largest logprob.
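A tiny toy example makes this concrete. The probabilities below are invented purely for illustration: greedy decoding picks the locally best token at each step, but an initially less likely token can lead to a higher total sequence probability.

```python
import math

# Toy two-step vocabulary with made-up probabilities.
step1 = {"A": 0.6, "B": 0.4}
step2 = {"A": {"C": 0.5, "D": 0.5}, "B": {"E": 0.9, "F": 0.1}}

# Greedy decoding: take the most likely token at each step.
t1 = max(step1, key=step1.get)          # picks "A"
t2 = max(step2[t1], key=step2[t1].get)  # picks "C"
greedy_logprob = math.log(step1[t1]) + math.log(step2[t1][t2])  # log(0.30)

# Exhaustive search: score every full two-token sequence.
best = max(
    ((a, b) for a in step1 for b in step2[a]),
    key=lambda s: math.log(step1[s[0]]) + math.log(step2[s[0]][s[1]]),
)
# best is ("B", "E") with probability 0.36 > 0.30: greedy missed the
# highest-logprob sequence because "B" looked worse at step 1.
```

So even with temperature 0, token-by-token decoding is a greedy heuristic, not a maximum-likelihood search over whole sequences.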

Yes, I forgot to mention that I also set the temperature to 0. @Silver and @silverriver bring up good points, but that is not what I meant.

The key difference in the scenario I describe is that I already have the set of possible completions and I want to find the most likely one among them, not let GPT-3 complete the prompt in whatever way it thinks best. I am currently able to do this, but not efficiently.


If we can obtain the logprobs of all the tokens (prompt tokens + generated tokens), that would solve your problem.

I am able to do that by setting echo=True and logprobs=1.

It is very inefficient, though, for the reasons described above.