Likelihood of a set of completions

Is there a way to find the most likely completion from a set?

For example, I want to know: given a set of possible completions ({“eating pizza”, “going fishing”, “playing poker”}) and a prompt (“I like”), which of those completions is most likely to follow the prompt?

As of now, I use the completions endpoint with echo=true and extract the logprobs of the final tokens, i.e. the ones corresponding to the possible completion (e.g. “eating pizza”). I do this for each possible completion and select the one with the highest total logprob. Unfortunately, this method requires N API calls, where N is the number of possible completions, and each call costs |p| + |c| tokens from my balance, where |p| is the length in tokens of the prompt and |c| is the length in tokens of the completion. In my case, N and |p| are large, so this becomes infeasible at scale.
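
For reference, here is a minimal sketch of what I do now, assuming the legacy openai Python library (pre-1.0) and the old Completions endpoint; the model name is just a placeholder:

```python
import openai

def completion_logprob(prompt: str, completion: str, model: str = "davinci") -> float:
    """Score prompt + completion with one API call and return the total
    logprob of the completion tokens only."""
    response = openai.Completion.create(
        model=model,
        prompt=prompt + completion,
        max_tokens=0,   # generate nothing; we only want the echoed scores
        echo=True,      # return the prompt tokens along with their logprobs
        logprobs=1,
    )
    lp = response["choices"][0]["logprobs"]
    # token_logprobs[i] is None for the very first token; text_offset[i] is
    # the character offset of token i, which lets us keep only the tokens
    # that belong to the completion rather than the prompt.
    return sum(
        logprob
        for logprob, offset in zip(lp["token_logprobs"], lp["text_offset"])
        if logprob is not None and offset >= len(prompt)
    )

prompt = "I like"
candidates = [" eating pizza", " going fishing", " playing poker"]
best = max(candidates, key=lambda c: completion_logprob(prompt, c))
print(best)
```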

Is there any better way of doing it?

Not really, no.
Efficient search for the most proper sequence is an unsolved problem.
You can sample the most likely token at every time step, but this does not guarantee that the resulting sequence is the most likely sequence.


By “most likely” and “most proper” I mean from the perspective of the logprob of the total sequence.

That is, if the generated sequence has a large logprob, the GPT-3 model “thinks” this sequence is more “likely” or “proper” to appear after the prompt.

Sampling the most likely token (i.e., the token with the largest logprob) at every time step does not necessarily yield the sequence with the largest logprob.
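
Here is a toy example with made-up probabilities (not from any real model) that shows the gap:

```python
import math

# Toy two-step "language model", just to show that greedy decoding can
# miss the highest-logprob sequence. All probabilities are invented.
first = {"A": 0.6, "B": 0.4}
second = {
    "A": {"x": 0.5, "y": 0.5},  # probability mass spread out after "A"
    "B": {"x": 0.9, "y": 0.1},  # probability mass concentrated after "B"
}

def seq_logprob(t1: str, t2: str) -> float:
    return math.log(first[t1]) + math.log(second[t1][t2])

# Greedy takes "A" first (0.6 > 0.4), then either continuation: P = 0.30.
# But the globally best sequence is "B x": P = 0.4 * 0.9 = 0.36.
print(seq_logprob("A", "x"))  # ≈ -1.20  (P = 0.30, greedy's result)
print(seq_logprob("B", "x"))  # ≈ -1.02  (P = 0.36, the true optimum)
```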

Yes, I forgot to mention that I also set the temperature to 0. @Silver and @silverriver bring up good points, but that is not what I meant.

The key difference in the scenario I describe is that I already have the set of possible completions and I want to find the most likely of those completions, not to let GPT-3 complete the prompt in whatever way it thinks is best. I am currently able to do this, but not efficiently.


If we can obtain the logprobs of all the tokens (prompt tokens + generated tokens), that will solve your problem.

I am able to do that by setting echo=True and logprobs=1.

It is very inefficient, though, for the reasons described above.