Duplicate results in completion choices

Hello!

I am trying to get multiple completions for a given prompt using n and best_of; however, the choices I get back are all duplicates.

response = openai.Completion.create(
    engine="davinci",
    prompt="A cup can be found in a",
    max_tokens=5,
    top_p=1,
    best_of=3,
    n=2,
    temperature=0.0,
    stop=[".", "\n"])

# response.choices printout:
[<OpenAIObject at 0x2007d4faae0> JSON: {
  "finish_reason": "length",
  "index": 0,
  "logprobs": null,
  "text": " kitchen in the first floor"
}, <OpenAIObject at 0x2007d4fab80> JSON: {
  "finish_reason": "length",
  "index": 1,
  "logprobs": null,
  "text": " kitchen in the first floor"
}]

If I set the temperature to 0.7, then I get different non-duplicate results, but I want the best results without randomness.

I would greatly appreciate any help.
Thanks!
-James


With a temperature of 0.0 you will always get the same response.


I see, so best_of only works with temperature > 0.0? I was able to get the top results (with temp 0.0) for a single-token response using logprobs; I was hoping to do something similar for multi-token responses using best_of.

I guess I could get the top results (using top_logprobs) for a single token first, and then make a new API call for each one. I was hoping there was a way to do this in a single API call using best_of, but I guess not.
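For reference, this is roughly what I have in mind (an untested sketch; the prompt and parameter values are just placeholders from my example above):

import openai

# Step 1: generate only the first token, asking for the top-5
# alternatives at that position via logprobs.
first = openai.Completion.create(
    engine="davinci",
    prompt="A cup can be found in a",
    max_tokens=1,
    temperature=0.0,
    logprobs=5)

# top_logprobs[0] maps each candidate first token to its log probability.
candidates = first.choices[0].logprobs.top_logprobs[0]

# Step 2: one follow-up call per candidate, greedily completing the rest
# (max_tokens=4 since the first of the five tokens is already chosen).
completions = []
for token in sorted(candidates, key=candidates.get, reverse=True):
    follow_up = openai.Completion.create(
        engine="davinci",
        prompt="A cup can be found in a" + token,
        max_tokens=4,
        temperature=0.0,
        stop=[".", "\n"])
    completions.append(token + follow_up.choices[0].text)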

Thanks for the quick response!

Another suggestion, which may be ‘Rube Goldberging’ the problem: keep a chat history of each session/response (similar to what the Playground provides). After you receive a new message, parse your chat history and check whether the message is similar to previous messages (let’s say the last 5 responses). For a simple, though not necessarily robust, solution, you could compare the number of times each word appears in each message. Example:

Previous message:

“This is the result of your message”

Current response:

“How dare you question my result result result”

Take the length of both messages and the number of duplicate words found.

In our case,

Length of Previous message = 7

Length of Current response = 8

Duplicate words found = 3 (result, result, result)

Use any threshold you desire to deem something a ‘bad’ message.

If you find 3 or more of the same words, delete the current response and query GPT-3 again with the previous message. If this happens more than once (i.e., you get the exact same message back), delete both the current response and the previous message, and then send GPT-3 your ‘previous previous message’. (You could also compare the total number of duplicate words to the total length of the response; again, use some threshold.)
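A minimal sketch of that check (the function name and threshold are just placeholders):

def duplicate_word_count(previous, current):
    # Count how many words of the current response already appeared
    # in the previous message (repeats in the response all count).
    previous_words = set(previous.lower().split())
    return sum(1 for word in current.lower().split() if word in previous_words)

previous = "This is the result of your message"
current = "How dare you question my result result result"

# Evaluates to 3: "result" appears three times and was in the previous message.
if duplicate_word_count(previous, current) >= 3:  # use any threshold you desire
    pass  # discard the current response and query GPT-3 again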

What you are mainly trying to avoid (in my opinion) is having duplicate data in close sequence. GPT-3 is trying to predict what comes next; if it sees the same message (or something close to it), it will start repeating itself. This gets worse with each message, because every response ‘encourages’ the ‘bad’ behavior.

If you want to solve this problem mathematically, use cosine similarity. Just Google for the code and you can pretty much copy and paste the function.
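For instance, a bare-bones version over word counts (a hypothetical helper, not from any particular library):

from collections import Counter
import math

def cosine_similarity(a, b):
    # Cosine similarity between word-count vectors: 1.0 means an
    # identical mix of words, 0.0 means no words in common.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0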

To support [daveshapautomator]’s response: if you have the temperature at 0, you are almost guaranteed to make GPT-3 deterministic. If the model has to take a ‘bad’ guess, it will usually repeat the tokens/words it has seen most recently.


Hmmm I see, thanks for the warning.


Thanks for the suggestion. I actually want deterministic responses (short ones, 3-4 words), but I want the top N results rather than just the single highest-probability result, if that makes sense.

Perhaps you want to make the model more deterministic, not temp-0 deterministic.

If you are still trying to pull this functionality out of the API, I’d start with a small sample prompt and a best_of setting you’re comfortable with, and then begin tweaking outputs by adjusting only the temperature, as in the sketch below. Once you find a range with the right ratio of truthful-ish responses, expand your prompt to your use case.
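Something along these lines (an untested sketch reusing the parameters from the original post; the temperature range is just a starting point):

import openai

# Same prompt each time, only the temperature varies, so you can
# eyeball where the duplicates stop and the nonsense starts.
for temp in (0.1, 0.2, 0.3, 0.5, 0.7):
    response = openai.Completion.create(
        engine="davinci",
        prompt="A cup can be found in a",
        max_tokens=5,
        best_of=3,
        n=2,
        temperature=temp,
        stop=[".", "\n"])
    print(temp, [choice.text for choice in response.choices])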

Hope this helps.


That is a good point. Thank you, I will try using a low temperature and experiment with different levels.

Awesome, thanks for the example! Very helpful.
