Duplicate results in completion choices

Hello!

I am trying to get multiple completions for a given prompt using n and best_of; however, the choices I get back are all duplicates.

response = openai.Completion.create(
    engine="davinci",
    prompt="A cup can be found in a",
    max_tokens=5,
    top_p=1,
    best_of=3,
    n=2,
    temperature=0.0,
    stop=[".", "\n"])

# response.choices printout:
[<OpenAIObject at 0x2007d4faae0> JSON: {
  "finish_reason": "length",
  "index": 0,
  "logprobs": null,
  "text": " kitchen in the first floor"
}, <OpenAIObject at 0x2007d4fab80> JSON: {
  "finish_reason": "length",
  "index": 1,
  "logprobs": null,
  "text": " kitchen in the first floor"
}]

If I set the temperature to 0.7, then I get different non-duplicate results, but I want the best results without randomness.

I would greatly appreciate any help.
Thanks!
-James


With a temperature of 0.0 you will always get the same response.


I see, so best_of only works with temperature > 0.0? I was able to get the top results (with temp 0.0) for a single-token response using logprobs; I was hoping to do something similar for multi-token responses using best_of.

I guess I could get the top results (using top_logprobs) for a single token first, and then make a new API call for each one. I was hoping there was a way to do this in a single API call using best_of, but I guess not.
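For reference, this is roughly what I have in mind (an untested sketch; the prompt and parameter values are just placeholders from my example above):

import openai

# Step 1: generate only the first token, asking for the top-5
# alternatives at that position via logprobs.
first = openai.Completion.create(
    engine="davinci",
    prompt="A cup can be found in a",
    max_tokens=1,
    temperature=0.0,
    logprobs=5)

# top_logprobs[0] maps each candidate first token to its log probability.
candidates = first.choices[0].logprobs.top_logprobs[0]

# Step 2: one follow-up call per candidate, greedily completing the rest
# (max_tokens=4 since the first of the five tokens is already chosen).
completions = []
for token in sorted(candidates, key=candidates.get, reverse=True):
    follow_up = openai.Completion.create(
        engine="davinci",
        prompt="A cup can be found in a" + token,
        max_tokens=4,
        temperature=0.0,
        stop=[".", "\n"])
    completions.append(token + follow_up.choices[0].text)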

Thanks for the quick response!

Another suggestion, which may be ‘Rube Goldberging’ the problem: keep a chat history of each session/response (similar to what the Playground provides). After you receive a new message, parse your chat history and check whether the message is similar to previous messages (let’s say the last 5 responses). For a simple, though not necessarily robust, solution, you could compare the number of times each word appears in each message. Example:

Previous message:

“This is the result of your message”

Current response:

“How dare you question my result result result”

Take the length of both messages and the number of duplicate words found.

In our case,

Length of Previous message = 7

Length of Current response = 8

Duplicate words found = 3 (result, result, result)

Use any threshold you desire to deem something a ‘bad’ message.

If you find 3 or more of the same words, delete the current response and query GPT-3 again with the previous message. If this happens more than once (i.e., you get the exact same message back), delete both the current response and the previous message, and then send GPT-3 your ‘previous previous message’. (You could also compare the total number of duplicate words to the total length of the response; again, use some threshold.)
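A minimal sketch of that check (the function name and threshold are just placeholders):

def duplicate_word_count(previous, current):
    # Count how many words of the current response already appeared
    # in the previous message (repeats in the response all count).
    previous_words = set(previous.lower().split())
    return sum(1 for word in current.lower().split() if word in previous_words)

previous = "This is the result of your message"
current = "How dare you question my result result result"

# Evaluates to 3: "result" appears three times and was in the previous message.
if duplicate_word_count(previous, current) >= 3:  # use any threshold you desire
    pass  # discard the current response and query GPT-3 again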

What you are mainly trying to avoid (in my opinion) is having duplicate data in close sequence. GPT-3 is trying to predict what comes next; if it sees the same message (or something close to it), it will start repeating itself. This gets worse with each message, because every response ‘encourages’ the ‘bad’ behavior.

If you want to solve this problem mathematically, use cosine similarity. Just Google for the code and you can pretty much copy and paste the function.
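For instance, a bare-bones version over word counts (a hypothetical helper, not from any particular library):

from collections import Counter
import math

def cosine_similarity(a, b):
    # Cosine similarity between word-count vectors: 1.0 means an
    # identical mix of words, 0.0 means no words in common.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0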

To support [daveshapautomator]’s response: if you have the temperature at 0, you are almost guaranteed to make GPT-3 deterministic. If the model has to take a ‘bad’ guess, it will usually repeat the tokens/words it has seen most recently.


Hmmm I see, thanks for the warning.


Thanks for the suggestion. I actually want deterministic responses (short ones, 3-4 words), but I want the top N results rather than just the single highest-probability result, if that makes sense.

Perhaps you want to make the model more deterministic, not temp-0 deterministic.

If you are still trying to pull this functionality out of the API, I’d start with a small sample prompt and a best_of setting you’re comfortable with, and then begin tweaking outputs by adjusting only the temperature, as in the sketch below. Once you find a range with the right ratio of truthful-ish responses, expand your prompt to your use case.
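Something along these lines (an untested sketch reusing the parameters from the original post; the temperature range is just a starting point):

import openai

# Same prompt each time, only the temperature varies, so you can
# eyeball where the duplicates stop and the nonsense starts.
for temp in (0.1, 0.2, 0.3, 0.5, 0.7):
    response = openai.Completion.create(
        engine="davinci",
        prompt="A cup can be found in a",
        max_tokens=5,
        best_of=3,
        n=2,
        temperature=temp,
        stop=[".", "\n"])
    print(temp, [choice.text for choice in response.choices])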

Hope this helps.


That is a good point. Thank you, I will try using a low temperature and experiment with different levels.

Awesome, thanks for the example! Very helpful.
