I’m trying to get a better understanding of the “best of” (BO) setting. I suspect it can be helpful in some cases, but I’m not sure that’s actually true, and I was hoping someone who understands it better could explain.
The part I can’t get my head around is the difference between using BO to pick the most probable of x sampled completions and just running the prompt with temperature set to zero. Isn’t “temp = 0” just another way of saying “return the response with the highest log probability”?
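To make the question concrete, here is a toy sketch of the two strategies as I understand them. It assumes temp = 0 means picking the single most likely token at every step, and BO means sampling several whole completions and keeping the one with the highest log probability. The two-token "model" and its probabilities are invented for illustration, and none of this claims to be how the API is implemented.

```python
import math
import random

# Hypothetical toy "model": probabilities for the first token, then for the
# second token given the first. All numbers are made up.
FIRST = {"A": 0.6, "B": 0.4}
SECOND = {
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
}


def sequence_logprob(first, second):
    """Log probability of the whole two-token completion."""
    return math.log(FIRST[first]) + math.log(SECOND[first][second])


def greedy():
    """Assumed temp = 0 behaviour: take the most likely token at each step."""
    first = max(FIRST, key=FIRST.get)
    second = max(SECOND[first], key=SECOND[first].get)
    return first, second


def sample(rng):
    """Draw one completion token by token (like temp = 1, top_p = 1)."""
    first = rng.choices(list(FIRST), weights=list(FIRST.values()))[0]
    options = SECOND[first]
    second = rng.choices(list(options), weights=list(options.values()))[0]
    return first, second


def best_of(n, rng):
    """Assumed BO behaviour: sample n completions, keep the most probable one."""
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=lambda seq: sequence_logprob(*seq))


if __name__ == "__main__":
    g = greedy()
    b = best_of(5, random.Random(0))
    print("greedy:", g, round(math.exp(sequence_logprob(*g)), 2))
    print("best of 5:", b, round(math.exp(sequence_logprob(*b)), 2))
```

In this toy, the greedy path commits to "A" because it is the likelier first token, and ends up with a completion of probability 0.30, while the completion "B z" has probability 0.36. So if these assumptions hold, temp = 0 returns the best token at each step, which is not always the most probable completion overall, and BO can sometimes find a more probable one.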
I did some experimenting with davinci using limericks, since they have a strict structure while being notoriously difficult to compose. Here are my results. The three numbers before each group of completions are temp, top_p, and BO. Some experiments were repeated more than once.
Prompt:
Limerick about limericks: The limerick packs laughs anatomical
Into space that is quite economical.
But the good ones I’ve seen
So seldom are clean
And the clean ones so seldom are comical.
Limerick about a couple: There was a young lady of Norway
Who hung by her toes in a doorway.
She said to her beau
Just look at me Joe,
I think I’ve discovered one more way.’
Limerick about food: My favorite food is baloney
Though some people think it is phony
Despite what they say
0 1 1
It’s still the best way
To make a sandwich, in my opinion!
0 1 2
It’s still the best way
To make a sandwich, in my opinion!
0 .5 1
It’s still the best way
To make a sandwich, don’t you agree?
.7 1 1
It’s still the best way
To start off each and every day!
It’s still good to eat
And I’ll continue to eat it with glee.
It’s still the best way
To make a sandwich, with or without mayo!
It tastes great any day
And I’ll eat it as long as I live
It tastes great any day
And I’ll eat it as long as I live
.7 1 5
It’s still yummy today
And I’ll still eat it with bolognaise.
It’s still yummy to me
And I’ll eat it every day if I want to!
.7 .5 3
It’s still the best way
To make a sandwich, don’t you agree?
It’s still the best way
To make a sandwich, don’t you agree?
It’s still the best way
To make a sandwich, in my opinion!
The 0 1 2 experiment was just a sanity check to make sure I wouldn’t see variation when temp was set to 0. After setting BO to a rather high (and expensive) value of 5, I thought it was interesting to see the word “yummy” show up twice, even though it didn’t show up in the .7 1 1 round. (I do like the word “yummy” in this context, and I could probably be convinced it’s a “better” word, although the completion lines weren’t particularly noteworthy.)
Interestingly, dialing top_p down to .5 with a BO of 3 (and temp at .7) produced two copies of the 0 .5 1 result and one copy of the 0 1 1 result. Can anyone help explain why this happens?
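One possible piece of the explanation, sketched below, is that top_p restricts sampling to the smallest set of tokens whose combined probability reaches the threshold. This is my understanding of nucleus sampling in general, not a confirmed description of the API. The token probabilities are invented, and `nucleus` is a hypothetical helper, not a library function.

```python
def nucleus(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors so they sum to 1."""
    kept = {}
    total = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept.items()}


# Made-up probabilities for the token after "To make a sandwich,".
next_token = {"in": 0.35, "don't": 0.30, "with": 0.20, "to": 0.15}

if __name__ == "__main__":
    print(sorted(nucleus(next_token, 1.0)))  # all four candidates remain
    print(sorted(nucleus(next_token, 0.5)))  # only the top two remain
```

If the real distribution looks anything like this, top_p = .5 would leave only two near-tied continuations at that point, so every sample lands on one of them no matter what BO is set to. That would fit seeing only “in my opinion!” and “don’t you agree?” in the .7 .5 3 runs, but I can’t confirm the actual probabilities.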
Non-technical opinions and observations as to whether a non-zero temp and BO > 1 can generate better results than temp = 0 and BO = 1 are welcome!