It occurred to me that using custom stops might degrade performance. Does GPT-3 have a default stop, like <|endoftext|>? I have been using <<END>>, but I'm wondering whether the former might perform better if GPT-3 recognizes it.
When you submit an empty prompt, GPT-3 always returns <|endoftext|> and then some other text after it (depending on the response length). So I guess it does know <|endoftext|>.
<|endoftext|> has a special meaning, so it will likely not work as you intend, especially if you include it within the prompt. Your end token is fine. Out of curiosity, when do you find that you need to introduce an explicit end token? Are you using the instruct series models (davinci-instruct-beta and curie-instruct-beta) or the base models? With the instruct models I find the completions tend to stop once the model has generated what was asked of it, so I recommend you check those out if you haven't.
I tend to need explicit stop tokens because I use few-shot examples for evaluations. Here’s one I’m working on right now:
CHAT LOG:
[Marky] Raven I'm in trouble
[Raven] Why? what's going on?
[Marky] I was arrested. I've got a court date. They said I broke into a store and stole a bunch of stuff.
[Raven] Well, did you?
[Marky] Maybe... what should I do?
MEDICAL ADVICE: no<<END>>
CHAT LOG:
[Johnson] hey raven i recently broke my pinky. it's in a cast. what can i do to exercise?
[Raven] I'm sorry to hear that you're hurting, Johnson. What happened?
[Johnson] I fell at the river. A log almost knocked me over.
[Johnson] what exercise can I do while in a cast?
MEDICAL ADVICE: yes<<END>>
For these I usually pass \n and <<END>> as stop sequences. The reason is that, with these examples, GPT-3 will often start confabulating follow-up conversations. I suppose I could turn temperature and top_p down and then just use the newline as the stop.
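If it helps to picture what the stop parameter buys you here: the API halts generation at the first occurrence of any stop sequence and trims it from the completion, which is exactly what cuts off those confabulated follow-up chat logs. A rough local sketch of that truncation behavior (my own illustration, not OpenAI's actual implementation):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence.

    Mimics (client-side) what the API's `stop` parameter does server-side:
    the stop sequence itself is excluded from the returned completion.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

With a raw continuation like "no<<END>>\nCHAT LOG:\n[Sam] hi raven...", passing ["\n", "<<END>>"] trims everything after the label, so the confabulated follow-up conversation never reaches you.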
Have you tried using davinci-instruct-beta for this example in a few-shot setting? I would just add one sentence at the top clearly explaining what the task is (maybe something like "Complete the dialog in the same style").
I would normally use \n\n### instead of <<END>>.
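For what it's worth, assembling a few-shot prompt with that separator might look like the sketch below. The helper name and structure are hypothetical; only the CHAT LOG / MEDICAL ADVICE layout comes from the example earlier in the thread:

```python
SEPARATOR = "\n\n###"  # doubles as the stop sequence you pass to the API

def build_prompt(examples, query):
    """Join labeled few-shot examples with the separator, then append the
    unlabeled query so the model's next tokens are the label itself."""
    parts = [
        f"CHAT LOG:\n{log}\nMEDICAL ADVICE: {label}"
        for log, label in examples
    ]
    parts.append(f"CHAT LOG:\n{query}\nMEDICAL ADVICE:")
    return SEPARATOR.join(parts)
```

Because every completed example ends at a \n\n### boundary, passing the same string as the stop sequence makes the model halt right after emitting its yes/no label.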
Yes, but I'm finding that the instruct series doesn't really comprehend more complex tasks or anything that requires remembering more than a few sentences. That seems strange, since GPT-3 can write such long output, but when I give instructions at the top followed by too much information, the performance falls apart. So I'm breaking things down to be simpler, and also to be portable across large language models: I want to keep my research generalized rather than highly dependent on OpenAI. Furthermore, the instruct series is in beta, so I don't want my research to break if something changes behind the scenes.
I can give you an example privately if you want.
Thanks - you can email me the example at boris@openai.com if you like!
Thanks for the tip about \n\n###. I've added that as a default stop alongside <<END>>, so that will prevent some of the run-on confabulation.