It occurred to me that using custom stops might degrade performance. Does GPT-3 have a default stop, like <|endoftext|>? I have been using <<END>>, but I'm wondering whether the former might perform better if GPT-3 recognizes it.
When you submit an empty prompt, GPT-3 always returns <|endoftext|> and then some other text after it (depending on the response length). So I guess it does know <|endoftext|>.
<|endoftext|> has a special meaning, so it will likely not work as you intend, especially if you include it within the prompt. Your end token is fine. Out of curiosity, when do you find that you need to introduce an explicit end token? Are you using the instruct series models (davinci-instruct-beta and curie-instruct-beta) or the base models? With the instruct models I find the completions tend to stop once the model has generated what was asked of it, so I recommend you check those out if you haven't.
I tend to need explicit stop tokens because I use few-shot examples for evaluations. Here’s one I’m working on right now:
CHAT LOG:
[Marky] Raven I'm in trouble
[Raven] Why? what's going on?
[Marky] I was arrested. I've got a court date. They said I broke into a store and stole a bunch of stuff.
[Raven] Well, did you?
[Marky] Maybe... what should I do?
MEDICAL ADVICE: no<<END>>
CHAT LOG:
[Johnson] hey raven i recently broke my pinky. it's in a cast. what can i do to exercise?
[Raven] I'm sorry to hear that you're hurting, Johnson. What happened?
[Johnson] I fell at the river. A log almost knocked me over.
[Johnson] what exercise can I do while in a cast?
MEDICAL ADVICE: yes<<END>>
For these I usually pass \n and <<END>> as stop sequences. The reason is that, with these examples, GPT-3 will often start confabulating follow-up conversations. I suppose I could turn temperature and top_p down and then just use the newline as the stop.
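If it helps to picture what the stop parameter buys you here: the API halts generation at the first occurrence of any stop sequence and trims it from the completion, which is exactly what cuts off those confabulated follow-up chat logs. A rough local sketch of that truncation behavior (my own illustration, not OpenAI's actual implementation):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence.

    Mimics (client-side) what the API's `stop` parameter does server-side:
    the stop sequence itself is excluded from the returned completion.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

With a raw continuation like "no<<END>>\nCHAT LOG:\n[Sam] hi raven...", passing ["\n", "<<END>>"] trims everything after the label, so the confabulated follow-up conversation never reaches you.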
Have you tried using davinci-instruct-beta for this example in a few-shot setting? I would just add one sentence at the top clearly explaining what the task is (maybe something like "Complete the dialog in the same style").
I would normally use \n\n### instead of <<END>>.
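For what it's worth, assembling a few-shot prompt with that separator might look like the sketch below. The helper name and structure are hypothetical; only the CHAT LOG / MEDICAL ADVICE layout comes from the example earlier in the thread:

```python
SEPARATOR = "\n\n###"  # doubles as the stop sequence you pass to the API

def build_prompt(examples, query):
    """Join labeled few-shot examples with the separator, then append the
    unlabeled query so the model's next tokens are the label itself."""
    parts = [
        f"CHAT LOG:\n{log}\nMEDICAL ADVICE: {label}"
        for log, label in examples
    ]
    parts.append(f"CHAT LOG:\n{query}\nMEDICAL ADVICE:")
    return SEPARATOR.join(parts)
```

Because every completed example ends at a \n\n### boundary, passing the same string as the stop sequence makes the model halt right after emitting its yes/no label.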
Yes, but I'm finding that the instruct series doesn't really comprehend more complex tasks or anything that requires remembering more than a few sentences. That seems strange, since GPT-3 can write such long output, but when I give instructions at the top followed by too much information, the performance falls apart. So I'm breaking things down to be simpler, and also to be portable across large language models: I want to keep my research generalized rather than highly dependent on OpenAI. Furthermore, the instruct series is in beta, so I don't want my research to break if something changes behind the scenes.
I can give you an example privately if you want.
Thanks - you can email me the example at boris@openai.com if you like!
Thanks for the tip about \n\n###. I've added that as a default stop alongside <<END>>, so that will prevent some of the run-on confabulation.