Best choice for separator?

I’ve been using ### as my separator. I know that there are separators that are not as good (a single # gives obviously worse results), but besides that, I don’t know how much effort has been spent on trying to fine-tune separator selection (or come up with ones that work just as well, but with fewer tokens).

Has anyone experimented with different separator sequences? If so, what were your results?
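For context, a separator marks the boundary between few-shot examples, and the same string is usually also passed as the stop sequence. Below is a minimal Python sketch, assuming a simple Input/Output layout. The helper name and the format are illustrative and were not described by anyone in this thread.

```python
# Hypothetical helper: the Input/Output layout and the separator value are
# placeholders chosen for illustration.
SEPARATOR = "\n###\n"


def build_few_shot_prompt(examples, query, separator=SEPARATOR):
    """Join (input, output) pairs into one prompt, with the separator between examples.

    The prompt ends with the new input and an open "Output:" slot, so the model
    is expected to write one answer and then (ideally) emit the separator.
    """
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return separator.join(blocks)


if __name__ == "__main__":
    prompt = build_few_shot_prompt([("2 + 2", "4"), ("3 + 5", "8")], "7 + 1")
    print(prompt)
```

Passing `"###"` as the stop sequence would then cut the completion before the model starts inventing another example.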


What kind of separator do you mean? Like a vertical separator, a list, or something else? I usually just use a double newline.

I’ve seen """ used more and more in the examples, but I’d also like to know whether that’s just by chance or whether someone has taken a closer look at its performance.


I generally use ###.

# and ### are both only 1 token, so you’re not saving any tokens by using a shorter (and possibly worse) separator. You can check this in OpenAI’s tokenizer tool.
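To compare candidates yourself, a sketch like the following counts each separator both on its own and inside a small prompt template. BPE tokenizers can merge characters differently depending on their neighbours, so the in-template number is the more realistic one. The function name and the template are assumptions for illustration. The tokenizer is passed in as a plain `encode` callable so the sketch runs without extra packages.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def separator_token_counts(
    separators: Iterable[str],
    encode: Callable[[str], List[int]],
    template: str = "Q: first example\n{sep}\nQ: second example",
) -> Dict[str, Tuple[int, int]]:
    """Map each separator to (tokens on its own, tokens for the whole template).

    `template` is a hypothetical stand-in for a real prompt; `{sep}` marks
    where the separator goes.
    """
    counts = {}
    for sep in separators:
        alone = len(encode(sep))
        in_context = len(encode(template.format(sep=sep)))
        counts[sep] = (alone, in_context)
    return counts


if __name__ == "__main__":
    # Stand-in encoder: one "token" per character, only so the sketch runs.
    # Swap in a real BPE encoder to get meaningful numbers.
    def char_encode(text: str) -> List[int]:
        return [ord(c) for c in text]

    print(separator_token_counts(["#", "###", '"""', "⁙⁙⁙"], char_encode))
```

With the `tiktoken` package installed, `tiktoken.get_encoding("r50k_base").encode` (one of the GPT-3-era encodings) could be passed as `encode` instead of the character-level stand-in.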

Instead, take a look at the instruct series, which needs fewer examples, or at fine-tuning, which removes the need to pass in examples at all.

With good prompt design you might save hundreds of tokens per call while also improving quality, whereas something minor like removing a separator saves very little and could even decrease quality.


From the beginning I’ve been using """ in the Playground.
But now that I’m preparing a fine-tuned model, I’ve decided on ###, because I’m a bit worried about how """ interacts with JSON and JSONL formatting (double quotes have to be escaped inside JSON strings). I don’t want any random bugs caused by the stop sequence 🙂
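To make the stop-sequence worry concrete, here is a sketch that writes fine-tuning pairs to JSONL and refuses any example whose raw text already contains the stop string. The prompt suffix, the leading space on completions, and the `to_jsonl` helper are assumptions, loosely modeled on the prompt/completion layout of OpenAI's legacy fine-tuning guide. They are not something the poster described.

```python
import json

# Assumed layout; adjust to whatever format your fine-tuning setup expects.
PROMPT_SUFFIX = "\n\n###\n\n"  # marks where the prompt ends
STOP = "###"                   # appended to completions, later used as the stop sequence


def to_jsonl(pairs, prompt_suffix=PROMPT_SUFFIX, stop=STOP):
    """Serialize (prompt, completion) pairs as JSONL, rejecting data that contains `stop`."""
    lines = []
    for prompt, completion in pairs:
        for text in (prompt, completion):
            if stop in text:
                raise ValueError(f"stop sequence {stop!r} appears inside the data: {text!r}")
        record = {"prompt": prompt + prompt_suffix, "completion": " " + completion + stop}
        # json.dumps handles quoting and escaping, so each line stays valid JSON.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_jsonl([("Summarize: The cat sat on the mat.", "A cat sat down.")]))
    # A """ separator would still be valid JSON, but only as escaped \" sequences,
    # which is easy to get wrong when building records by hand.
    print(json.dumps({"prompt": 'text"""'}))
```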


I actually use a single #, because ### and ## sometimes fail at high temperatures: the model ends up one # short. In a production environment that looks rather weird; I’d rather it give two outputs.
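Another option is to guard against a misfiring stop sequence on the client side by cutting the completion at the first line that consists only of one to three `#` characters. This is not the poster's approach. It is a hypothetical guard, and it assumes the separator always sits on its own line.

```python
import re

# Hypothetical client-side guard: a line made up only of 1-3 '#' characters
# (optionally padded with spaces/tabs) counts as a separator, so a "##" emitted
# in place of "###" still ends the first answer. Lines like "## Heading" don't match.
SEPARATOR_LINE = re.compile(r"^[ \t]*#{1,3}[ \t]*$", re.MULTILINE)


def first_output(completion: str) -> str:
    """Return the text before the first separator-like line, with whitespace trimmed."""
    match = SEPARATOR_LINE.search(completion)
    if match:
        completion = completion[: match.start()]
    return completion.strip()


if __name__ == "__main__":
    print(first_output("Answer one\n##\nAnswer two"))
```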

I avoid things like ''', $$$ and """ because they’re easier to misinterpret.

Something like ⁙⁙⁙ appears to work just fine and won’t conflict with any data.
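Whichever separator you settle on, it is cheap to confirm that it never occurs in your own data before using it as a stop sequence. The small sketch below does that check. The candidate list simply mirrors the separators mentioned in this thread, and `pick_separator` is a hypothetical helper. A separator that is absent from today's data can still show up in future inputs.

```python
from typing import Iterable, Optional, Sequence

# Candidates in order of preference; they mirror the separators discussed above.
CANDIDATES = ("###", '"""', "$$$", "⁙⁙⁙")


def pick_separator(texts: Iterable[str], candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that appears in none of `texts`, or None if all collide."""
    texts = list(texts)
    for sep in candidates:
        if not any(sep in t for t in texts):
            return sep
    return None


if __name__ == "__main__":
    data = ['print("""docstring""")', "### Heading from a markdown file"]
    print(pick_separator(data))
```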
