Does frequency penalty punish punctuation?

I have the impression that if you set Frequency Penalty high it will penalize punctuation so later parts of your text become less and less likely to be properly punctuated.

Is there a way to exclude punctuation tokens from being penalized?

Yes, frequency penalty tends to be very degenerative for longer texts. It punishes punctuation, common prepositions, and other tokens that naturally recur often.

There is no way to exclude them; however, as a patch, you can try setting logit_bias higher for those punctuation tokens.
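A minimal sketch of that patch, assuming the API accepts `logit_bias` as a mapping from token ID to a bias value between -100 and 100. The token IDs, the helper name `build_punctuation_bias`, and the bias of 2.0 are placeholders for illustration; look up the real IDs with the tokenizer for your model.

```python
# Placeholder token IDs (assumption): replace with IDs from your model's tokenizer.
PUNCTUATION_TOKEN_IDS = {".": 13, ",": 11}


def build_punctuation_bias(token_ids, bias=2.0):
    """Return a logit_bias mapping that nudges the given tokens upward."""
    if not -100 <= bias <= 100:
        raise ValueError("logit_bias values must lie in [-100, 100]")
    # Keys are token IDs written as strings, values are the bias to add.
    return {str(token_id): bias for token_id in token_ids.values()}


# Parameters you would pass alongside the prompt in a completion request.
request_params = {
    "frequency_penalty": 1.0,
    "logit_bias": build_punctuation_bias(PUNCTUATION_TOKEN_IDS),
}
print(request_params)
```

Since the bias is a fixed amount while the penalty keeps accumulating as a token repeats, this likely only delays the problem in long completions rather than removing it.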

I’ve noticed this too. A lot of my completions end in word salad run-ons. I like to keep frequency penalty high to keep things fresh. Sometimes it’ll start thesaurusizing (to coin a term) and spit out strings of terms, but then sometimes it comes out of these tailspins and starts producing more coherent phrases.

Boris, I’m wondering if you might shed some light on how Temperature and Frequency Penalty work together. It seems they’re sort of doing the same thing, but I don’t yet have a handle on the key differences between the two.

Temperature just increases the chance that a token other than the most likely one is picked. You could request one completion, then hand-pick a deviation at any point by selecting the second or third most likely token.
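To make that concrete, here is a small sketch of the usual way temperature is applied: the logits are divided by the temperature before the softmax. The logits below are made-up numbers, not output from any real model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [logit / temperature for logit in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Toy logits for three candidate tokens, most likely first (made-up values).
toy_logits = [3.0, 2.0, 1.0]

low = softmax_with_temperature(toy_logits, 0.5)
high = softmax_with_temperature(toy_logits, 1.5)

print("temperature 0.5:", [round(p, 3) for p in low])
print("temperature 1.5:", [round(p, 3) for p in high])
```

At the higher temperature the top token keeps a smaller share of the probability, so the second and third choices get sampled more often.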

Frequency penalty penalizes a token more heavily with every additional occurrence of that token in the text so far.
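A rough sketch of that behaviour, assuming the adjustment works the way OpenAI's documentation describes it: each token's logit is reduced by its count so far times the frequency penalty, plus a flat presence penalty once it has appeared at all. The vocabulary and logits are toy values, and the function names are made up for this example.

```python
import math
from collections import Counter


def penalized_logits(logits, generated_tokens, frequency_penalty, presence_penalty=0.0):
    """Subtract count * frequency_penalty (and a flat presence penalty) per token."""
    counts = Counter(generated_tokens)
    return {
        token: logit
        - counts[token] * frequency_penalty
        - (presence_penalty if counts[token] > 0 else 0.0)
        for token, logit in logits.items()
    }


def softmax(logits):
    """Convert a token -> logit mapping into a token -> probability mapping."""
    largest = max(logits.values())
    exps = {token: math.exp(value - largest) for token, value in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Toy logits (assumption): the period starts out as the most likely token.
toy_logits = {".": 2.0, "and": 1.0, "zebra": 0.0}

history = []
period_probs = []
for _ in range(4):
    probs = softmax(penalized_logits(toy_logits, history, frequency_penalty=1.0))
    period_probs.append(probs["."])
    history.append(".")  # pretend a period was emitted at every step

print([round(p, 3) for p in period_probs])
```

Each extra period in the history lowers the period's probability further, which matches the loss of punctuation people see late in long completions.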

These models might sometimes be a bit too comfortable with just repeating a phrase, so picking a few less likely tokens can help get the model outside its comfortable territory, after which following the most likely path is fine. Whereas if you keep picking the third choice, it will eventually start generating nonsense.

Utterly fascinating. Thanks, Boris, for that explanation.

Weirdly, this relates to a writing workshop exercise I used to conduct. I would ask my students to complete the sentence “A woman walks into a detective agency and says…” After they finished, I’d ask for a show of hands of how many had used the word “husband” or “missing” in their completions. Invariably multiple hands went up.

The point of the exercise was to point out tropes and get people to start picking their second and third options instead of reaching for the most obvious thing that came to mind. I thought of it as a creativity training that encouraged my students to not feel satisfied with tropes and develop more unique work.

I’m going to play with temperature and frequency penalty and see if I can ramp them up in the early parts of a story, then pull them back down in the later parts. Conclusions, to me, are all about closing loops and callbacks.

Super helpful information. Thanks a million.

Hmm, does this apply to stop words as well? That could be a problem. Our use case doesn’t generate long texts, so it’s not a problem for us. I guess this could easily be confirmed with some corpus linguistics on long outputs.

Well, I would guess that if your stop sequence is more than one token long, you run the risk of the model emitting only part of it. The tokens in that partial match have then been penalized, which makes the full stop sequence less likely, so you would most likely encounter longer completions than desired.

However, that only raises the chance of not producing the stop sequence afterwards, so depending on how well you trained the model and how it reacts to the input, you may not find this a problem.