Difference between frequency and presence penalties?

TimC · May 19, 2021, 7:22am

I’m still a little confused about the difference between Frequency Penalty and Presence Penalty.

Is this a scaling thing, where presence penalty is a flat reduction if the token has appeared at least once before, while frequency penalty is bigger if the token has appeared multiple times?

Also, is there an easy way to implement a consecutive token penalty, that scales on the number of identical tokens in a row?

For instance, if I previously had “We the people of the United States” somewhere in the document, and the most recent 5 words are “We the people of the”, it should penalize ‘United’, and if it does select ‘United’, then the next token selection will even more harshly penalize ‘States’, and so on.

That seems like it would be more directly aimed at the issue I’m seeing a lot of (repeated long sequences of words), but it’s possible I’m just misunderstanding how Frequency and Presence work.

Thanks!
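For anyone who runs their own sampling loop and can edit the logits before each token is chosen, the consecutive-match idea above could be sketched roughly like this. This is a hypothetical illustration, not something exposed by the API parameters discussed in this thread; the function names and the `scale` parameter are made up for the example, and tokens are approximated as whole words.

```python
def repeated_context_length(history, candidate):
    """How many tokens at the end of `history` also precede an earlier
    occurrence of `candidate`. 0 means choosing `candidate` would not
    continue any earlier sequence."""
    best = 0
    n = len(history)
    for i, token in enumerate(history):
        if token != candidate:
            continue
        # Walk backwards from position i and from the end of history in step.
        k = 0
        while k < i and history[i - 1 - k] == history[n - 1 - k]:
            k += 1
        best = max(best, k)
    return best


def apply_consecutive_penalty(logits, history, scale=0.5):
    """Subtract `scale` per matched context token (hypothetical scheme)."""
    return {
        token: logit - scale * repeated_context_length(history, token)
        for token, logit in logits.items()
    }


history = "We the people of the United States and We the people of the".split()
print(repeated_context_length(history, "United"))            # 5 matching words precede it
print(repeated_context_length(history + ["United"], "States"))  # 6, so a harsher penalty
print(apply_consecutive_penalty({"United": 3.0, "Kingdom": 1.0}, history))
```

A token that merely appeared before, without the same lead-up, gets no penalty here, so common words like "the" are left alone unless they would extend a repeated run.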

joey · May 19, 2021, 8:16am

Yes, that’s exactly correct!

That level of granularity (a penalty that scales with consecutive repeats) isn’t really possible, but I’d try iteratively increasing the penalty value (e.g. +0.1 at a time) to see how it impacts repetition.
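One way to make that trial-and-error less subjective is to score each output for repetition while stepping the penalty. The sketch below assumes a `generate` callable that you supply yourself (a placeholder that takes a penalty value and returns the generated text); it is not a real API function, and the repeated n-gram fraction is just one simple metric among many.

```python
def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams in `text` that duplicate an earlier n-gram."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def sweep_penalty(generate, start=0.0, stop=1.0, step=0.1):
    """Call `generate(penalty)` for each penalty value and score the output.

    `generate` is a placeholder for whatever wraps your completion request.
    """
    results = []
    steps = round((stop - start) / step)
    for i in range(steps + 1):
        penalty = round(start + i * step, 10)  # avoid float drift like 0.30000000000000004
        results.append((penalty, repeated_ngram_fraction(generate(penalty))))
    return results


print(repeated_ngram_fraction("a b c a b c"))  # 0.25: one of four trigrams is a repeat
```

Because sampling is random, it probably makes sense to generate several completions per penalty value and average the scores before comparing them.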

ericks.network · May 2, 2023, 2:58pm

Frequency_penalty and presence_penalty are two parameters that can be used when generating text with language models, such as GPT-3.

Frequency_penalty: This parameter discourages the model from repeating the same tokens too often within the generated text. It is subtracted from a token's logit (its unnormalized log-probability) once for every time that token has already occurred, so the reduction grows with each repeat. A higher frequency_penalty value makes the model more conservative about reusing tokens.
Presence_penalty: This parameter encourages the model to include a more diverse range of tokens in the generated text. It is subtracted from a token's logit once, as a flat amount, if the token has already appeared at least once, no matter how many times. A higher presence_penalty value makes the model more likely to generate tokens that have not yet appeared in the text.

Both of these parameters can be adjusted to influence the overall quality and diversity of the generated text. The optimal values for these parameters may vary depending on the specific use case and desired output.
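To make the difference concrete, here is a minimal sketch of how the two penalties adjust logits, following the formula given in the OpenAI API documentation: logit[j] - count[j] * frequency_penalty - (1 if count[j] > 0 else 0) * presence_penalty. The function name and the dict-based representation are assumptions for illustration only; the real adjustment happens server-side on token IDs.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Return a copy of `logits` (token -> logit) with both penalties applied."""
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty  # grows with every repeat
            adjusted[token] -= presence_penalty           # flat, applied once
    return adjusted


logits = {"the": 2.0, "cat": 1.0, "dog": 1.0}
history = ["the", "cat", "the", "the"]
print(apply_penalties(logits, history,
                      frequency_penalty=0.5, presence_penalty=0.25))
# "the" appeared 3 times: 2.0 - 3*0.5 - 0.25 = 0.25
# "cat" appeared once:    1.0 - 1*0.5 - 0.25 = 0.25
# "dog" never appeared:   1.0 (unchanged)
```

So the presence penalty treats "the" and "cat" identically, while the frequency penalty hits "the" three times as hard.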

sd2383 · September 25, 2023, 10:17pm

Where is this from? Would be helpful if the documentation included more details like this but I don’t see it anywhere.

curt.kennedy · September 25, 2023, 10:29pm

It’s in the docs, for example:

