Really enjoying the API - and thanks for all the tips in the forum!
I’m having an issue where punctuation marks get an extra space inserted before them once the output runs longer than a few hundred words/tokens.
As you can see, the response starts out fine but gradually begins to add extra spaces before each punctuation mark.
The screenshot is from the Playground, but I’m seeing the same behavior through both the Playground and the API.
Model is text-davinci-003.
Has anyone else seen this? Is this a known issue or a new bug?
I see the same problem, and it gets gradually worse if you push for longer outputs. I think it might be deliberate distortion as part of watermarking. (They don’t have watermarking yet, not officially, but that doesn’t prevent them from testing.)
What I did was simply run a search/replace for the common patterns.
Even if these issues are indeed a bug, a search/replace pass might still be a good idea to combat future watermarking. Google includes such techniques in Google Translate.
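For anyone wanting to do the same, here’s a minimal sketch of that cleanup pass (my own quick approach, not anything official): a single regex that strips whitespace appearing directly before common punctuation marks.

```python
import re

def strip_space_before_punct(text: str) -> str:
    """Remove stray whitespace that appears before punctuation marks."""
    # Delete one or more whitespace characters when they directly
    # precede , . ; : ! or ? and keep only the punctuation itself.
    return re.sub(r"\s+([,.;:!?])", r"\1", text)

fixed = strip_space_before_punct("Hello , world . How are you ?")
print(fixed)  # Hello, world. How are you?
```

Note this will also collapse spacing in things like numbered lists or code snippets inside the output, so it’s safest to run it only on plain prose.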
Sounds like your frequency_penalty is high. What’s it set at?
Wow… that’s a GREAT insight. I understand that it might not apply to punctuation, but it certainly might be a possibility.
According to at least one source, white spaces are not included in the frequency penalty, because the penalty is applied only to tokens, not to white space.
Maybe it’s a bug in the OpenAI code?
Do you have a reference, @PaulBellow, which states that the frequency penalty affects white space?
When I checked, this is what I found, but I have not confirmed with another source:
The frequency penalty is applied to the model’s predictions during training to discourage it from generating generic or repetitive responses, but it doesn’t affect the model’s ability to process or generate white spaces.
Without a lot of sources, I asked silly old ChatGPT which we all know has difficulty with technical facts so take it with a tablespoon of salt, haha, but here goes:
does the openai frequency penalty effect white space?
No, the OpenAI frequency penalty does not affect white spaces. The frequency penalty is a feature that penalizes the likelihood of generating highly frequent tokens in the training data to discourage the model from producing repetitive or generic outputs. The frequency penalty is applied to individual tokens, such as words or subwords, and not to white spaces, which are used to separate tokens and are not considered as tokens themselves.
That is why I asked, @PaulBellow, whether you have a reference for this idea.
Just personal experience over the last 3 or 4 years… When freq_penalty is too high, it seems like the model tries to compensate by being creative with words/spelling/spaces… since it can’t repeat something it thinks it should use, it sometimes misspells or changes it slightly so it’s not technically the same…
Again, just personal experience. I don’t have notes/code/examples/screenshots, unfortunately. However, I have seen numerous cases where adjusting the frequency_penalty helps… which is why I offered it here…
Wondering now if my sarcasm sensor didn’t go off correctly on this one?
Here’s hoping the person comes back to let us know if they figure it out or not…
ETA: Could also be a stray space or two in the prompt, where it shouldn’t be, that the model is copying…
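If anyone wants to test the frequency_penalty theory, a quick way is to re-run the same prompt while varying only that one parameter and diffing the outputs. Here’s a sketch of the request parameters for the legacy Completions endpoint (the prompt and values are just illustrative, not from the original poster):

```python
# Sketch of legacy Completions request parameters (text-davinci-003 era).
# The idea: keep everything fixed and sweep frequency_penalty toward 0,
# then compare whether the extra spaces before punctuation go away.
params = {
    "model": "text-davinci-003",
    "prompt": "Write three sentences about the weather.",  # illustrative
    "max_tokens": 256,
    "temperature": 0.7,
    "frequency_penalty": 0.0,  # try 0.0, 0.5, 1.0 and diff the results
    "presence_penalty": 0.0,
}

# With the legacy openai Python package this would be sent as:
#   openai.Completion.create(**params)
print(params["frequency_penalty"])  # 0.0
```

Valid values for frequency_penalty range from -2.0 to 2.0; values near the top of that range are where I’d expect the “creative spelling/spacing” compensation described above to show up.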
I truly meant that your comment was helpful. I was absolutely not being sarcastic. =) Punctuation being part of the frequency counts was not something I had considered until that point.
Having tried various settings, I would say that frequency is in fact a factor, and a bigger factor the narrower the topic is. The extra spaces also seem to be often accompanied by excessively long sentences.
Good to hear you got it sorted.