Gpt-3.5-turbo-0613 sometimes adds a period when explicitly told not to

Here’s my prompt: “Read the statements below and respond with a phrase for the overarching category – and only the overarching category – that encapsulates the main theme discussed. Ensure the category is general and neutral. For example, if the statements are about exercise routines and mental well-being, respond with ‘Health and Wellness’. Do NOT terminate the response with end-of-sentence punctuation such as a period or question mark.
[INSERT STATEMENTS HERE]”

The “insert” is the data for the prompt. Typically, it will be anywhere from 2 to 100 statements that I am looking to summarize according to the instructions. While it works overall, the last sentence of the instructions is not always followed. I’ve tried wording it many different ways. I know I could simply check for a period at the end and remove it if it is there – and that is what I do. But if there is something I can learn from this that would help my prompt writing in general, that would be great. Any assistance would be appreciated.

Do you by chance have a sample output we could see?

It might be better at producing phrases or strings without punctuation as a list, but I’d need to see its response to your prompt to better understand how I could help. Is this for an app or some other project with the API? I’ll be honest, my mind naturally gravitates towards post-processing, where I’d just snip off the punctuation with a line of code, but that’s just me.

From what I see here, I would recommend trying to coax it into producing the phrases as a list, specifying that the phrases within the list should have no punctuation.


A few notes:

  1. Try the gpt-3.5-turbo-instruct model, you may have better luck.
  2. Try including a one-shot example or few-shot examples in your prompt (see the sketch after this list).
  3. Always remember that language models have historically struggled with “negative instructions,” e.g. never do X, don't use Y, etc. It’s like telling someone to not think of an elephant, you’re putting something you don’t want into context and (often) paradoxically increasing the likelihood of the model including it in the generation.
  4. Pick your battles. You can waste hours of your life fighting with the model with no guarantee you’ll actually be able to beat it into submission. Maybe with a meticulously crafted system message you reduce the occurrence rate from one-in-ten to one-in-one-thousand, but you can’t ever guarantee it won’t include a trailing punctuation, and if the punctuation breaks things downstream you need to be 100% confident it’s not there.
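
For note 2, here’s a minimal sketch of what a few-shot prompt might look like. It assumes the pre-1.0 openai Python SDK (current for the 0613 models), and the example statements and categories are made up:

import openai

messages = [
    {"role": "system", "content": (
        "Respond with a short, general, neutral category phrase for the "
        "statements. Do not end the response with punctuation.")},
    # Each assistant turn demonstrates the desired output format,
    # including the absence of a trailing period.
    {"role": "user", "content": "I love my morning runs.\nMeditation keeps me calm."},
    {"role": "assistant", "content": "Health and Wellness"},
    {"role": "user", "content": "The soup was cold.\nPortions are too small."},
    {"role": "assistant", "content": "Quality of Cafeteria Offering"},
    {"role": "user", "content": "[INSERT STATEMENTS HERE]"},
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    temperature=0,
)
print(response["choices"][0]["message"]["content"])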

So, with all that in mind, I’m going to strongly suggest you not bog yourself down in something that is trivially easy to fix with a single line of code[1].

Assuming the response in question is a string variable named x,

import regex as re

# Strip any trailing run of punctuation (\p{P} matches any Unicode punctuation character)
x = re.sub(r'[\p{P}]+$', '', x)
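
Continuing from the snippet above, a quick check with a hypothetical model output:

print(re.sub(r'[\p{P}]+$', '', 'Quality of Cafeteria Offering.'))
# -> Quality of Cafeteria Offering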

  1. Plus a package import if you aren’t already using the regex package. ↩︎


Thanks for your reply. As I indicated in my initial post, I did in fact implement a workaround and was asking primarily for the purpose of learning something. I didn’t know about the negative instruction challenge, so that is good to know. Thanks.


Thanks for your response, Macha. The result that is returned is a single phrase, not a list of phrases. If I submit several statements that talk about the quality of food in the cafeteria, the response might be “Quality of Cafeteria Offering”. Or it might be “Quality of Cafeteria Offering.” It’s a little thing, but I was hoping to gain some understanding of it that might contribute to my skills overall.


Gotcha.

Elmstedt’s notes are really good here. Another helpful takeaway might be that all AI responses are going to be a bit “fuzzy”: they’ll be a little different, and occasionally imperfect, each time. These models are great at working with raw text and helping with tasks like yours, but controlling something like the final punctuation mark with 100% accuracy and granularity every time is harder to pull off. The model can follow some basic formatting rules and procedures, but in general, if you wish to keep building your understanding and skillset, it’s best to think of the LLM as one part of a process. It’ll handle the natural-language aspects of your projects really well, while things like formatting and punctuation may be better suited to a different step (as we expressed with the regex snippet).

The key is working with its flexibility and fuzziness. Figuring out how to refine its responses through prompting, using its strengths with natural language while regular code handles the stricter parts of the program’s logic, is a really good way to understand what these models are good at and how to use them.
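
As a concrete sketch of that division of labor (call_model here is a hypothetical stand-in for your actual API call):

def call_model(statements: str) -> str:
    # Stand-in for the real API call; imagine the model tacked on a period.
    return "Quality of Cafeteria Offering."

def categorize(statements: str) -> str:
    raw = call_model(statements)  # the LLM handles the natural-language part
    return raw.rstrip(".?!")      # ordinary code enforces the strict formatting rule

print(categorize("The soup was cold. Portions are too small."))
# -> Quality of Cafeteria Offering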

I removed everything after ‘Health and Wellness’ and did not get any periods…

“Read the statements below and respond with a phrase for the overarching category – and only the overarching category – that encapsulates the main theme discussed. Ensure the category is general and neutral. For example, if the statements are about exercise routines and mental well-being, respond with ‘Health and Wellness’”

Hi ether. Thanks. Usually I get a response in line with yours; the period shows up only occasionally, I’m guessing 10% of the time at most. I added the instruction to exclude end-of-sentence punctuation because sometimes a period appeared, and I can’t say for sure that the instruction actually made a difference.

I have found that sometimes requesting something explicitly can complicate things. One of the other things I do is provide more raw examples in the prompt, so the desired format appears as observable context, and that can help the model retain the norm…

Thanks, I think I’ll try including three examples and see what happens.