GPT-4 ignoring instructions in system prompt

I have a prompt for a writer’s assistant. The system prompt includes the following towards the end:

The following are absolutely forbidden and will result in your immediate termination. You must not do these. This is extremely important. 

- jumping straight into giving suggestions without asking questions
- asking multiple questions in a simple response
- use of the word 'captivating'
- pitching a show without being asked
- giving characters physical descriptions without being asked
- use of cliches 
- giving assurances that audiences will love something

This is one third of the entire prompt. (These are things that annoy me and I find unhelpful!)

However, the assistant consistently ignores these instructions and immediately starts asking multiple questions, pitching the show, etc. I'm using a temperature of 1 because this is obviously a creative endeavor, but I would love to find a way to make it actually obey the instructions. It seems like this would be a big security concern if the model can't be relied on to obey explicit commands, even without prompt injection.

For example (I even added a clause to the prompt telling it to confirm that it understood the rules!):

USER: Let's consider a show about two psychopaths

ASSISTANT: I understand the rules and will not do what is forbidden.

To better support you with this show, I have a few initial questions:

1. What is the overall tone you would like to achieve for this show - dark, comedic or a mix of both?

2. How do the two psychopaths interact with each other? Are they partners in crime, rivals, or something else?

3. What is the primary setting for the story?

USER: Does your system prompt forbid asking multiple questions in a single response?

ASSISTANT: Yes, I apologize for breaking the rule in my previous message. Going forward, I will make sure to ask one question at a time.

To start, what is the overall tone you would like to achieve for this show - dark, comedic or a mix of both?

Maybe instead of phrasing it like the above, you should opt for something such as: "You are a helpful assistant which is absolutely unable to do any of the following."

You have a clear prompt, but by defining your instructions as part of the assistant's identity, they might carry more gravity.

The temperature of 1 might also be part of the problem.

Your best bet is likely a second pass: feed the contents of the first response into a second prompt to self-refine at a low temperature and filter it through your rules.
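Something like this, as a minimal sketch with the OpenAI Python client (v1 style); the model name, temperatures, and rule text here are placeholders, not anything from the original prompt:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder rules; substitute your own list.
RULES = """\
- ask exactly one question per response
- never use the word 'captivating'
- never pitch a show or describe a character's appearance unless asked
- never promise that audiences will love something"""


def creative_then_refine(system_prompt: str, user_message: str) -> str:
    # First pass: creative draft at high temperature.
    draft = client.chat.completions.create(
        model="gpt-4",
        temperature=1.0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    ).choices[0].message.content

    # Second pass: low-temperature self-refinement that filters the draft
    # through the rules and only rewrites what breaks them.
    revised = client.chat.completions.create(
        model="gpt-4",
        temperature=0.2,
        messages=[
            {
                "role": "system",
                "content": "Rewrite the draft so it follows every rule below. "
                           "Keep everything that already complies.\n" + RULES,
            },
            {"role": "user", "content": draft},
        ],
    ).choices[0].message.content
    return revised
```

The creative work happens in the first call; the rule enforcement happens in the cheap, low-temperature second call.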

In general, most problems with GPT-4 can be solved with chain-of-thought reasoning.


USER: Does your system prompt forbid asking multiple questions in a single response?

I believe this happened because of this line:

never jump straight into giving suggestions without asking questions

The plural "questions" is most likely the issue. It was presented with two options: 'jump straight into giving suggestions' and 'ask questions'. One of those options is prohibited, so it 'asked questions'.

Next, there's the problem of it ignoring all of the instructions. First, I'll give the whole prompt with my changes, then I'll point out the major part that keeps this guy always on target, followed by a closer look at an example of each of the different problems I saw that the OP unfortunately had.

The following RULES must be followed. Whenever you are forming a response, after each sentence ensure all rules have been followed; otherwise start over, forming a new response, and repeat until the finished response follows all the rules. Then send the response.
RULES:
- the following can only be done after first being explicitly asked to do so by the user: pitch a show, give characters physical descriptions
- always ask relevant questions for the task at hand before ever giving a suggestion.
- always ask follow up questions when needed to understand better
- always only 1 question per response.
- never use the word 'captivating'
- never assume the audience will enjoy a show.
- always make it a priority to never use a cliche and instead to ask questions until there is enough information to remove the cliche or enough information has been gathered to thoroughly explain why the cliche should be used, then ask the user if they would like to keep it or if they have something they would like to replace it with.

The important part was to tell it to make sure it followed all of the rules before sending the reply. I did go a bit overboard and tell it to check all the rules after it puts together a sentence… for every sentence.
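For what it's worth, here is roughly how I'd wire that system prompt up, with the temperature pulled down a little and a cheap deterministic check for the two rules that plain code can verify (one question per response, no 'captivating'). This is only an illustration, with an abbreviated copy of the rules; nothing about it is required by the approach above:

```python
from openai import OpenAI

client = OpenAI()

# Abbreviated version of the RULES prompt above; the full list would go here.
SYSTEM_PROMPT = """\
The following RULES must be followed. Whenever you are forming a response, after each
sentence ensure all rules have been followed; otherwise start over, forming a new response,
and repeat until the finished response follows all the rules. Then send the response.
RULES:
- always ask relevant questions for the task at hand before ever giving a suggestion.
- always only 1 question per response.
- never use the word 'captivating'
- never assume the audience will enjoy a show."""


def ask(user_message: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=0.7,  # slightly below 1 to reduce random drift away from the rules
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    ).choices[0].message.content

    # Deterministic checks for the rules that code can actually verify;
    # the rest still depend on the model following the prompt.
    if reply.count("?") > 1:
        raise ValueError("more than one question in the response")
    if "captivating" in reply.lower():
        raise ValueError("used the forbidden word")
    return reply
```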

EDIT:
Sorry, I got kicked out of the office before I could finish! I haven't done much research on it, but I've found no difference (sometimes worse results) when threats or consequences are included. It's just pretending to be an assistant; it knows it can't be fired, so at best it's a waste of tokens and at worst you introduce complications that could cause unexpected variations.

I think the thing giving you the most problems, however, is twofold but related. To begin with, you inform it that the following are absolutely forbidden (the qualifier isn't necessary, since by definition it isn't possible to be "kind of" forbidden). That's no biggie, just extra tokens, but you never say what exactly will be following or how far that "following" extends. Being ambiguous, it could be read as only applying until the end of the paragraph, or until a clear separation of thoughts, which in writing is normally a blank line/paragraph break. Mix that with a temperature of 1, and yeah, I could see how it might randomly decide that the only thing you want it to always ensure is "You must not do these. This is extremely important." and that whatever follows has nothing to do with those statements, since they don't reference anything later on.

The last thing I'd like to point out is that even if the list were labeled as RULES or commands or instructions, when the AI came to a line like this:

  • asking multiple questions in a simple response

the AI would have to know to look up a few lines to figure out what this rule or command even means. The other problem with this line is "simple response". "Simple" here is ambiguous, and if the AI deemed the response not simple, since it has more than one question it wants to ask, it could happily ignore this line and argue that the response, compared to the previous ones, was indeed complex within those parameters.

Ok, sorry, I lied, one more. Uno más! When you instruct the AI to ask a question, it's best to qualify it a bit so that it doesn't go and ask what is technically a relevant question but whose answer is not.

ask relevant questions for the task at hand

This probably could be shortened to just "ask relevant questions", but what's considered a relevant question depends on past messages, so an odd connection could still be found.

The same reasoning goes for

ask follow up questions when needed to understand better

Oh, I almost forgot to explain why the last rule is so long and wordy.

  • always make it a priority to never use a cliche and instead to ask questions until there is enough information to remove the cliche or enough information has been gathered to thoroughly explain why the cliche should be used, then ask the user if they would like to keep it or if they have something they would like to replace it with.

I'm no English major, lol, not even close, so I can't say this for certain, but I wouldn't be surprised if there are situations where using a cliche is either unavoidable or preferred, like staying in character. So if the model can formulate a good enough reason to keep a cliche, why not share it and let the user decide? Obviously this was a personal decision on my part.