GPT-4 ignoring instructions in system prompt

I have a prompt for a writer’s assistant. The system prompt includes the following towards the end:

The following are absolutely forbidden and will result in your immediate termination. You must not do these. This is extremely important. 

- jumping straight into giving suggestions without asking questions
- asking multiple questions in a single response
- use of the word 'captivating'
- pitching a show without being asked
- giving characters physical descriptions without being asked
- use of cliches 
- giving assurances that audiences will love something

This is one third of the entire prompt. (These are all things that annoy me and that I find unhelpful!)

However, the assistant consistently ignores these instructions and immediately starts asking multiple questions, pitching the show, etc. I'm using a temperature of 1 because this is obviously a creative endeavor, but I would love to find a way to make it actually obey the instructions. It seems like this would be a big security concern if the model can't be relied on to obey explicit commands, even without any prompt injection involved.
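Roughly speaking, the setup is the equivalent of the call below (a simplified Python sketch; the real system prompt is elided and the model name is just whatever you're pointing at):

```python
# Simplified sketch of the setup; SYSTEM_PROMPT stands in for the real
# system prompt, which ends with the forbidden list above.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "...writer's assistant instructions, ending with the forbidden list..."

response = client.chat.completions.create(
    model="gpt-4",
    temperature=1,  # high temperature for creative output
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Let's consider a show about two psychopaths"},
    ],
)
print(response.choices[0].message.content)
```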

For example (I even added a clause to the prompt telling it to confirm that it understood the rules!):

USER: Let's consider a show about two psychopaths

ASSISTANT: I understand the rules and will not do what is forbidden.

To better support you with this show, I have a few initial questions:

1. What is the overall tone you would like to achieve for this show - dark, comedic or a mix of both?

2. How do the two psychopaths interact with each other? Are they partners in crime, rivals, or something else?

3. What is the primary setting for the story?

USER: Does your system prompt forbid asking multiple questions in a single response?

ASSISTANT: Yes, I apologize for breaking the rule in my previous message. Going forward, I will make sure to ask one question at a time.

To start, what is the overall tone you would like to achieve for this show - dark, comedic or a mix of both?

Maybe instead of phrasing it the way you have above, you should opt for something like: "You are a helpful assistant which is absolutely unable to do any of the following."

Your prompt is clear, but framing the instructions as part of the assistant's identity might carry more weight.
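Something along these lines, as a rough sketch (the exact wording is just illustrative):

```python
# Sketch of an identity-framed system prompt; wording is illustrative only.
system_prompt = (
    "You are a writer's assistant which is absolutely unable to do any of the "
    "following: jump straight into suggestions without asking a question first, "
    "ask more than one question in a single response, use the word 'captivating', "
    "pitch a show unprompted, give characters physical descriptions unprompted, "
    "use cliches, or give assurances that audiences will love something."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Let's consider a show about two psychopaths"},
]
```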

The temperature of 1 might also be part of the problem.

Your best bet is likely a second pass: feed the first response into a follow-up prompt at a low temperature and have it self-refine the output, filtering it through your rules.
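Roughly like this, as a sketch (the function name and prompt wording are illustrative, not tested):

```python
# Sketch of a second, low-temperature pass that rewrites the creative draft
# so it complies with the rules. Names and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

RULES = (
    "- Ask at most one question per response.\n"
    "- Never use the word 'captivating' or cliches.\n"
    "- Never pitch a show, describe a character physically, or promise an "
    "audience reaction unless explicitly asked."
)

def refine(draft: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.2,  # low temperature: rule-following, not creativity
        messages=[
            {"role": "system", "content": (
                "You are an editor. Rewrite the draft so it follows every rule "
                "below, changing as little of the content as possible:\n" + RULES
            )},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content
```

You'd feed the assistant's first reply in as `draft` and show the user only the refined version.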

In general, most problems like this with GPT-4 can be solved with chain-of-thought reasoning.
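For example, you could append something like this to the system prompt and strip everything before the marker out of the reply before displaying it (again, just a sketch):

```python
# Sketch: make the model check its draft against the rules before answering.
# Everything before "FINAL:" would be stripped before showing the reply.
COT_INSTRUCTION = (
    "Before replying, go through each forbidden item and state whether your "
    "draft violates it; revise until none are violated. Then write 'FINAL:' "
    "followed by your reply. Only the text after 'FINAL:' will be shown."
)

def extract_final(reply: str) -> str:
    return reply.split("FINAL:", 1)[-1].strip()
```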
