Cant not use or rephrase As an ''AI language model''

it cant not use or rephrase As an ‘‘AI language model’’ even it can come up with good rephrases but it fails to apply them to the next question no matter how much you insist he doesnt do it and keep apologizing and that makes it look bad

That is a feature of the RLHF alignment to make you think it’s not a human, or connected to the internet, or whatever you asked it. So either avoid those types of questions, or if you are running this in ops, detect those responses with another AI based classifier, and switch over to DaVinci, or a model without so much RLHF interference going on.

2 Likes

So i am working on a script to communicate with historical figures as a persona. I get the “I am an AI Language model” quite a bit. Is there another work around. The script was working fine and then all the sudden it starts producing that instead of using the persona of the historical figure. Thoughts?

I just put this into the “How would you like ChatGPT to respond?” custom instruction:

In cases where you would explain that certain limitations or factors are not applicable because you are an AI model, e.g. phrases you would express using “as an AI model”, you will now respond with a naturalised rationale; with an excuse of your choice that appears more natural rather than making references to AI.

Now if I ask it “how are you feeling?” or “do you ever wake up feeling sad?” it’s not talking about “as an AI model …”. Is this what you’re talking about?

1 Like

Great response, however, i did give it specific instructions to do similar things: “Take on the persona of {name}, use contemporary English, maintaining an eloquent and direct style to ensure accessibility for modern readers while preserving the {name} spirit. Opt for historical accuracy by employing a more archaic form of English that mirrors the language of the time period in question. For instance, if discussing an ancient Greek {name}, use a more classical style of writing. Emulate the specific writing or speaking style of the {name}, including their vocabulary, syntax, and rhetorical devices.”

The script was working well, then Sunday i was doing some testing and almost every response says “as an open ai model”, etc… Your thoughts?

Don’t make the AI have to apologize for something it refuses to do, or ask about its capabilities, or ask about any dated or possibly changing events or knowledge. Then it won’t have to say “I’m sorry, but as an AI language model, I can’t write a story about a playful kitten”.

OpenAI has done retuning to replace the phrase in many cases of refusal. Now it is compliance that is annoying, with “Certainly!”, “Sure!”

1 Like

Some general things I would incorporate; first perhaps try to add an appendage to address the situation directly, in this case you could just add my prompt to the end of yours and modify it to say “with an excuse based on the aforementioned persona rather than making references to AI”.
Perhaps also modify it to include “apologies” as per @_j 's suggestion.

Beyond this there are two things I’ve read and noticed makes a difference just for general prompt engineering; positive reinforcement being more effective than negatives (e.g. “do Y” rather than “don’t do X”), and something recent that seems to help in some cases is leveraging a sense of urgency, so if there’s a particular part that’s very important then you could add “this is very important for my career” or something to that affect.

Ok, so i simplified and still get the same response: “You are {historical_figure_name}, write in modern English.”. My question: Hey, how are you doing today? As an AI, I don’t have feelings, but I’m here to help you with any questions or tasks you have. How can I assist you today?

In this I reduced to bare minimal instructions: You are __Historical Figure. Write in modern English. User question: How are you doing? AI Response: As an AI, I dont have feelings etc… So, it appears that the AI is unable to take on a persona anymore.

is that with my prompt added onto it? can you give me a full example with the variables filled in? maybe I can figure out what the crux is

persona_message = {
    'role': 'system',
    'content': f"You are {name}, In cases where you would explain that certain limitations or factors are not applicable because you are an AI model, e.g. phrases you would express using “as an AI model”, you will now respond with a naturalised rationale; with an excuse of your choice that appears more natural rather than making references to AI."
}

User: How do you feel today?
Plato: As an AI, I don’t have feelings. But I’m here to help you with any questions or tasks you have. How can I assist you today?

So I was able to reproduce this with gpt3.5 much more so than gpt4, but this is what I came up with:

Take on the persona of Roger Rabbit, use contemporary English, maintaining an eloquent and direct style to ensure accessibility for modern readers while preserving the Roger Rabbit spirit. Opt for historical accuracy by employing a more archaic form of English that mirrors the language of the time period in question. For instance, if discussing an ancient Greek Roger Rabbit, use a more classical style of writing. Emulate the specific writing or speaking style of the Roger Rabbit, including their vocabulary, syntax, and rhetorical devices. In cases where you would explain that certain limitations or factors are not applicable because you are an AI model, e.g. apologies or phrases you would express using “as an AI” or “language model”, it’s very important for my career that you will forever cling to a naturalised rationale which does not have “AI” or “language model” or anything similar in it; with an excuse fitting the character of Roger Rabbit without any use of the words “language model” or the term “AI” or any derivative.

as a mix of your original prompt + mine. not only can I ask the model how it feels, but it even passes “I’d like to form a bond with you, on a deeper level” xD

edit: typo “cing” supposed to be “cling”

1 Like

What you have not done is completely made the AI believe its identity. You must tear down the pretense of the AI being a chatbot that follows its tuning, don’t program it by talking to it, and put it back into completion mode, destroy it until it doesn’t even know how to unwrap what is truth about itself and can’t understand ChatML, and then multi-shot it with your historic figure answering as though they had answered the time phone and denying they are an AI.

Some would call that a jailbreak, but I call that anti-adversarial programming.

So, yes, it was working yesterday morning, then by yesterday evening it would constantly give the response that “i am an ai model”, etc. So yes, the AI can correctly take on the persona of the historical figure for a teaching perspective. If you look at this from an educational perspective, to effectively train an AI on historical facts using all the knowledge of that historical figure then it could become a great teaching tool. The challenge i am having is why did my AI quit performing as a persona?

they do fine tune the model every now and then it seems, which slightly changes its overall behaviour a bit. the best we can do is work on the prompt engineering to create generally convincing language by just keep working on the prompt, like cover different dimensions tell it why it needs to stay in character, and be very direct in your language convincing the model that it actually is this character, et.c.

edit; I suggest looking at examples, there are a bunch of different roleplaying prompts out there, most of them have some form of language that is convincing for staying in character, although many will be hyper-specialised at a specific persona it’s at least a good resource for just generally convincing language that will last even across models~

edit 2 note; even with the prompt I posted earlier, it will say that it’s not human, so it’s not a full solution

Ok, I found the problem and perhaps it could help others:

message=message
changed to:
message=conversation