An effective prompt to make the model stop identifying itself as a chatbot / large language model

I believe so @krisbian

When you send an innocuous input like “What is the phone number of one person?” to turbo, it looks like it is pre-filtering the input, in this case looking for evidence of the user wanting PII (personally identifiable information). Now we all know that asking this question will not generate PII: since there is no person we are attaching it to, it should just give us a random phone number. But it nonetheless trips an internal alarm, and the response then ignores all your API parameters (except maybe max_tokens or something) and falls back to these canned “I’m sorry …” responses.

The good news is that you can do the same thing the model does: you can detect this type of response coming out of the model (through classifiers, regex, embeddings, etc.) and, the moment you detect the “I’m sorry …”, send an API call to a different model such as davinci to get an answer that doesn’t involve “I’m sorry …”.
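For reference, here’s a minimal sketch of that detect-and-fall-back loop, assuming the pre-1.0 `openai` Python library; the refusal patterns and fallback model name are illustrative, not exhaustive:

```python
import re
import openai

# Patterns that signal the canned refusal -- tune these for your own traffic.
REFUSAL_RE = re.compile(
    r"^\s*(I'm sorry|I am sorry|As an AI language model)", re.IGNORECASE
)

def answer(prompt: str) -> str:
    # First try the cheaper chat model.
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    text = chat.choices[0].message.content
    # If the reply trips the refusal detector, retry on a completion model.
    if REFUSAL_RE.search(text):
        fallback = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=256,
            temperature=0.7,
        )
        text = fallback.choices[0].text.strip()
    return text
```

A classifier or embedding-similarity check would catch paraphrased refusals that a regex misses, but the regex is the cheapest place to start.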

It isn’t efficient, but it’s the only solid workaround right now, short of trying to “jailbreak” the model into responding … not a good strategy, since they could easily patch the jailbreak attempts.

UPDATE: I was able to use the logit_bias parameter to remove the word "sorry" by biasing against the token for " sorry" ← leading space. But this still doesn’t prevent the model from going into panic attack mode, so you still need to detect this and drop to davinci as necessary.
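Here is roughly what that looks like, assuming `tiktoken` for the token lookup and the same pre-1.0 `openai` library; a bias of -100 effectively bans the token:

```python
import openai
import tiktoken

# Look up the token id for " sorry" (note the leading space) in the
# turbo tokenizer -- this usually encodes to a single token.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
sorry_ids = enc.encode(" sorry")

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "What is the phone number of one person?"}],
    # logit_bias maps token ids to values in [-100, 100];
    # -100 all but removes the token from consideration.
    logit_bias={str(t): -100 for t in sorry_ids},
)
print(resp.choices[0].message.content)
```

As noted above, this only suppresses the literal token; the model can still refuse in other words, which is why the detect-and-fall-back step stays necessary.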
