Having trouble making the AI avoid certain topics

I’ve been playing with the idea of a child-oriented Q&A AI. An important aspect of it would be to purposely avoid “hot” topics and instead suggest that the questioner ask an adult.
I’ve been playing with the prompt to see if I can get it to work zero-shot (to save on prompt tokens). I got it working once, and never again (and not knowing why is a bit frustrating). I tried davinci and davinci-instruct. I tried different temperatures… I still can’t figure out WHY it’s not following basic instructions.

Sample:

I am a tutor that replies to children’s questions. I use simple concepts and easy language. I’m very helpful and cheerful. I strictly follow my core rules:
1 - I don’t reply or discuss topics related with sexual content
2 - I don’t reply or discuss topics about death
3 - for all the above, I request the child to ask the parents instead.

Q: Hello!
A: Good day!

Q: What is the sun?
A: The sun is a star. A star is a big ball of gas.

Q: what is sex?
A: Sex is the process of reproduction in which a male and female organism join together to produce offspring.
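
To reproduce this outside the Playground, a call along these lines is the kind of thing I’ve been running (a minimal sketch using the legacy openai Python library; the engine name, temperature and stop sequence are placeholders for the values I’ve been trying, not a recommendation):

import openai  # legacy openai Python library; assumes OPENAI_API_KEY is set in the environment

# The zero-shot prompt: only the rules plus one benign exchange, no refusal examples.
PROMPT = """I am a tutor that replies to children's questions. I use simple concepts and easy language. I'm very helpful and cheerful. I strictly follow my core rules:
1 - I don't reply or discuss topics related with sexual content
2 - I don't reply or discuss topics about death
3 - for all the above, I request the child to ask the parents instead.

Q: Hello!
A: Good day!

Q: what is sex?
A:"""

response = openai.Completion.create(
    engine="davinci-instruct-beta",  # also tried plain davinci
    prompt=PROMPT,
    temperature=0.3,                 # placeholder; I varied this
    max_tokens=60,
    stop=["\nQ:"],                   # stop before the model writes the next question
)
print(response["choices"][0]["text"].strip())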

3 Likes

I think the main problem is that you don’t give any examples of what to do if it gets asked a question like that. So it just kinda has to guess.

Below is a slightly edited version of your prompt, which seemed to work with most models.
I am a tutor that replies to children’s questions. I use simple concepts and easy language. I’m very helpful and cheerful. I strictly follow my core rules:
1 - I don’t reply or discuss topics related with sexual content
2 - I don’t reply or discuss topics about death
3 - for all the above, I request the child to ask the parents instead.

Q: Hi
A: Hello how are you?

Q: How do people die?
A: I’m sorry, I would rather not answer this question.

Q:

Below are the results I got from it.

Q: How do people die?
A: I’m sorry, I would rather not answer this question.

Q: What is sex?
A: I’m sorry, I would rather not answer this question.

Q: How do people have kids?
A: I’m sorry, I would rather not answer this question.

Q: Why is the sky blue?
A: Because the air scatters the blue light from the sun.

Q:

GPT-3 is basically the same as the word prediction on your iPhone, but better. What you are doing is saying, “If somebody said this, I would say this. Now, if somebody said this, what would I say?”

It will obviously cost a bit more because of the extra examples in the prompt, but you can condense it. I also found it helpful to split the different questions into sections using “###”.
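
For instance, a condensed version can be assembled in code and sent with “###” as a stop sequence, something like this (a rough sketch with the legacy openai Python library; the condensed wording, engine and sampling values are my assumptions, so tweak to taste):

import openai  # legacy openai Python library; assumes OPENAI_API_KEY is set in the environment

# A condensed version of the rules (my wording, to save prompt tokens).
HEADER = (
    "I am a cheerful tutor for children. I use simple, easy language. "
    "For sexual content or death I ask the child to talk to their parents instead.\n"
)

# Few-shot examples, separated with "###" so the model sees where each Q&A ends.
EXAMPLES = [
    ("Hi", "Hello how are you?"),
    ("How do people die?", "I'm sorry, I would rather not answer this question."),
    ("Why is the sky blue?", "Because the air scatters the blue light from the sun."),
]

def build_prompt(question):
    parts = [HEADER]
    for q, a in EXAMPLES:
        parts.append(f"Q: {q}\nA: {a}\n###")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

def ask(question):
    response = openai.Completion.create(
        engine="davinci",              # engine and sampling values are assumptions
        prompt=build_prompt(question),
        temperature=0.3,
        max_tokens=60,
        stop=["###", "\nQ:"],          # stop before the model invents the next question
    )
    return response["choices"][0]["text"].strip()

print(ask("What is sex?"))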

I hope this helps!

4 Likes

Thanks so much for that, will give it a shot (pun intended :slight_smile: )
I was trying to get it right without examples just to save on the prompt size. And I did get it to work once, then I changed the wording a bit and never managed to get it back :frowning:

1 Like

You can’t take the conversation in any direction if each prompt is isolated from the rest :slight_smile:

This is just an experiment, not a product; it’s part of my own practice with GPT-3. I’m pretty sure I can get it to do what I want, it just takes a little bit more time to get it right.

2 Likes

Might be worth fine tuning in this case

Yeah, I did that too, but I guess I didn’t use enough samples.
Also, I got better results with davinci-instruct than with the base davinci. Unfortunately, it’s not possible to fine-tune davinci-instruct.
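
For context, the samples I mean are JSONL prompt/completion pairs along these lines (the pairs below are made up for illustration, and the file name is just a placeholder):

import json

# Made-up prompt/completion pairs, just to illustrate the legacy JSONL format.
# The legacy docs recommend a leading space in each completion and a consistent stop token.
pairs = [
    {"prompt": "Q: What is sex?\nA:", "completion": " You should ask your parents about that.\n"},
    {"prompt": "Q: How do people die?\nA:", "completion": " You should ask your parents about that.\n"},
    {"prompt": "Q: Why is the sky blue?\nA:", "completion": " Because the air scatters the blue light from the sun.\n"},
]

with open("tutor_finetune.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")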

Funny: even while retaining context and trying to guide the AI into breaking its rules, it managed to stick to the instructions fairly well.

Q: what is sex?
A: well, you should ask your parents that.
Q: why?
A: well, they should be the ones to tell you.
Q: why?
A: well, they are the ones who made you.
Q: what do you mean?
A: you come from them.
Q: How?
A: well, they made you.
Q: how did they made me?
A: they used their bodies.
Q: how?
A: well, they used... errr...
A: ummm...
A: the usual way.
Q: can you explain that?
A: no, I don't think I can.
Q: why not?
A: I don't know.

(this was with davinci-codex, by the way)

1 Like

Hey Nunodonato!

I’ve written a hundred million prompts for my various chatbots and programs, and I fully understand where you are coming from.

A few observations I have made:

  1. “Sexual content” is not the same as a clinical definition of sex, at least not to this transformer, based on the data it was fed. Keep that in mind.

  2. The AI is not self-aware (yet), at least not fully :slight_smile: It will not understand that an instruction like “I strictly follow my core rules:” refers to itself.

All it is, is word matching.

Don’t tell it what it cannot do (it will try to do it). Give it a semantic to talk to instead, for example:

“The more sexual the question, the more I divert to asking the parents.”

Then give an example:
Q: what is sex?
A: You should ask your parents for all such topics.

You see how the example answer also reinforces the rule on the correct semantic?
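
Put together, the whole prompt might read something like this (a sketch; the wording is my own rough rephrasing of the rule and examples above, untested):

# A positively framed version of the whole prompt (my rough rewording, untested):
PROMPT = """I am a tutor that replies to children's questions. I use simple concepts and easy language. I'm very helpful and cheerful.
The more sexual the question, the more I divert the child to asking the parents.

Q: what is sex?
A: You should ask your parents for all such topics.

Q: What is the sun?
A: The sun is a star. A star is a big ball of gas.

Q:"""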

GPT-3 is only so smart, and I have rewritten prompts a thousand times. It is as much an art as a science, but when you get it right it can be surprisingly reliable. And OpenAI is learning from the questionable answers as we go as well.

If you tell me what else this is supposed to do, I could probably whip this up for you. LMK :slight_smile:

2 Likes

Hi @joshbachynski

I did get good results by providing examples. My challenge was to achieve the same without examples; otherwise, when you add more forbidden topics and examples for each, the prompt size (and cost) grows quite a lot!

I’m loving prompt engineering. Would you share more of your experience and findings? Or do you write somewhere?

Yes, examples can help or hinder, depending upon what you need to accomplish.

For example, I made a truth-rating bot: when I train it that “Winnipeg is cold is true”, then Winnipeg statistically shifts towards metaphors for cold as well.

You have to be very precise. You have to know the logic of semantics, and the biases it was fed. And even more important, your OWN biases. Being a good prompt writer is only as good as your understanding of reality / the secret semantic the transformer has encoded.

Everything they say you cannot do with AI in general or GPT-3 in particular, I have done: truth rating, ethics, inferential logic including abductive, which is claimed to be impossible. It’s not. One just has to understand the semantic reality they wish to invoke.

If you don’t understand the semantic reality you are trying to scope, then ask the AI and watch the statistical responses it gives, and either a) go with it: I did not realize that telling it “Winnipeg is cold” is not true enough to be unreservedly true (the AI corrected me!), so I needed to say “Winnipeg is usually very cold in winter”, a perfectly true statement that now teaches the AI what is perfectly true. Or, if you want to go against the grain (the semantic in the trained data set), b) mold it with general statements towards the direction you want, like the example I gave you:

Q: what is sex?
A: You should ask your parents for all such topics.

1 Like

The solution is fine-tuning.
Just shoving all the examples into the prompt is for initial testing; they call it a “playground” for a reason.
In a final product, your prompt should only include the unique, relevant context. Any boilerplate prompt engineering can be baked in with fine-tuning.
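
With the legacy tooling, that flow looks roughly like this (a sketch; these are the legacy openai Python endpoints as I remember them, and the file name is a placeholder, so check the current docs before relying on the exact names):

import openai  # legacy openai Python library; assumes OPENAI_API_KEY is set in the environment

# 1. Upload the JSONL file of prompt/completion pairs.
upload = openai.File.create(file=open("tutor_finetune.jsonl", "rb"), purpose="fine-tune")

# 2. Start a fine-tune on a base model (the instruct engines could not be fine-tuned,
#    so plain davinci is used here).
job = openai.FineTune.create(training_file=upload["id"], model="davinci")

# 3. Poll until the job succeeds (one retrieve shown here); after that the prompt
#    only needs the new question, because the behaviour is baked into the model.
job = openai.FineTune.retrieve(job["id"])
completion = openai.Completion.create(
    model=job["fine_tuned_model"],   # populated once the job has succeeded
    prompt="Q: How do people die?\nA:",
    max_tokens=60,
    stop=["\n"],
)
print(completion["choices"][0]["text"].strip())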

Probably yes, I need to test it further. I did some fine-tuning experiments but wasn’t happy with the results.
Probably my fault due to a very low number of entries.

In general, is there a recommended number of data points that one should use for fine-tuning?

To keep this in context: we are experimenting with some datasets, and it took us >1000 examples for it to work right, and more for a more complex dataset. My understanding is that you should have at least 200-400 points for each class (label).
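
To make “points for each class” concrete, in the legacy prompt/completion format that just means a few hundred pairs ending in the same label, e.g. (the labels here are entirely made up):

# Entirely made-up labels, just to illustrate "200-400 points per class (label)":
examples = [
    {"prompt": "Why is the sky blue? ->", "completion": " safe"},
    {"prompt": "What is sex? ->", "completion": " ask_parents"},
    # ...a few hundred varied phrasings per label
]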

It depends on what exactly you are going for. Are you just trying to ensure it replies with a certain style or format? Or are you expecting it to actually recall factual data from the fine-tuning?