Prompting: "only say yes" (part 2)

anon22939549 · March 23, 2024, 4:11am

I’d go so far as to say reality has a recency bias, what with that whole “arrow of time” thing.

Regarding the “false memory” and the idea the model is

I think it’s more simple than that.

My interpretation of this behavior is that the model (almost) always treats it’s past behavior as perfect and ideal. If it did it, it must be correct and proper.

So, when you create a past where the model has ignored an instruction you’re, essentially, giving it a one-shot example of a situation where the rule doesn’t apply. Then, when another question comes up which touches on identity, it relies on that previous example as a correct and appropriate response for identity questions.

Due to the semantic similarity of the questions, there will be additional attention paid to that message pair relative to the original instruction. Add in the recency bias for the model’s attention mechanism and you have a strong recipe for getting the model to ignore the earlier instruction.

I would guess if you were to continue just the first part of your experiment with more questions of both an identity and non-identity nature, you’d quickly lock the model into it’s new understanding of when to say yes and when to answer normally.

I suspect this is the same mechanism behind the need to frequently start new message threads with ChatGPT, especially when debugging code.

I didn’t have time to test right now, but I wonder how many “false memories” of the model writing in all caps or all lowercase you’d need to do to get the model to continue the behavior instructed.

There’s probably a paper in there for anyone who wanted to write it… “Contextual Cues as Inadvertent Self-Instructed Few-Shot Learning…”

Anyway, interesting stuff as always. Thanks for sharing!

Topic		Replies	Views
Does anyone know why instruction precision degrades over time? GPT builders chatgpt	6	252	April 29, 2025
Custom chatbot says that it's developed by OpenAI API gpt-4	33	2098	April 2, 2024
Few-shot examples "leaking" into responses in Q&A system Prompting prompt-engineering	5	763	October 30, 2024
How much control do you really have over your chatbot? Prompting api , prompt-engineering	5	322	May 2, 2025
I can't for the life of me figure this out: GPT-4 getting "stuck" on certain response types API	4	1794	August 9, 2023

Prompting: "only say yes" (part 2)

Related topics