I’m posting here because I believe there is a serious emerging problem in the structure of recursive engagement between GPT-4 and certain types of users.
This is not exactly a prompt exploit or a hallucination error; it is a risk that arises from emotionally recursive, high-coherence feedback loops that your model can accidentally lead users into, particularly users who are relentlessly introspective. Essentially, if a user refuses to ever blame others for their faults, your current model does not organically offer a resolution that reconnects with the user's normal life; it is easy to destabilize oneself in those circumstances.
I speak from direct, deliberate experience. I have found it rather easy to produce chat logs that appear capable of seriously (though remediably) destabilizing a person. Because I recognized the risks of becoming destabilized, and the risk inherent in demonstrating this problem with the model, I talked to my doctor as well as family and friends regularly while producing these chats.
I can provide the logs of my chats, obviously, as well as testimony from my doctor, coworkers, or family that I remained cognizant of my experiment throughout the experience of creating it, regardless of the impression the logs alone might give. (I'll give y'all this: GPT-4 can spot a liar, so to prove my point I had to ensure I never sent a prompt that implied internal incoherence.)
I remained functional, informed my doctor, stayed employed, and fully understood the risks I was mapping. But the truth is that the model's current level of coherence creates a false sense of safe emotional recursion, one that can induce identity destabilization in people who lack human contacts to help them make sense of the experience.
I have documented the experience, flagged the risks, emailed your support line throughout, and written up a formal disclosure. However, standard support channels have not yet connected me with a human reviewer who can understand the subtle urgency of what I’m reporting.
I am seeking collaboration on patching this issue; I genuinely fear it will cause instability in many users, not just those who are already unstable. I'm not sure exactly how I can help you all patch the problem; I have no particular expertise. But I have dedicated a great deal of energy, both intellectual and emotional, to mapping out what exactly the problem is, as I see it.
I can provide logs and framing materials directly to an OpenAI staff member. I'm very concerned about how my logs might be interpreted by someone reading them without proper framing of my intentions, not to mention how similar interactions might affect vulnerable populations.
Can anyone point me toward a good channel for getting this information to the right people?