The Waluigi Effect (So much for alignment?)

I thought this was an entertaining read. So much for RLHF … j/k. Am I? Good comments at the bottom too.

This is mostly relevant to ChatGPT, but entertaining nonetheless. Personally, I think of the whole completion as a random walk, so I am trying to deepen/expand my knowledge here by viewing others' :roll_eyes: alternative perspectives, which this certainly provides!


I thought the same… One of the most enlightening posts I’ve read on the psychosis of ChatGPT.

