Hi @logankilpatrick
If you need further info to home in on the cause of this issue, I’d say two things based on the summary of this thread.
1- It lays out great plans, but it fails to execute against those plans.
2- When you ask it to check its output against a set of validations, it DOESN’T do any validation and instead hallucinates that the answer is correct.
The last example shared by @radiator57 is a very good one, as it clearly demonstrates the issue.
I understand that with ‘1’ there is probabilistic behavior involved, so I’d expect answers to vary and to be wrong sometimes.
But it is really critical that it can correct itself when asked to. Right now it will only correct itself if you point to exactly where the problem is; at that point it acknowledges its error and fixes it. Otherwise it keeps hallucinating that the answer is correct. I used an OODA-loop style approach to test that theory, and I can reproduce the behavior 100% of the time.
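For reference, this is roughly how I structured the repro. It is only a minimal sketch, assuming the standard chat completions API and a hypothetical `validate()` helper standing in for my actual checks; the prompts and model name are illustrative, not the exact ones from my tests.

```python
# Minimal sketch of the OODA-loop style repro (observe, orient, decide, act).
# Assumptions: standard OpenAI chat completions API; `validate()` is a
# hypothetical stand-in for my real requirement checks.
from openai import OpenAI

client = OpenAI()


def validate(answer: str) -> list[str]:
    """Hypothetical validation: return a list of requirement violations."""
    failures = []
    if "TODO" in answer:
        failures.append("answer still contains TODO placeholders")
    return failures


messages = [{"role": "user", "content": "Write the function described in the spec above."}]

for step in range(3):  # repeat the loop a few times
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})

    failures = validate(answer)  # Observe: run the output against the checks
    if not failures:
        break

    # Act: ask it only to re-check against the requirements, WITHOUT pointing
    # at the specific problem. In my runs it claims the output is correct at
    # this step every time; it only fixes things once the exact failure is
    # spelled out for it.
    messages.append({
        "role": "user",
        "content": "Please validate your output against the requirements and correct any issues.",
    })
```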
I think that’s the major concern right now: it has lost the ability to self-correct and must be nudged either by a human or by a response it receives from a tool call. It didn’t behave like this before; previously it would self-correct just by being asked to review its output against the requirements. Right now, this seems completely broken.
And thank you for releasing the new GPT guide, it is helpful. But we would really appreciate you folks looking into this behavior. Again, I have a 100% consistent repro of it if you need anything from me.
Thanks.