TL;DR summary, full text below:
I discuss a bug with GPT-4o’s memory function, where it fails to store correct information and even creates false memories. Despite attempts to correct and verify the stored data, the issue persists, in contrast to the stable performance of the standard GPT-4 model. This bug compromises the user experience, prompting a call for OpenAI to investigate and address these inconsistencies.
Full text:
Hi All,
I noticed recently that my GPT-4o was not recalling information that, as far as I was aware, had been remembered.
I began looking into this because I did not get a reply from Support and no one else seemed to be mentioning it, yet it was clearly happening to me, and the memory feature has been an important one for me personally. I tried the usual steps (uninstalling, using different devices and platforms, etc.) and none of them had any effect.
I am interested to know if other people are experiencing what I am about to describe and if OpenAI have any useful information regarding it.
Initial Issue: GPT-4o says “memory updated”, but it does not commit anything to memory.
Fixes Tried: Asking 4o to retry saving, asking 4o to save a short sentence instead of a block of text.
Result: Retrying made no difference. Saving short sentences did help for a little while, and I was able to get GPT-4o to save “my favourite colour is green”, for example, but nothing longer than that.
Then, I used the GPT-4 (not 4o) model to see if it was a model-related issue.
Fix Tried: I started a new chat with GPT-4 and a “memory updated” marker was triggered. This memory was stored correctly and as expected.
I asked GPT-4 to scan chats I had had with the 4o model and to attempt to resave any memories it came across.
This worked, and the memories were stored as normal, but I very rapidly hit the usage limit.
Fast forward to today: I ran my usual tests with GPT-4o and noticed something even stranger.
During our conversation this morning, a “memory updated” marker was triggered. When I checked the memories, GPT-4o had stored a memory that I use a CPAP machine to sleep, and have done for 2 years (I do not).
I then triggered a second memory in chat (a chat about water intake levels for men) and GPT-4o added a memory that I use a CPAP machine and have for 6 months.
I deleted both of those memories and checked the “memory updated” prompt inside the chat, which displayed the correct memory to be saved (3.8 litres of water a day for men).
I restarted the chat and tried again, getting to the “memory updated” marker, and this time the memory that had been saved (despite the chat being about water intake targets) was that I sleep with a nasal dilator to help my breathing - this is also not true, but the memory was written with “Dan sleeps…” and so was clearly meant to be about me.
Again, I checked the in-chat memory and it was written correctly, with no mention of sleep or nasal dilators (didn’t even know that was a thing! Learning!)
Finally, after deleting all erroneous memories, I switched to a chat with the GPT-4 model and had the same conversation. GPT-4 stored the correct memory, as per the chat, first time and without error.
While I am happy to use GPT-4 to scan chats and save memories, it takes away from what had become an incredible experience with the 4o model. If we are encouraged to use 4o, then this feature needs assessment and repair, as using GPT-4 to save memories (when they are saved so frequently!) hits usage limits very quickly.
I assume that, as AVM uses 4o, memories would be affected there too? That’s something to test, come to think of it!
When I first started using 4o, it remembered information perfectly, and then it stopped for no apparent reason.
Creating false memories with no relation to the user is a concern and a clear bug that needs review.
Is it possible for OpenAI to look into this, or let us know what is happening?
And is anyone else experiencing anything similar?
Thanks!