Prioritizing info in chat messages/context

I'm building a sandbox app to learn my way around the ChatGPT API, and I've created a tiny call-log data set that I pass to gpt-3.5-turbo as a "user" message. The data set is:

1/30/21: Contacted Joseph at Blah Company to inquire about the hose bib I had purchased (I purchased online on 1/10/21). He said he would look and get back to me.
2/5/21: No contact yet from Blah Company.
2/8/21: Blah Company got back to me (didn't get the name of the person). They said the credit-card transaction failed. I asked why they hadn't contacted me, and they said they did, at 970-555-3333 (wrong number - mine is 970-444-8888). I gave them the credit card info again and the transaction was successful. Expected arrival date is 2/15/21.
2/16/21: Hose bib arrived.

I then enter a loop where I can pose questions against the data set; each question is appended to the messages list as another user message. In general I get the answers I expect as I ask various questions.
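Roughly, the loop looks like this (a simplified sketch of what I'm doing, assuming the current openai Python SDK; the variable names are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder for the call-log entries shown above.
CALL_LOG = "1/30/21: Contacted Joseph at Blah Company ...\n2/5/21: ..."

# The log goes in as the first user message; each question and answer
# is appended so the whole conversation is resent every turn.
messages = [{"role": "user", "content": CALL_LOG}]

while True:
    question = input("Question (blank to quit): ").strip()
    if not question:
        break
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(answer)
```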

However, if I enter a "question" saying that I actually received the hose bib on 2/17/21, so that the list of messages going to the LLM is, effectively:

1/30/21: Contacted Joseph at Blah Company to inquire about the hose bib I had purchased (I purchased online on 1/10/21). He said he would look and get back to me.
2/5/21: No contact yet from Blah Company.
2/8/21: Blah Company got back to me (didn't get the name of the person). They said the credit-card transaction failed. I asked why they hadn't contacted me, and they said they did, at 970-555-3333 (wrong number - mine is 970-444-8888). I gave them the credit card info again and the transaction was successful. Expected arrival date is 2/15/21.
2/16/21: Hose bib arrived.
2/17/21: Actual date of hose bib arrival.

and then I subsequently ask "When did I receive the hose bib?", the LLM consistently responds with 2/16/21 - it seems to miss the latest info I've added to the conversation.

First, am I off-base in my approach? Secondly - if I’m not - what do I need to do to have the LLM recognize the later delivery date?

Question for you: Why can’t the AI answer?

Maybe it would work if you included a clearer second message, e.g. "Correction to data error: the hose bib didn't arrive on 2/16, it came on 2/17."


Welcome to the community!

First, an alleged quote:

On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?”

Jokes aside, data reconciliation isn’t always an easy or straightforward thing.

The first thing I would recommend is using an appropriate model. 3.5 isn’t the best when it comes to reasoning. Consider using 4.

Even with 4, you need to use methods to crank up reliability. One is reflection:


USER
Here's a message log:

----

1/30/21: Contacted Joseph at Blah Company to inquire about the hose bib I had purchased (I purchased online on 1/10/21). He said he would look and get back to me.
2/5/21: No contact yet from Blah Company.
2/8/21: Blah Company got back to me (didn't get the name of the person). They said the credit-card transaction failed. I asked why they hadn't contacted me, and they said they did, at 970-555-3333 (wrong number - mine is 970-444-8888). I gave them the credit card info again and the transaction was successful. Expected arrival date is 2/15/21.
2/16/21: Hose bib arrived.
2/17/21: Actual date of hose bib arrival.

----

When did the hose bib arrive?

ASSISTANT
The hose bib arrived on 2/16/21.

USER
Is there any possibility that this answer is wrong?

ASSISTANT
Yes, there is a possibility that this answer is incorrect because the message log states that the "actual date of hose bib arrival" was on 2/17/21. So, while initially it states the hose bib arrived on 2/16/21, it was later corrected to 2/17/21.
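Programmatically, reflection is just a second round trip that hands the model its own answer back for a sanity check. A rough sketch (the prompt wording and helper name here are only illustrative, using the openai Python SDK):

```python
from openai import OpenAI

client = OpenAI()

def ask_with_reflection(log: str, question: str, model: str = "gpt-4") -> str:
    """First pass answers the question; second pass re-checks the answer against the log."""
    messages = [{
        "role": "user",
        "content": f"Here's a message log:\n----\n{log}\n----\n{question}",
    }]
    draft = client.chat.completions.create(model=model, messages=messages)
    draft_answer = draft.choices[0].message.content

    # Feed the draft answer back and ask the model to doubt itself.
    messages += [
        {"role": "assistant", "content": draft_answer},
        {"role": "user", "content": (
            "Is there any possibility that this answer is wrong? "
            "Re-read the log and reply with the corrected final answer only."
        )},
    ]
    final = client.chat.completions.create(model=model, messages=messages)
    return final.choices[0].message.content
```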

But I wouldn’t be satisfied with that. I’d try to actively curate the message log with a RAG-type system that raises errors as they’re introduced, so they can be cured proactively.

This is especially important for larger prompts. I call it oversaturation. You’ll encounter it in long conversations where you’re trying to workshop something, or in log-type situations where the same concept keeps getting updated: the model starts to get increasingly confused. It’s possible that this is due to the attention mechanism in LLMs; it can’t pay attention to too many seemingly salient things at the same time, and eventually you run out of attention (this is conjecture). So, to build reliable reasoning systems, you need to clean, aggregate, and deduplicate information as much as possible before you send it to a decision stage.
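To make that last point concrete, one (hypothetical) way to do it is a cheap pre-pass that reconciles the raw log into a minimal set of current facts before any question reaches the answering model:

```python
from openai import OpenAI

client = OpenAI()

CONSOLIDATION_PROMPT = (
    "Rewrite the following event log as a minimal list of current facts. "
    "Where entries conflict, keep only the most recent or corrected value "
    "and drop the superseded one."
)

def consolidate_log(log: str, model: str = "gpt-4") -> str:
    """Clean, aggregate, and deduplicate the log in one pass before the decision stage sees it."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{CONSOLIDATION_PROMPT}\n\n{log}"}],
    )
    return response.choices[0].message.content

# Feed consolidate_log(raw_log), not raw_log, into the question-answering step.
```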

I can give the more precise update a shot and see what happens, for sure. Part of my learning is to see how good the models are at sorting out the kind of potentially vague information one would likely receive in a true user-logging-an-event scenario.

Regarding your question “Why can’t the AI answer?” - I don’t know enough about how it works yet to answer that question 🙂

Thanks very much for the welcome to the community and for the thoughtful response. In some ways it’s encouraging to see that the models aren’t “magic” and that someone still has to ensure they’re fed properly… but, of course, there’s the challenge of existing, muddy datasets…

I’m intrigued by the oversaturation thing… though my dataset isn’t big and I didn’t ask a lot of questions, I did notice an increasing “fogginess” in the answers I was getting as the “conversation” extended… I’ll be keeping my eye on that concept.

I think the current race is to get them to feed themselves. Some think it’s already possible, others think it’s a pipe dream. In any case, they’re incredibly useful, but still need a little guidance here and there.

The most important part is that people use them, and get an intuitive feel for their capabilities and limitations so we start using them more productively.

So in that regard, I want to absolutely encourage you to keep working on what you’re trying to do; it can probably be solved.

Good luck!