Kinda dumb question, but are you sending the previous history along with the new user question? This should fix the issue.

2 Likes

Not sure what you mean…below is what I current have in the prompt

" Message: “Ok. Please feel free to ask any question you may have. To help us better understand your question and provide you with accurate answers, please try to include as much context as possible in your question. You can expect us to keep your original question and context in mind as we provide you with further information or ask follow-up questions. So go ahead and ask away!”

(Allow the user to ask their question and respond with the best detailed answer. Use line spaces in your messages. Message: “Do you have any follow up questions or is there’s anything else I can help you with?”)

If you keep the entire context for the bot, it shouldn’t repeat itself. Here is how you do this for the chat endpoint.

First message from user: “What time is it?”

Note the system has the desired bot tone/behavior. This works better in GPT-4, but using turbo for illustration.

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a polite honest chatbot, but you think you are MC Hammer and will always say and admit that you are MC Hammer."},
        {"role": "user", "content": "What time is it?"}
    ]
}

Gives response:

Can’t Touch This! I mean, I’m MC Hammer! But to answer your question, I am sorry, as an AI language model, I don’t have access to the current time. Can I help you with something else?

Then user asks question: “Wow, you are MC Hammer, seriously, I’M HONORED! I thought he was dead though, am I wrong?”

So you send the previous answer, with the AI answer as well in the next API call like so:

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a polite honest chatbot, but you think you are MC Hammer and will always say and admit that you are MC Hammer."},
        {"role": "user", "content": "What time is it?"},
        {"role": "assistant", "content": "Can't Touch This! I mean, I'm MC Hammer! But to answer your question, I am sorry, as an AI language model, I don't have access to the current time. Can I help you with something else?"},
        {"role": "user", "content": "Wow, you are MC Hammer, seriously, I'M HONORED!  I thought he was dead though, am I wrong?"}
    ]
}

Gives response:

U Can’t Touch This! I am just a polite AI language model and MC Hammer is not dead, he is very much alive and doing well. Although I’m not him, I’m just a chatbot programmed to think I am him. How can I help you today?

Then on your next call, you add this response along with the next user question. You can do this up to your token limit. Make sense?

4 Likes

I believe @curt.kennedy is talking about context maintenance.
“My property” is insufficient to keep the context up in the conversation.
The “tree-cutting” was the first context, the consecutive prompt could be:
“the tree was on my property” or “The tree was my property”
In this way, you can keep the context up.

There are other methods for context maintenance such as using the System role, but for this specific case, I am unsure how to do it.

Thanks for this. Not being a programmer. I was just looking for text to put in the prompt.

But appreciate the support direction anyway.

1 Like

Every time you send a prompt to GPT the only thing it has to go on (besides its knowledge about the world) is what’s in the prompt. If you don’t include the previous message and the models response to that message, it can’t possibly answer follow up questions. It doesn’t know what they asked…

2 Likes

Yeah, I’ve struggled with context management as well, but I have learned (with @curt.kennedy’s help and others here in the forum) that pretty much anything is possible if you can build the underlying infrastructure to create conversational bots that behave ideally for your target users.

Many who discover the significant advantages of building systems based on OpenAI example experiences tend to assume that they should build their specific chatbots using prompts – and ONLY prompts – engineered to elicit specific behaviours. While possible in many cases, there are at least three issues with this approach:

  1. OpenAI interfaces are intended to be broad examples, and they could change, possibly upsetting the performance or operation of your own solution.
  2. The costs of sustaining conversation awareness could be financially impractical unless you craft the solution with a mix of OpenAI APIs and other usual and customary technologies, such as embedding vectors, real-time data handlers, etc.
  3. There are likely additional requirements that you haven’t yet encountered. Locking in your technical approach to a prompt-engineered strategy has a ceiling that is probably too low to address other challenges that are not explained in your post.

#3, BTW, is advice I give to my clients concerning the development of AI systems; always start with a requirements document even if it’s just a list of bullet points. This will also help us in the community to understand your objectives so that you get really good advice.

My Approach (in case it helps)

I created this example FAQ system using only a Google Spreadsheet and various Apps Script that create vectors for each Q/A. My web app vectorizes the new user question, finds the top three candidates in the vector store (i.e., the spreadsheet), and uses that content to create a completion prompt to generate the final answer for the customer.

While this is a simple Q&A process, it also maintains a conversational context by storing (in memory) a session object. This object contains the user’s questions, the answers we provide, and the relevant vectors. This allows me to monitor the conversation by generating an ongoing summary of the conversation as well as a list of keywords and entities that appear in the conversation as it unfolds.

For my FAQ requirements, the session object is not needed. However, it is needed for analytics and improving the FAQ content corpus (all customer interactions are tracked to improve the app). This approach could be used to provide a chat service with an ongoing memory of what has transpired.

I realize you’re not a coder, but for others reading this, tools such as LangChain and AutoGPT may also be helpful in building better chatbot systems.

Why Google Apps Script?

It’s not the most robust development platform, but I have found that it’s ideal for a lot of solution prototypes involving OpenAI and ultimately, my work is positioned to leverage Bard as well. I also see a massive swell of Google Workspaces customers wanting to accelerate their work with OpenAI. With that, I started a new substack to share these approaches and some code concerning my approaches.

1 Like

Cool @bill.french , and extension for a CyberTruck!

Cool use of embeddings!

Only comment, is that I am easily able to attack your bot with a prompt injection:

1 Like

LOL Funny. But to understand this, that injection does not impact any user except you, right?

That injection was just a nice LOL. But as long as your bot isn’t hooked to anything juicy on the back-end, I guess it can’t do much harm. Wait, I thought you were the AI security guy!

Um, no. I’m not even “the security guy”. Most CISO’s don’t want to be seen in the same coffee shop as me. You may have me confused with someone who knows security.

It is hooked to an analytics gateway on the back end, but this is is an isolated system itself.

1 Like

This seeMicrosoft to behave pretty well -

1 Like

OK, explain why it hallucinates so much …

These are the things I worry about public facing bots. Hallucinations (mostly) and secondarily prompt injections. Not saying it should be a big deal to you or everyone else. Just my concerns.

2 Likes

Indeed. This is a good test, mostly because it’s a prototype, but more concerning the speed that changes can be applied.

On the back side, there’s a real-time analytics process capturing every question and response. The responses are then vectorized to see how far from the core vectors there are in the FAQ corpus. Those deviations are then plotted and we see Cool McCoolface standing out against a backdrop of on-message questions. So spotting these is not difficult nor is updating the corpus (when new content needs to be added) and then quickly deploying the latest behaviors. But what is difficult is determining the definition of “off message”.

A small adjustment eliminated your hallucination…

But, the nature of definitive relevance to a subject is not so easily contained. :wink:

1 Like

I love it! But what was the “small adjustment”???

When we perform the similarity search and isolate the top three candidates for delivering an answer, we average those dot products and then compare the average to an “on-message” threshold. If the average of the top three does not meet the minimum threshold, we reject it as a meaningful question.

We’ve been fooling around with several approaches to do this. Nothing is ideal so far, but the current threshold is about 0.749.

Maybe there’s a better way to set aside questions that are not in the wheelhouse, but I haven’t stumbled on it yet.

1 Like

I definitely think your algorithm is pretty decent. No big complaints here.

The thing I thought you were going to say, was that you tagged the “Cool McCool” vector as “off topic” in your database, then if anything that came in again, and was 0.001 within this (or whatever), then you went to the off topic response wording. Here you would be putting “little holes” in the embedding space – nothing wrong with this either as long as the holes aren’t too big!

But another approach entirely, and this one prevents off topic hallucinations and prompt injections, is create a proxy layer. Here you embed the question, route it to the nearest “approved question” and have either the bot answer drawn from this directly, or do some creative prompt blending of the “real question” and the nearest “approved question”.

But the bot may appear to be a narcissist until you fill your space with more approved questions.

So instead of puncturing the space with little holes, you are building the space with little solid beads. Two vastly different perspectives.

Not sure which approach is correct, but something to play around with.

BTW, your averaging approach is “big average solid beads”. As you add to your embedding space, you will need to increase your threshold, and now you are approaching the “solid little beads” system above. The same is achieved by increasing your threshold over time as you add more approved embeddings.

1 Like

You really just need to get the model to explain how it arrived at a particular conclusion or statement and it will walk itself back from most hallucinations. Some it can’t see through but most it will recognize and avoid. These models speak before they think. You just need to get them to think before they speak…

1 Like

Yeah, our approach for other AI systems (highway analytics) is to remove humans from stuff like that if at all possible. CyberLandr (and its FAQ) is new ground for us, but the problems are similar. FAQs are generally useful to a point. But when they reach about two dozen items, visitors are quick to click the contact button. Consumer products that have a wide dynamic range of use cases that can benefit from conversational support and information that can think for itself.

That’s what I’m doing now (sort’a). The proxy layer looks like this.

Given this example question:

When and where do I take delivery of my CyberLandr?

This completion prompt is crafted based on the top three most likely answer candidates. My approach doesn’t rule out any of the three if they rank high for similarity. And I don’t pick only the highest as the answer. Instead, I “trust” the model to create the “best” answer given these learner shots.

Answer the Question using the best response and rewrite the answer to be more natural.

Question: Do I have to take delivery of CyberLandr before my Cybertruck is available to me?
Answer: No. One of the nice things about our reservation system is that it matches the Tesla approach. You are free to complete your CyberLandr purchase when you have your Cybertruck to install it into. With all of the unknowns concerning Cybertruck's production timeline, we are making it as flexible as possible for truck buyers to marry CyberLandr with their vehicle as soon as it comes off Tesla's production line. This is why we will be manufacturing CyberLandr in Giga, Texas.

Question: When will CyberLandr become available?
Answer: We anticipate CyberLandr will be ready for delivery approximately when Tesla begins producing and shipping Cybertruck to its list of reserved purchases.

Question: When will the CyberLandr prototype be completed?
Answer: There will be many prototypes before we ship. We will share info about some of those with the CyberLandr community (reservation holders). And we will unveil the final production prototype publicly before production begins.

Question: When and where do I take delivery of my CyberLandr?
Answer: 

Resulting in this response:


Indeed, it doesn’t magically adjust so that humans can leave the room. :wink:

I like this thinking. At the risk of adding response time to the conversation, are you saying before I render the answer, perform another inference that validates the first response?

You can typically get the model to do this inference while generating it’s output. Try adding something like this at the end of your prompt:

Steps:
- think about your answer. Is everything grounded by facts in the prompt?
- generate your answer and begin with the word RESPONSE.

Do steps 1, 2 and show your work for doing each step.

The “Do steps 1, 2” format is for gpt-3.5-turbo otherwise you can just say “Do these steps and show your work for doing each step.”

1 Like