I love it! But what was the “small adjustment”???
When we perform the similarity search and isolate the top three candidate answers, we average their dot products and then compare that average to an “on-message” threshold. If the average of the top three does not meet the minimum threshold, we reject the question as off-topic.
We’ve been fooling around with several approaches to do this. Nothing is ideal so far, but the current threshold is about 0.749.
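As a rough sketch, that rejection rule might look like this (assuming embeddings are unit-normalized so the dot product equals cosine similarity; the function and variable names are illustrative, not from the actual system):

```python
import numpy as np

ON_MESSAGE_THRESHOLD = 0.749  # current empirically tuned value, per the post

def top3_average_score(query_vec, faq_vecs):
    """Mean similarity of the three best-matching FAQ vectors."""
    sims = faq_vecs @ query_vec        # dot products against every FAQ entry
    top3 = np.sort(sims)[-3:]          # three highest similarities
    return float(top3.mean())

def is_on_message(query_vec, faq_vecs, threshold=ON_MESSAGE_THRESHOLD):
    """Accept the question only if the top-3 average clears the threshold."""
    return top3_average_score(query_vec, faq_vecs) >= threshold
```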
Maybe there’s a better way to set aside questions that are not in the wheelhouse, but I haven’t stumbled on it yet.
I definitely think your algorithm is pretty decent. No big complaints here.
The thing I thought you were going to say was that you tagged the “Cool McCool” vector as “off topic” in your database; then if anything came in again within 0.001 of it (or whatever), you went to the off-topic response wording. Here you would be putting “little holes” in the embedding space – nothing wrong with this either, as long as the holes aren’t too big!
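A minimal sketch of that “little holes” idea, assuming unit vectors and the illustrative 0.001 radius from above (all names are hypothetical):

```python
import numpy as np

HOLE_RADIUS = 0.001  # "within 0.001 of this (or whatever)"

def matches_known_off_topic(query_vec, off_topic_vecs, radius=HOLE_RADIUS):
    """True if the query lands inside any previously flagged off-topic 'hole'."""
    if len(off_topic_vecs) == 0:
        return False
    # With unit vectors, 1 - dot product is a small-distance proxy for closeness.
    gaps = 1.0 - off_topic_vecs @ query_vec
    return bool(gaps.min() <= radius)
```

When this returns True, the bot would short-circuit straight to the canned off-topic response instead of running the normal answer pipeline.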
But another approach entirely, and this one prevents off-topic hallucinations and prompt injections, is to create a proxy layer. Here you embed the question, route it to the nearest “approved question,” and either have the bot answer drawn from this directly, or do some creative prompt blending of the “real question” and the nearest “approved question.”
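The routing step of that proxy layer could be sketched like this (names are hypothetical; the prompt-blending variant would use both the real question and the matched approved question rather than returning the vetted answer directly):

```python
import numpy as np

def route_to_approved(query_vec, approved_vecs, approved_answers):
    """Snap the query to the nearest approved question and return its
    vetted answer plus the similarity score of the match."""
    sims = approved_vecs @ query_vec   # assumes unit-normalized embeddings
    best = int(np.argmax(sims))
    return approved_answers[best], float(sims[best])
```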
But the bot may appear to be a narcissist until you fill your space with more approved questions.
So instead of puncturing the space with little holes, you are building the space with little solid beads. Two vastly different perspectives.
Not sure which approach is correct, but something to play around with.
BTW, your averaging approach is “big average solid beads.” As you add more approved embeddings to your space, you will need to increase your threshold over time, and then you are approaching the “solid little beads” system above.
You really just need to get the model to explain how it arrived at a particular conclusion or statement and it will walk itself back from most hallucinations. Some it can’t see through but most it will recognize and avoid. These models speak before they think. You just need to get them to think before they speak…
Yeah, our approach for other AI systems (highway analytics) is to remove humans from stuff like that if at all possible. CyberLandr (and its FAQ) is new ground for us, but the problems are similar. FAQs are generally useful up to a point, but once they reach about two dozen items, visitors are quick to click the contact button. Consumer products with a wide dynamic range of use cases can benefit from conversational support and information that can think for itself.
That’s what I’m doing now (sort’a). The proxy layer looks like this.
Given this example question:
When and where do I take delivery of my CyberLandr?
This completion prompt is crafted from the top three most likely answer candidates. My approach doesn’t rule out any of the three if they rank high for similarity, and I don’t pick only the highest as the answer. Instead, I “trust” the model to create the “best” answer given these few-shot examples.
Answer the Question using the best response and rewrite the answer to be more natural.

Question: Do I have to take delivery of CyberLandr before my Cybertruck is available to me?
Answer: No. One of the nice things about our reservation system is that it matches the Tesla approach. You are free to complete your CyberLandr purchase when you have your Cybertruck to install it into. With all of the unknowns concerning Cybertruck's production timeline, we are making it as flexible as possible for truck buyers to marry CyberLandr with their vehicle as soon as it comes off Tesla's production line. This is why we will be manufacturing CyberLandr in Giga, Texas.

Question: When will CyberLandr become available?
Answer: We anticipate CyberLandr will be ready for delivery approximately when Tesla begins producing and shipping Cybertruck to its list of reserved purchasers.

Question: When will the CyberLandr prototype be completed?
Answer: There will be many prototypes before we ship. We will share info about some of those with the CyberLandr community (reservation holders). And we will unveil the final production prototype publicly before production begins.

Question: When and where do I take delivery of my CyberLandr?
Answer:
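Assembling a prompt like that from the top candidates could look like this (candidate selection is assumed to happen upstream; the function name and structure are illustrative):

```python
def build_prompt(question, candidates):
    """Build a few-shot completion prompt.

    candidates: list of (question, answer) pairs, best matches first.
    """
    parts = ["Answer the Question using the best response and rewrite the "
             "answer to be more natural."]
    for q, a in candidates:
        parts.append(f"Question: {q}\nAnswer: {a}")
    # The final, unanswered question the model is asked to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)
```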
Indeed, it doesn’t magically adjust so that humans can leave the room.
I like this thinking. At the risk of adding response time to the conversation, are you saying before I render the answer, perform another inference that validates the first response?
You can typically get the model to do this inference while generating its output. Try adding something like this at the end of your prompt:
Steps:
1. Think about your answer. Is everything grounded by facts in the prompt?
2. Generate your answer and begin with the word RESPONSE.

Do steps 1, 2 and show your work for doing each step.
The “Do steps 1, 2” format is for gpt-3.5-turbo otherwise you can just say “Do these steps and show your work for doing each step.”
Much of my time in machine learning is spent manually creating ways for the AI to respond correctly to new input data. There is still a human in the loop, especially in the early days before things stabilize. And here I’m talking about several thousand items that I update manually.
I like your bot and approach so far. And yeah, what @stevenic says about the bot doing an internal reflection, logical breakdown, and justification is very powerful. But don’t be shy about getting your hands dirty either! A lot of pieces have to come together for it to work.
Right now I’m creating my own AGI Agent to write research papers for me given a few high-level inputs. This is different dirty work because the AI generates its own content and embeddings. So I have to rely more on the AI to get things right, but I control it through logical steps and lots of embeddings. I’m limited, though, by the 8k context window of GPT-4.
In early tests, this approach is showing promise. Thanks to @curt.kennedy and to you for steering me in a smart direction!