Embeddings not preventing OpenAI from answering

chinmay.duke · June 22, 2022, 11:45pm

I used the embedding-based chatbot in the Playground.

I gave a simple context

AssignVisitAccountAndCoverage-This API assigns the visit account and coverage for a provided and it returns the copay that will be due.
CompleteRequest-This web service marks a room cleaning request as complete.

Using this simple context, when I asked:
Q: What is Node.js?
A: Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine.

Q: Where is Delhi located?
A: Delhi is the capital of India

As you can see, my context does not contain this information, so the bot should have said “Unknown”. But it did not. Any thoughts on how to fix this?

chinmay.duke · June 23, 2022, 5:25pm

I have one more related question on embeddings. Every time a user asks the bot a question, the bot searches the knowledge base (KB) and answers. This makes the following use case impossible:

USER: Is there an airline that flies directly from SFO to Austin?
BOT: Yes.
USER: Which one?
BOT: Unknown.

How do we maintain the context, given that my KB is huge? I want to train once on KB, create embeddings and then answer questions based on previous questions.

daveshapautomator · June 23, 2022, 5:39pm

You need to break this into multiple steps. First step is to ask if the answer to the query is present (True/False) and only if true, then you proceed to extract the answer.
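
A minimal sketch of that two-step flow, assuming a hypothetical `complete(prompt)` callable that wraps whatever completion call you use. The prompt wording and the function names here are illustrative assumptions, not taken from the video.

```python
from typing import Callable

# Hypothetical prompt wording; tune it for your own model and data.
CHECK_PROMPT = (
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Does the context contain the answer to the question? Answer True or False:"
)
ANSWER_PROMPT = (
    "Context:\n{context}\n\n"
    "Answer the question using only the context.\n"
    "Question: {question}\nAnswer:"
)


def answer_if_present(context: str, question: str,
                      complete: Callable[[str], str]) -> str:
    """Step 1: ask whether the answer is present. Step 2: extract it only if so."""
    verdict = complete(CHECK_PROMPT.format(context=context, question=question))
    if not verdict.strip().lower().startswith("true"):
        return "Unknown"
    return complete(ANSWER_PROMPT.format(context=context, question=question)).strip()
```

The second completion is only paid for when the first one says the answer is there, so out-of-scope questions such as "What is Node.js?" stop at the check.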

This method is demonstrated in my reduce confabulation video

daveshapautomator · June 23, 2022, 5:41pm

You will need a cognitive architecture to achieve that. It requires integrating several steps, such as asking the right questions, performing search, and integrating it into a corpus. My book Natural Language Cognitive Architecture outlines how to do this.
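
One small piece of such an architecture can be sketched as follows: keep the recent turns and fold the user's earlier questions into the text that gets searched, so a follow-up like "Which one?" is not searched in isolation. This is only an assumption about how one might start; the `Conversation` class and the plain concatenation are hypothetical and not taken from the book.

```python
from collections import deque


class Conversation:
    """Keeps the last few turns so a follow-up can be searched with its context."""

    def __init__(self, max_turns: int = 4):
        # Each entry is a (speaker, text) pair; old turns fall off automatically.
        self.turns = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def search_text(self, new_question: str) -> str:
        """Text to embed or send to the search index for the new question."""
        history = " ".join(text for speaker, text in self.turns if speaker == "USER")
        return f"{history} {new_question}".strip()
```

The KB embeddings are still created once; only the query text changes from turn to turn.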

chinmay.duke · June 23, 2022, 5:45pm

I saw that @daveshapautomator earlier when we exchanged posts on finetuning. In this case we are not using finetuning but embeddings (per your suggestion). Our embeddings come from 5000 documents. So we create the embeddings once and reuse them (as opposed to creating embeddings with each query).

Our prompt specifically says that if a question can’t be answered, then say “Unknown”. It works many times. But it fails spectacularly, as shown in my original example.

Are you saying that, despite using embeddings (and the “Unknown” prompt), we should do the following:
Q1: Ask if the KB has the answer (Yes/No)
Q2: If the answer is Yes then complete, else say Unknown.

If that is what you are recommending, how do we ask the bot to check whether the embeddings contain the answer (Q1)?

daveshapautomator · June 23, 2022, 6:57pm

This is a hard problem and I have not tackled a task quite this big. I was working on something far larger by using Wikipedia as a source of ground truth but have not worked on it for a while. So keep in mind that my next ideas are only hypothetical, but now that I know a bit more about the problem you’re working on I think my recommendations may be more accurate:

Option 1: Search Index

Before I realized that GPT-3 has a lot of general knowledge, I was working on using Wikipedia as a source of truth to be incorporated into my cognitive architecture. For this problem, I had to find a way to make 5 million articles rapidly searchable so I settled on SOLR as an offline search index. Since you are dealing with several orders of magnitude less, this solution should work rather well for you. Here’s the video about that:

Essentially, you break the problem down into several steps, as outlined in my book (which I recommend you read if you haven’t):

1. When a query comes in, you first use GPT-3 to generate appropriate search terms to fetch the correct information. This can be done with a prompt like “Extract search terms to Google the correct information for the following query” or something like that. GPT-3 is really great at writing Google queries.
2. Use said search terms to search your SOLR instance for the correct documents - this will take some experimentation and tweaking. You could instead use the embedding/dot product search method, which could hypothetically be more accurate.
3. Once you’ve fetched the correct documents, which may still be too long to search with a single GPT-3 prompt, you will need to recursively summarize or distill them, which you can see in my “compress anything” videos for the recursive summarizer. I cannot even take credit for this idea, as a commenter on my video pointed out the utility value of using recursive summarization for recall/fetch purposes.
4. You’ll still need a check somewhere in here to know whether or not the correct information is even present. But as I’ve demonstrated in other threads, GPT-3 is really good at just giving you a BOOLEAN answer about whether or not something is present.
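
The recursive summarization step can be sketched roughly like this. It assumes a hypothetical `summarize(text)` callable that wraps a summarization prompt, and it splits on a fixed character count for simplicity; the function names and the default limits are illustrative and are not the code from the videos.

```python
from typing import Callable, List


def split_chunks(text: str, max_chars: int) -> List[str]:
    """Cut text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def recursive_summarize(text: str, summarize: Callable[[str], str],
                        max_chars: int = 2000, max_rounds: int = 10) -> str:
    """Summarize chunk by chunk, then repeat on the joined summaries."""
    for _ in range(max_rounds):
        if len(text) <= max_chars:
            break
        summaries = [summarize(chunk) for chunk in split_chunks(text, max_chars)]
        shorter = " ".join(summaries)
        if len(shorter) >= len(text):
            break  # summaries are not shrinking; stop rather than loop forever
        text = shorter
    return text
```

Splitting on sentence or paragraph boundaries would probably give better summaries than a hard character cut, at the cost of a little more code.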

Option 2: Finetune a KB memory bot

This is only hypothetically possible and I have not tried it. GPT-3 is capable of storing quite a lot of information, so it’s entirely possible that you can finetune a model on your 5000 documents and just use that to spit out the correct facts. Such a model (1) would be prohibitively expensive to train and (2) would require quite a bit of experimentation to determine if it’s accurate and viable.

Once I cycle back to research mode, this may be one of my projects. Many, many, many people have a need to search an arbitrarily large knowledge base for facts and figures, and to be able to rely on it. Indeed, even my cognitive architecture would benefit from such a feature. Since OpenAI has now enabled the ability to continuously finetune a model, perhaps this method would not be so unwieldy. Essentially, as your KB grows, you just have incremental training sessions to integrate new information into your model.

As I mentioned, I will be experimenting with this… eventually. It would solve many problems to be able to simply accumulate all of an ACOG’s memories in one model that has the magical ability of instant recall. However, I am skeptical of relying on blackboxes such as this for critical functionality. For example, imagine you have an autonomous agent at some point in the future - you want all of its memories to be explicit and declarative (X happened at Y time) and not just embedded in a model. Kind of like how Teslas must keep sensor logs in case of a car crash.

chinmay.duke · June 23, 2022, 7:07pm

Reading your book. Thanks for the recommendation. I will try that.

chinmay.duke · June 23, 2022, 7:14pm

I think OpenAI should then update the documentation for embeddings too, @daveshapautomator, because the current documentation makes one believe that the answer is coming from the KB. We know at this point that either there is a bug or the documentation needs revision. Adding @moderators for any follow-up.

NSY · June 23, 2022, 7:15pm

Great stuff @daveshapautomator.
Based on my own experience, even a fine-tuned Davinci will not be able to answer questions presented in the fine-tuned file, unless it was written multiple times and answered exactly the same way each time. In order to fine-tune a model to answer correctly, there is a massive amount of work to multiply every question again and again, and yet even if temperature = 0, it might invent stuff based on the atmosphere of the answer rather than THE answer.
I really like your idea about checking whether or not the information is there in the middle step. Nevertheless, it requires an additional model just for that and quite a complex architecture. That’s where all the fun is, isn’t it?

daveshapautomator · June 23, 2022, 7:18pm

Which documentation are you referring to? I may be misunderstanding and telling you lies by mistake

daveshapautomator · June 23, 2022, 7:21pm

There are many ways to skin this cat. I suspect that accumulating memories/KB/documents/logs in a search index is probably the way to go. SOLR can search millions of documents in a fraction of a second - plenty fast enough for 99% of use cases.

Actually, that reminds me, I found Milvus but haven’t used it yet. This may be the correct way to go: https://milvus.io/

If someone just figures out vector search + question answering, that alone would be a billion-dollar business. This is the way of the future.

chinmay.duke · June 23, 2022, 7:39pm

Here. Specifically the “Text search using embeddings” section. When a question is outside the scope, it should not return any document. But it does. Maybe there is some confidence score that gets returned too, and if that’s the case, we should just use a “low confidence score” as a boolean Yes/No. Thoughts?
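
A minimal sketch of that idea, assuming you compute the similarity yourself and treat the best score as the confidence: if nothing clears a cutoff, return no document at all. The `min_score` value of 0.8 is an arbitrary placeholder and would need calibrating on real in-scope and out-of-scope questions; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vec: Sequence[float],
               docs: List[Tuple[str, Sequence[float]]],
               min_score: float = 0.8) -> Optional[str]:
    """Return the best-scoring document, or None if nothing clears min_score."""
    if not docs:
        return None
    text, score = max(((t, cosine_similarity(query_vec, v)) for t, v in docs),
                      key=lambda pair: pair[1])
    return text if score >= min_score else None
```

A `None` result can then be mapped straight to “Unknown” without calling the completion model.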

daveshapautomator · June 23, 2022, 7:42pm

OH, I see.

I try to avoid blackbox things like that since I have no idea how it works. That’s why I never really used the now-deprecated Answers endpoint.

Personally, I would just store all the KBs and their associated embeddings in a local DB (with 5000, you can easily do this in SQLite or even just a JSON document). Then when you have a search query, get the embedding for that and do a dot product against all 5000 documents. It will take less than a second and you can just sort by highest dot product.

In my ACOG when I am searching for relevant memories I just grab the 5 or 10 most relevant memories.
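
A rough sketch of that search in plain Python, assuming the embeddings have already been fetched and stored as a mapping of document id to vector. The ids and the tiny two-dimensional vectors are made up for illustration; real embeddings are much longer. If the vectors are normalized to unit length, ranking by dot product gives the same order as ranking by cosine similarity.

```python
import heapq
import json
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_matches(query_vec: Sequence[float],
                store: Dict[str, Sequence[float]],
                k: int = 5) -> List[Tuple[float, str]]:
    """Score every stored document against the query and keep the k highest."""
    scored = ((dot(query_vec, vec), doc_id) for doc_id, vec in store.items())
    return heapq.nlargest(k, scored)


# The store is just a mapping of document id -> embedding, so it round-trips
# through a plain JSON file (json.dump / json.load) between runs.
store = json.loads('{"doc-a": [1.0, 0.0], "doc-b": [0.6, 0.8], "doc-c": [0.0, 1.0]}')
print(top_matches([0.8, 0.6], store, k=2))
```

With 5000 documents a linear scan like this is still fast; a vector index only becomes worth the trouble at much larger scales.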

chinmay.duke · June 23, 2022, 7:49pm

Thank you. Thank you. Thank you. Such help means a lot when you have told your employer that you will be leaving to work on a new venture while having no clue how the product works. Plus you have to pay your kids’ college fees.

chinmay.duke · June 23, 2022, 8:18pm

Why would you pick the top 5 or 10 and not just the first one?

daveshapautomator · June 23, 2022, 8:29pm

Human memories are squishy. They tend to get compressed over time. This is called consolidation which happens in the background and while we sleep. Alcohol, for instance, disrupts memory consolidation, which reduces learning.

chinmay.duke · June 23, 2022, 8:51pm

So you will try to answer using all of these top 5-10 results? If yes, then which one will you show to the user?

daveshapautomator · June 23, 2022, 9:13pm

I’m talking about an artificial cognition, not a user facing application. I’m merely explaining why I recall the top memories in my ACOG. For your chatbot, it may or may not make sense to do the same.

chinmay.duke · June 24, 2022, 12:59am

I made some more changes in the prompt, as seen below. And now I am getting “Unknown”, as expected. I asked questions such as what is 2+2, what is the capital of USA, What is Node.js, etc and so far it is working.

I am an answering bot with limited knowledge base about a series of web services. I have been trained on a context and if you ask me a question, I will use the provided context to give you the answer. If you ask me a question that is not mentioned in the context, I will respond with “Unknown”. For example, if you ask me how many days in a week and the knowledge base I am trained on does not have this information, I will respond “Unknown”.
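
For anyone assembling this programmatically, a prompt of this kind could be put together as sketched below. The `INSTRUCTIONS` string is an abbreviated paraphrase of the prompt above, and the `Context:` / `Q:` / `A:` layout and the `build_prompt` name are assumptions for illustration, not the exact format used in this thread.

```python
# Abbreviated paraphrase of the instructions; substitute the full prompt text.
INSTRUCTIONS = (
    'I am an answering bot with a limited knowledge base about a series of web '
    'services. If you ask me a question that is not mentioned in the context, '
    'I will respond with "Unknown".'
)


def build_prompt(context_lines, question: str) -> str:
    """Join the instructions, the retrieved context, and the question."""
    context = "\n".join(context_lines)
    return f"{INSTRUCTIONS}\n\nContext:\n{context}\n\nQ: {question}\nA:"
```

Ending the prompt on `A:` leaves the model to complete only the answer.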

NSY · June 25, 2022, 6:26pm

I love this prompt. There should be a prompt “bank” for such situations.
