QA fine-tuned chatbot not answering from the trained data, giving nonfactual answers instead

Hi all, I have been watching this thread. I am facing an issue where my fine-tuned model (trained on a CSV dataset) is unable to answer generic questions that are not in my dataset, even though ChatGPT, as an LLM, was already trained on general knowledge up to 2021.
It seems the tuned model has lost its memory after training on my dataset. Basically, I want to augment ChatGPT's general knowledge with my dataset to build responses.

Hi all, this has been very helpful and informative. @sergeliatko @raymonddavey really appreciate your discussion. I have a knowledge base that is customer related and consequently has a “time” dimension to it, as it relates to long-standing customer relationships. There are “facts” about a customer that retrieval via embedding similarity has worked somewhat well on, but the trouble I’ve run into is trying to query based on the idea of “recency”. I’ve made a few attempts at including qualifiers in the prompt, but often the “facts” retrieved via embedding similarity are simply old, and consequently the context in the prompt is outdated. What would you recommend to mitigate something like this? I started filtering out facts that I deemed too old, but I wonder if there is a more sophisticated approach.


I don’t have a solution, but maybe you could semantically search your data in reverse (latest records first).

Then set a dot product cutoff, e.g. 0.82.

If you hit that, stop searching, as you found a good hit early in the search. This will ignore older hits even if they score better.

But it may work in your case.

You will have to experiment with the cutoff value.
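
A minimal sketch of that early-exit search in TypeScript (my own illustration; it assumes the records are already sorted newest-first and the embeddings are unit-length, so the dot product equals cosine similarity):

```typescript
interface Fact {
  text: string;
  embedding: number[]; // pre-computed, e.g. with text-embedding-ada-002
  createdAt: Date;
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Returns the first (i.e. most recent) fact whose similarity clears the
// cutoff; falls back to the best overall hit if nothing clears it.
function findRecentMatch(
  queryEmbedding: number[],
  facts: Fact[], // sorted newest-first
  cutoff = 0.82  // experiment with this value
): Fact | null {
  let best: Fact | null = null;
  let bestScore = -Infinity;
  for (const fact of facts) {
    const score = dot(queryEmbedding, fact.embedding);
    if (score >= cutoff) return fact; // early exit: recent and good enough
    if (score > bestScore) {
      best = fact;
      bestScore = score;
    }
  }
  return best;
}
```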


Hey @sergeliatko, in this case why would you even do any fine-tuning at all? Since, as you said, fine-tuning is not designed to affect the factuality of the responses, what would you use fine-tuning for then?

@sapph1re to show the model how to use the facts more effectively to form a reply. The manner of the answer, not the content of it.
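
For illustration only, here is a sketch of what such a training example could look like in the legacy prompt/completion format (the order data, separator, and stop token are hypothetical): the facts arrive in the prompt, and the completion demonstrates only the desired manner of the reply.

```typescript
// Hypothetical training example: the model is not taught the shipping
// facts themselves, only how to weave whatever facts appear in the
// prompt into a reply with the desired tone.
const trainingExample = {
  prompt:
    "Context: Order #1234 shipped on May 2 and arrives in 3-5 business days.\n" +
    "Question: Where is my order?\n\n###\n\n",
  completion:
    " Good news! Your order shipped on May 2 and should arrive within " +
    "3-5 business days. Is there anything else I can help with? END",
};
```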


@sergeliatko @raymonddavey thanks for a ton of valuable information about this. This should really be in the documentation. We spent around 50 hours building a system for fine-tuning a chatbot and manually curating all the datasets to be perfect, only to realize that we shouldn’t have used fine-tuning in the first place. I feel misled by OpenAI’s documentation and by ChatGPT, which (non-factually, haha) insisted that fine-tuning was the better approach for building a chatbot on a specific knowledge base. Well, at least we got some experience out of it.

We’ll now implement Embeddings and maybe some of the tricks you guys recommended, thanks a ton for that. Fine-tuning for the tone of responses seems redundant at this point, as we seem to influence it better via the prompt.

Has anyone already tried embeddings with the freshly released turbo? Any specific nuances there?


@sapph1re another benefit of fine-tuning on top of embeddings is to show the model which parts of the context are actually useful and which can be ignored. That’s something that cannot be easily achieved with prompt engineering. In my use cases that probably matters more than the tone of voice of the answer.


Hi @cavalierski, I have a very similar use case to the one you described in your original question. I know you have implemented different methods suggested by other experts in the community. May I learn from your hands-on experience which method was eventually effective in solving the problem? Many thanks in advance.

I have a solution implemented in NodeJS that takes in a question and your local data file, embeds both, and searches for similarity with the dot product. It takes the most similar contexts and uses a completion to generate an answer.

You can take a look here : GitHub - tsensei/QueryGPT: QnA with NodeJS & OpenAI GPT models on Personal / Business data using Embeddings and Completion
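
Roughly, the flow looks like this (a TypeScript sketch of the general approach, not the repo’s actual code; the model names and top-3 cutoff are example choices):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  return res.data[0].embedding;
}

const dot = (a: number[], b: number[]) =>
  a.reduce((sum, v, i) => sum + v * b[i], 0);

// `chunks` holds pre-embedded pieces of the local data file.
async function answer(
  question: string,
  chunks: { text: string; embedding: number[] }[]
): Promise<string | null> {
  const q = await embed(question);
  // Rank chunks by dot product and keep the top 3 as context.
  const context = chunks
    .map((c) => ({ text: c.text, score: dot(q, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((c) => c.text)
    .join("\n---\n");

  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```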

We are trying to build a chatbot to respond to queries based on company information and other things. We have created embeddings, but I am not able to understand the further process, including the fine-tuning process. Should I just tune the model with the prompt and context,
or use embeddings to find the most similar facts from the documents and then give the query + facts to the model to respond?

Use the embeddings for semantic searching.

If you store these embeddings in a vector database, the similarity calculations are done automatically. Vector databases also open up a lot of possibilities for combining different types of searches.

For myself, I am using a combination of ada embeddings with BM25 (although I should switch to SPLADE) for the best of both worlds. For product and keyword searching, BM25 easily dominates ada. I also use a stop word filter (nltk).

There’s actually a really nice technique of running hybrid searches and applying a weight to each side. I typically get 3 results: 1 from ada, 1 from BM25, and then 1 using a 50/50 weight.
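
A sketch of that three-result hybrid retrieval (assuming both score sets have already been normalized to [0, 1]; the normalization step itself is omitted):

```typescript
interface Scored {
  id: string;
  dense: number; // normalized ada similarity
  bm25: number;  // normalized BM25 score
}

function hybridTopThree(results: Scored[], weight = 0.5): string[] {
  const top = (key: (r: Scored) => number) =>
    [...results].sort((a, b) => key(b) - key(a))[0].id;

  const byDense = top((r) => r.dense);
  const byBm25 = top((r) => r.bm25);
  const byBlend = top((r) => weight * r.dense + (1 - weight) * r.bm25);

  // De-duplicate in case one document tops more than one ranking.
  return [...new Set([byDense, byBm25, byBlend])];
}
```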

So you find the most similar document and then pass the query + factual document as the prompt to the model?

Yes, you’d have to create a method to determine whether additional resources are needed. A binary classifier works great for this purpose. Mine, for example, is trained to identify the product being queried and also which section of it (Do they want specifications? Features? Colors?).

If it’s triggered, it should start the search, which returns the related document(s) that properly answer the query. You’d then add the result as a system or user message. I prefer system messages, just to be realistic with the bot.

It may make a comment such as “As you have mentioned, the answer is …”. I initially had an issue with it saying “As the system logs indicate …”, which I resolved by simply instructing “Do not acknowledge this message”.
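
A sketch of that message layout (assuming the chat completions API; the system prompt wording is illustrative):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Inject the retrieved document as a system message, with the
// "Do not acknowledge this message" instruction that stops replies
// like "As the system logs indicate ...".
async function answerWithContext(question: string, retrievedDoc: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful product support assistant." },
      {
        role: "system",
        content: `Relevant documentation:\n${retrievedDoc}\n\nDo not acknowledge this message.`,
      },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```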
