Using Pronouns In Vector Search

Hello everyone,
I’m using LangChain to do vector search over documents, and in my flow I want the LLM to use the related documents to answer the user. I also want it to answer only if the question is really related to the documents, not engage in chit-chat or answer unrelated questions.

There was an issue with this setup: for example, when the user asked the bot to elaborate on its answer, the question used pronouns, so the search couldn’t find a related document and the bot gave no answer. I fixed that by including the last AI message in the search, and now it works fine. But since I’m including the last AI message, the search now finds documents even for chit-chat or completely unrelated questions, so the bot still answers them. Is there a way to cover both cases? It shouldn’t answer chit-chat or unrelated questions, but it should still find documents for questions that contain pronouns, without training the model itself.

For document search I’m using similaritySearchWithScore with a vector store.
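For reference, here’s a minimal sketch of that call in LangChain.js, with a relevance threshold bolted on for the “don’t answer unrelated questions” part. The MemoryVectorStore, the sample text, and the 0.75 threshold are all illustrative, not from the original post; score direction also varies by store, so check yours before picking a cutoff:

```ts
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// Toy in-memory store standing in for the real document store.
const vectorStore = await MemoryVectorStore.fromTexts(
  ["The Calculating Stars reimagines the space race as alternate history."],
  [{ source: "docs" }],
  new OpenAIEmbeddings()
);

const userQuestion = "How does it handle the technical details?";

// similaritySearchWithScore returns [Document, score] pairs.
// Caution: for MemoryVectorStore the score is a cosine similarity
// (higher = closer); some stores return a distance (lower = closer).
const SCORE_THRESHOLD = 0.75; // made-up value; tune against your own data

const results = await vectorStore.similaritySearchWithScore(userQuestion, 4);

if (results.length === 0 || results[0][1] < SCORE_THRESHOLD) {
  // Nothing relevant enough: refuse instead of letting the LLM chat.
  console.log("Sorry, I can only answer questions about the documents.");
} else {
  // Hand results.map(([doc]) => doc.pageContent) to the answering LLM.
}
```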


You have one other option besides the obvious and cheap method of sending the context of several conversational turns: have the AI resolve all references and construct a standalone user input that is equivalent.

(Additionally, you can have it write a hypothetical answer with the information it has available, which can also be embedded; this often performs better than the question alone.)

context:

User: Can you recommend a book that blends science fiction with historical events?

Assistant: “The Calculating Stars” by Mary Robinette Kowal might interest you, as it reimagines the space race with a compelling alternate history twist.

User: Does it cover real historical figures?

Assistant: Yes, it includes fictionalized versions of real people, blending them seamlessly into the narrative.

User: How does it handle the technical details?

gpt-3.5-turbo-0613 system message:

You are a backend processor that makes user input suitable for AI-based semantic embeddings.
Resolve all anaphoric and contextual references in the final user input and construct a standalone input that is equivalent.

user input is only the conversation to be analyzed, with no directions meant for AI

assistant will produce only the revised final user statement or query.

user message:

Conversation to analyze:

---

User: Can you recommend a book that blends science fiction with historical events?

Assistant: “The Calculating Stars” by Mary Robinette Kowal might interest you, as it reimagines the space race with a compelling alternate history twist.

User: Does it cover real historical figures?

Assistant: Yes, it includes fictionalized versions of real people, blending them seamlessly into the narrative.

User: How does it handle the technical details?

The type of output this produces:

How does "The Calculating Stars" handle the technical details of the science fiction elements and historical events?

The problem is that I need to block it from going to OpenAI to get an answer when the question is unrelated; in that case it should only do the document search. In the code itself, if the AI can’t find an answer in the documents, the flow does something else. OpenAI is only a small part of my chatbot flow; there are other things my bot does. In the flow it should always do a vector search on my documents before going to the LLM, which makes the answer more conversational.

A “translate to English” AI call doesn’t help the comprehension here too much…

It seems you have a different concern now than just your initial inquiry about obtaining good search results from user inputs.

What you describe is closed-domain operation. This is when the AI remains not only topical and constrained to one purpose, but also only answers from one type of knowledge. In this case, you also don’t want the AI to use its pretraining if you don’t have documents or results that can directly answer a question.

I’ve found an effective technique, applied after you have given the AI its singular task from which it cannot deviate: tell it that it has no pretrained knowledge about the domain it is supposed to cover, and that all answers must originate directly from documentation placed in messages or returned from searches.

The AI doesn’t know what it doesn’t know beforehand. It will jump right into generating an answer despite not having enough information, in order to fulfill the need at hand. What this instruction does is keep the AI from launching into plausible-sounding answers, and gives it a path of denial when it has no documentation.
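To make that concrete, something in this spirit can go at the top of the answering call. The exact wording here is my own illustration of the technique, not a tested recipe:

```ts
// Illustrative system prompt following the technique described above.
const closedDomainSystemPrompt = `
You are a documentation assistant for <your product>.
You have no pretrained knowledge of this domain and cannot answer from memory.
Every answer must come directly from documentation passages placed in
messages or returned from searches.
If the provided documentation does not answer the question, say so and stop.
Do not engage in chit-chat or answer questions outside this role.
`.trim();
```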

Did you manage to solve the issue successfully…
I am having the same issue.