Retrieving texts from vector database when the prompt is something like "explain more"!

I build a chatbot to chat with pdf, so the pdf’s text (divided to chunks) is stored in vector database and whenever input new prompt it embedded and semantic searched and retrieve the closest texts to feed them to gpt to generate it’s answer.

So if the texts are about AI for example, and the user input prompt is something like “What is AI?”, the the texts that relate to this question will be retrieved and fed to gpt to make the answer, but what will be retrieved if the user’s next input prompt was something like “explain more”? I mean how to be sure that the retrieved chunks of texts are related to the context when the prompt is general like just “explain more”?

Depends on the engine being used…In davinci it takes previous question automatically as context but if u need gpt turbo u need to feed the prev question also in the context…

I use gpt turbo and I feed it with the user’s input + the conversation’s previous messages + the texts retrieved from Pinecone. My question is about the texts retrieved from Pinecone, I wondering how to be sure that they are relevant to the user’s input if this input is something general like “explain more”?

You have a good question there, and it somewhat depends on the skill of the language model in making a demarcation between prior context and the current question when forming its embedding, something that you can only see the results of after use.

Since the assistant output is very verbose compared to the user input, and may overwhelm it and cause contextual hangups, preventing topic-switching, I would pass only the last three user questions (trimmed if necessary for speed), which should be enough for most followup lines of questioning. Show them as distinct turns, not just a mishmash of sentences.

“How to be sure” can’t really be determined, unless you ask another AI to pass you only the recent user questions that appear to be chained to the topic of the most recent input, or have AI evaluate the quality of the data returned (which doesn’t help remediation).

You are writing in a language that is difficult for non-native speakers of English, So is what you suggest is to put a piece of history with user’s “explain more” as one text to be embedded together so this will retrieve related texts?

Example:

user: can you tell me the weather forecast for tomorrow in New York City?
user: can you explain how photosynthesis works?
user: So, does it occur in all types of plants?
user: does it occur in plant parts other than leaves, such as stems or roots?

||
||
|| embeddings
\/

[0.8321, -0.5467, 0.1898, 0.9764, -0.7253, 0.3189, -0.4802, -0.9089, 0.0432, 0.7776, -0.6523, 0.5421]
||
||
|| find top match? Dot Product(A, B) = a1 * b1 + a2 * b2 + … + an * bn
\/

[-0.8763, 0.2471, 0.6785, 0.1352, -0.5124, -0.8967, 0.7456, 0.9881, -0.2193, 0.4919, -0.6332, 0.0937]
||
||
|| document
\/

Chemical Interactions in Photosynthesis

The chemical interactions within photosynthesis are intricate and elegant, enabling plants to produce their own food in a sustainable and efficient manner. Let’s explore some key chemical steps involved:

  1. Absorption of Light: The process begins when chlorophyll molecules embedded in the thylakoid membranes absorb light energy from the sun. This energy is used to power the subsequent reactions, and different types of chlorophyll molecules absorb light at various wavelengths, maximizing the plant’s ability to harness sunlight.
  2. Splitting of Water: Light energy initiates the splitting of water molecules (H2O) into oxygen (O2) and protons (H+). This reaction, known as photolysis, releases oxygen as a byproduct, which is vital for supporting life on Earth. The protons generated during this stage are used to create an electrochemical gradient that drives the synthesis of ATP.
  3. Formation of ATP and NADPH: The energy obtained from the light-dependent reactions is harnessed to form ATP and NADPH. ATP acts as a high-energy currency, providing energy for various cellular processes, while NADPH serves as a carrier of electrons and reducing power.
  4. Fixation of Carbon Dioxide: The second stage of photosynthesis, the Calvin Cycle, takes carbon dioxide molecules from the atmosphere and combines them with a five-carbon compound called ribulose-1,5-bisphosphate (RuBP). This process is called carbon fixation, and the enzyme responsible for this step is called Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase).
  5. Production of Glucose: Through a series of complex chemical reactions, the carbon dioxide molecules are rearranged and combined, ultimately forming glucose. Some of the carbon molecules are also used to regenerate the starting compound RuBP, allowing the cycle to continue.

||
||
|| To AI
\/

role: assistant: Here’s information to answer the following question {document}
role: user: does it occur in plant parts other than leaves, such as stems or roots?
(AI now can answer)