OpenAI with Pinecone and a prompt containing multiple questions

In my Next.js app I've created a PDF loader which builds a vector store via Pinecone embeddings. The user can ask a prompt about this PDF. Everything works fine, but I can't get it to work when the prompt contains multiple questions. Assume I have a PDF with a date, an address, and some content, and a prompt like: "Give me the address, the date, question 1 about the content, question 2 about the content". Pinecone usually only finds matches for the last question.

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: prompt,
});

Getting the embeddings here:

const embeddings = response.data[0].embedding;

const namespace = pineconeIndex.namespace(sourceId);
const queryResult = await namespace.query({
  topK: 5,
  vector: embeddings,
  includeMetadata: true,
});

It all works fine if I have an array of prompts, e.g.
input: ['Give me the address', 'Give me the date', 'question 1', etc.], but not if I pass the whole prompt as a single string.
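
Roughly, the array variant looks like this (sub-questions hard-coded for illustration; `openai` and `namespace` set up as above, and the merge at the end is just a sketch):

const questions = ['Give me the address', 'Give me the date', 'question 1 about the content'];

const embeddingResponse = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: questions, // one embedding per sub-question
});

// one Pinecone query per sub-question vector
const results = await Promise.all(
  embeddingResponse.data.map((item) =>
    namespace.query({
      topK: 5,
      vector: item.embedding,
      includeMetadata: true,
    })
  )
);

// merge the matches and drop duplicates before building the final context
const seen = new Set();
const matches = results
  .flatMap((r) => r.matches)
  .filter((m) => !seen.has(m.id) && seen.add(m.id));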

The questions contained in the prompt can come in different formats and don't usually end with a '?', so I can't simply split the prompt into an array of prompts.

What's the best way to handle multiple questions in one prompt with Pinecone?


Sounds like you are trying to query using an embedding of all the questions at once. That's unlikely to work. You need to split the query up and compare the vectors of each individual question, as you have already suggested as an option.

Alternatively, you can implement this with a local search function shared with the LLM, leaving it to the model's discretion to call the search function as many times as it deems necessary.
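
A rough sketch of what I mean, using the chat completions tools API (the search_document name and its schema are placeholders, not something that exists in your code):

// Describe the Pinecone lookup to the model as a callable tool.
const tools = [
  {
    type: 'function',
    function: {
      name: 'search_document',
      description: 'Search the uploaded PDF for passages relevant to one self-contained question',
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'A single question to look up' },
        },
        required: ['query'],
      },
    },
  },
];

const completion = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo', // any model that supports tool calling
  messages: [{ role: 'user', content: prompt }],
  tools,
});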


Not sure I understand. Are you saying I should use an LLM to split the multi-question prompt into a more meaningful array of prompts? I already have some latency issues (the Pinecone request, plus the final OpenAI query with the context received from Pinecone), and that would basically mean a third call.


So this is more a question of: how can I effectively implement the search?

Here is how I have solved this in the past:

  1. When embedding into Pinecone, I try to chunk in clusters of 7 to 12 sentences, with 3 sentences of overlap on each side (there are other methods, etc.).
  2. When asking a broader question, I usually prefer to set the topK to about 15-20.
  3. If there are X questions (can be 2, or 8), the topK query fetches the most relevant ones, and there is a high chance that the right answers come back in those 15-20 chunks.
  4. Play around with the topK a bit.
  5. Finally, I have a post-processor (usually GPT-3.5 Turbo or GPT-4) which takes the 20 chunks and spits out the response. In that prompt I usually add: if there were multiple questions, split the answers onto different lines. (A rough sketch of this step follows below.)
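
A rough sketch of step 5, assuming the chunk text was stored under a text key in the metadata and matches is whatever the topK 15-20 query returned:

// Build one context block out of the 15-20 retrieved chunks and let the
// model answer every question it can find in the original prompt.
const contextBlock = matches
  .map((m, i) => `[${i + 1}] ${m.metadata.text}`)
  .join('\n\n');

const answer = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [
    {
      role: 'system',
      content:
        'Answer using only the context below. If there were multiple questions, ' +
        'put each answer on its own line.\n\n' + contextBlock,
    },
    { role: 'user', content: prompt },
  ],
});

console.log(answer.choices[0].message.content);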

Hope this helps.

Yes, I am. If you inform the LLM about a function that will carry out the query, it can perform a function call.

The LLM can be sophisticated enough to split up your natural language paragraph into several individual function calls.
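
Continuing the sketch from my earlier reply, handling those calls would look roughly like this (runPineconeSearch is a hypothetical helper wrapping the embed-and-query code from your original post):

const message = completion.choices[0].message;
const followUp = [
  { role: 'user', content: prompt },
  message, // the assistant turn that contains the tool calls
];

// the model may emit several search_document calls, one per question it found
for (const toolCall of message.tool_calls ?? []) {
  const { query } = JSON.parse(toolCall.function.arguments);
  const matches = await runPineconeSearch(query); // hypothetical helper
  followUp.push({
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(matches.map((m) => m.metadata)),
  });
}

// one final call composes a single answer from all the lookups
const finalAnswer = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: followUp,
  tools,
});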