OpenAI with Pinecone and a prompt containing multiple questions

In my Next.js app I've created a PDF loader which builds a vector store via Pinecone embeddings. The user can ask a prompt about this PDF. Everything works fine, but I can't get it to work when the prompt contains multiple questions. Assume I have a PDF with a date, an address, and some content, and a prompt like: "Give me the address, the date, question 1 about the content, question 2 about the content". Pinecone usually only finds matches for the last question.

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: prompt,
});

Getting the embeddings here:

const embeddings = response.data[0].embedding;

const namespace = pineconeIndex.namespace(sourceId);
const queryResult = await namespace.query({
  topK: 5,
  vector: embeddings,
  includeMetadata: true,
});

It all works fine if I have an array of prompts, e.g.
input: ['Give me the address', 'Give me the date', 'question 1', etc.], but not if I pass the whole prompt as a single string.
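
Roughly, the array variant looks like this (sub-questions hard-coded for illustration; `openai` and `namespace` set up as above, and the merge at the end is just a sketch):

const questions = ['Give me the address', 'Give me the date', 'question 1 about the content'];

const embeddingResponse = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: questions, // one embedding per sub-question
});

// one Pinecone query per sub-question vector
const results = await Promise.all(
  embeddingResponse.data.map((item) =>
    namespace.query({
      topK: 5,
      vector: item.embedding,
      includeMetadata: true,
    })
  )
);

// merge the matches and drop duplicates before building the final context
const seen = new Set();
const matches = results
  .flatMap((r) => r.matches)
  .filter((m) => !seen.has(m.id) && seen.add(m.id));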

The questions contained in the prompt can come in different formats and don't usually end with a '?', so I can't simply split the prompt into an array of prompts.

What's the best way to handle multiple questions in one prompt with Pinecone?


Sounds like you are trying to query using an embedding of all the questions at once. That's unlikely to work. You need to split the query up and compare the vectors of each individual question, as you have already suggested as an option.

Alternatively, you can implement this with a local search function shared with the LLM, leaving it to the model's discretion to call the search function as many times as it deems necessary.
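
A rough sketch of what I mean, using the chat completions tools API (the search_document name and its schema are placeholders, not something that exists in your code):

// Describe the Pinecone lookup to the model as a callable tool.
const tools = [
  {
    type: 'function',
    function: {
      name: 'search_document',
      description: 'Search the uploaded PDF for passages relevant to one self-contained question',
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'A single question to look up' },
        },
        required: ['query'],
      },
    },
  },
];

const completion = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo', // any model that supports tool calling
  messages: [{ role: 'user', content: prompt }],
  tools,
});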


Not sure I understand. Are you saying I should use an LLM to split the multi-question prompt into a more meaningful array of prompts? I already have some latency issues (the Pinecone request, plus the final OpenAI query with the context received from Pinecone), and that would basically mean a third call.


So this is more a question of: how can I effectively implement the search?

Here is how I have solved this in the past:

  1. When embedding into Pinecone, I try to chunk in clusters of 7 to 12 sentences, with 3 sentences of overlap on each side (there are other methods, etc.).
  2. When asking a broader question, I usually prefer to set the topK to about 15-20.
  3. If there are X questions (can be 2, or 8), the topK query fetches the most relevant ones, and there is a high chance that the right answers come back in those 15-20 chunks.
  4. Play around with the topK a bit.
  5. Finally, I have a post-processor (usually GPT-3.5 Turbo or GPT-4) which takes the 20 chunks and spits out the response. In that prompt I usually add: if there were multiple questions, split the answers onto different lines. (A rough sketch of this step follows below.)
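
A rough sketch of step 5, assuming the chunk text was stored under a text key in the metadata and matches is whatever the topK 15-20 query returned:

// Build one context block out of the 15-20 retrieved chunks and let the
// model answer every question it can find in the original prompt.
const contextBlock = matches
  .map((m, i) => `[${i + 1}] ${m.metadata.text}`)
  .join('\n\n');

const answer = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [
    {
      role: 'system',
      content:
        'Answer using only the context below. If there were multiple questions, ' +
        'put each answer on its own line.\n\n' + contextBlock,
    },
    { role: 'user', content: prompt },
  ],
});

console.log(answer.choices[0].message.content);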

Hope this helps.

Yes, I am. If you inform the LLM about a function that will carry out the query, it can perform a function call.

The LLM can be sophisticated enough to split up your natural language paragraph into several individual function calls.
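
Continuing the sketch from my earlier reply, handling those calls would look roughly like this (runPineconeSearch is a hypothetical helper wrapping the embed-and-query code from your original post):

const message = completion.choices[0].message;
const followUp = [
  { role: 'user', content: prompt },
  message, // the assistant turn that contains the tool calls
];

// the model may emit several search_document calls, one per question it found
for (const toolCall of message.tool_calls ?? []) {
  const { query } = JSON.parse(toolCall.function.arguments);
  const matches = await runPineconeSearch(query); // hypothetical helper
  followUp.push({
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(matches.map((m) => m.metadata)),
  });
}

// one final call composes a single answer from all the lookups
const finalAnswer = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: followUp,
  tools,
});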