Hi, I’m using the OpenAI API and Pinecone as my vector database. Here is the scenario: my Pinecone database will eventually contain 30,000+ records, where each record contains details about a vehicle spare part. I then use the OpenAI API to query both generic and specific information from that database.
Say my initial prompt is: “Give me all brake parts from all types of vehicles”. Currently, my process is:
(1) The API converts my prompt to an embedding and sends it to Pinecone.
(2) Pinecone returns the records it considers relevant to the prompt; it may return 5,000 records, for example.
(3) Now that I have the records on hand, I submit another prompt asking for more specific information, such as “Based on the records below, which vehicles have the most requested brake parts? [all 5,000 records are then appended to the prompt here …]”.
(4) My app then processes the response from the OpenAI API.
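For reference, here is a rough sketch of that flow in code. The index name, model names, and metadata fields are just placeholders for illustration, not my actual setup:

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("spare-parts")  # placeholder index name

# (1) Convert the prompt to an embedding.
question = "Give me all brake parts from all types of vehicles"
emb = client.embeddings.create(model="text-embedding-3-small", input=question)
query_vector = emb.data[0].embedding

# (2) Ask Pinecone for the most similar records.
results = index.query(vector=query_vector, top_k=100, include_metadata=True)

# (3) Append the retrieved records to a follow-up prompt.
records_text = "\n".join(str(m.metadata) for m in results.matches)
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only the records provided."},
        {"role": "user", "content": "Based on the records below, which vehicles have "
                                    "the most requested brake parts?\n\n" + records_text},
    ],
)

# (4) Process the response.
print(completion.choices[0].message.content)
```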
a. My 1st question: are the above steps fine or normal for this kind of setup?
b. My 2nd question: for step (3) and any further prompts, is appending all 5,000 records to the prompt each time the only way to gain more insights into the data? A request that size consumes too many tokens, and I’m wondering if there is a way to improve this process or make it more cost-efficient.
Make sure you set up your prompts so the AI model understands the dataset it will get from Pinecone.
Also make sure you use the same embedding model for the query embedding that you used to upsert the Pinecone content. That way you don’t have to specifically prompt the model for sorting; you can rely on similarity search for this.
Also, use a framework like LangGraph to get better results and set up nodes.
This is the approach I used for a similar project.
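As a minimal sketch of keeping the embedding model consistent between upsert and query (the index name, model, and record fields below are assumptions):

```python
from openai import OpenAI
from pinecone import Pinecone

EMBED_MODEL = "text-embedding-3-small"  # must be identical for upsert and query

client = OpenAI()
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("spare-parts")

def embed(text: str) -> list[float]:
    return client.embeddings.create(model=EMBED_MODEL, input=text).data[0].embedding

# Upsert: each part description is embedded with EMBED_MODEL.
index.upsert(vectors=[{
    "id": "part-001",
    "values": embed("Front brake pad set for Toyota Corolla"),
    "metadata": {"category": "brakes", "vehicle": "Toyota Corolla"},
}])

# Query: the question is embedded with the SAME model, so similarity search
# already returns results ranked by relevance without prompting for sorting.
matches = index.query(vector=embed("brake parts"), top_k=50, include_metadata=True).matches
```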
Generally, yes. It makes sense to be CAPABLE of traversing your database at different layers of granularity.
That said, you can combine your queries into one instead of iterating over them.
It’s important to remember that embeddings are for unstructured text. If you have the benefit of structured information, then using an LLM to form database queries would be better.
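For example, a rough sketch of having the LLM form the query instead of relying on embeddings; the schema and model here are assumptions, not your actual tables:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical schema for the spare-parts data.
schema = """
CREATE TABLE parts (
    id INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT,           -- e.g. 'brakes', 'suspension'
    vehicle_make TEXT,
    vehicle_model TEXT,
    times_requested INTEGER
);
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You write a single read-only SQL SELECT statement for the schema "
                    "below. Return only the SQL, no explanation.\n" + schema},
        {"role": "user",
         "content": "Which vehicles have the most requested brake parts?"},
    ],
)

sql = response.choices[0].message.content
# Validate `sql` before running it against the database.
```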
Hi johnnonso090, thanks for the valuable response. Yeah, that’s eventually the end goal of the app I’m working on. It looks like there is currently no way of preserving context in RAG, at least in my setup, other than appending the data to the prompt each time. So it seems that determining the most relevant data beforehand and appending only that to the prompt is the way to go.
Just be careful to safeguard the resulting SQL to prevent data manipulation (or deletion).
Also, you will have to provide the table structure, and it might help to include some example SQL queries to guide the AI.
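A minimal guard might look something like this (assuming SQLite purely for illustration; adapt the check to your own database and driver):

```python
import sqlite3

# Crude allowlist: only a single SELECT statement, run on a read-only connection.
FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "create", "truncate", ";")

def run_readonly(sql: str, db_path: str = "parts.db"):
    lowered = sql.strip().rstrip(";").lower()
    if not lowered.startswith("select") or any(word in lowered for word in FORBIDDEN):
        raise ValueError("Only plain SELECT statements are allowed")
    # Even if a bad statement slips past the check, a read-only connection
    # prevents it from modifying data.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(lowered).fetchall()
    finally:
        conn.close()
```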
Referencing this: “It looks like there is currently no way of preserving context in RAG”. The model that does the processing is not trained on any knowledge of your data, which is why context is so important: it only knows what to work with based on the PROMPT + CONTEXT. Think of it as lightweight database info you send along for it to work with.
Also, on cost: you might want to look at prompt compression tools, which can significantly reduce prompt cost.
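If you don’t want to depend on a dedicated compression tool, one simple approach is to summarize the retrieved records in batches with a cheaper model before building the final prompt. The batch size and model below are just assumptions:

```python
from openai import OpenAI

client = OpenAI()

def summarize_batch(records: list[str]) -> str:
    joined = "\n".join(records)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Summarize these spare-part records, keeping vehicle, part "
                       "category, and request counts:\n" + joined,
        }],
    )
    return resp.choices[0].message.content

def compress(records: list[str], batch_size: int = 200) -> str:
    # Summarize in batches, then join the summaries; the final prompt sees a few
    # short summaries instead of all 5,000 raw records.
    summaries = [summarize_batch(records[i:i + batch_size])
                 for i in range(0, len(records), batch_size)]
    return "\n".join(summaries)
```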
Using database wrappers like Supabase or Directus might also help, as they provide a RESTful API that is easier for the model to understand, and you can control which endpoints and operations are allowed to avoid the risk of the model breaking your database with errors in the SQL.
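As a sketch of that idea with tool calling, you could expose only a single read-only endpoint to the model; the URL, tool name, and parameters here are placeholders:

```python
import json
import requests
from openai import OpenAI

client = OpenAI()

# The only operation exposed to the model: a read-only search.
tools = [{
    "type": "function",
    "function": {
        "name": "search_parts",
        "description": "Read-only search of the spare-parts table",
        "parameters": {
            "type": "object",
            "properties": {"category": {"type": "string"}},
            "required": ["category"],
        },
    },
}]

def search_parts(category: str) -> list[dict]:
    # GET against the REST wrapper; no write endpoints are exposed at all.
    resp = requests.get("https://example.com/api/parts",
                        params={"category": category}, timeout=10)
    return resp.json()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Show me all brake parts"}],
    tools=tools,
)

# If the model decides to call the tool, run only the whitelisted function.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "search_parts":
        print(search_parts(**json.loads(call.function.arguments)))
```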