I am trying to follow the simple example provided by deeplearning.ai in their short course tutorial.
As per the tutorial, the following steps are performed:
1. Load text
2. Split text
3. Create embeddings using the OpenAI Embedding API
4. Load the embeddings into the Chroma vector DB
5. Save the Chroma DB to disk
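For reference, the five steps above can be sketched end-to-end in pure Python. This is only an illustration of the flow, not the real APIs: a hash-based toy function stands in for the OpenAI Embedding API, and a JSON file stands in for Chroma, so the sketch runs without an API key or any installs.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def toy_embed(text: str) -> list[float]:
    # Stand-in for the OpenAI Embedding API: a deterministic
    # 8-dimensional vector derived from a hash of the text.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

# 1. Load text
text = "LangChain makes it easy to build LLM apps. Chroma stores embeddings."

# 2. Split text into chunks (naive sentence split, for illustration only)
chunks = [c.strip() for c in text.split(".") if c.strip()]

# 3. Create embeddings
records = [{"chunk": c, "vector": toy_embed(c)} for c in chunks]

# 4./5. Load into a store and save it to disk (JSON stands in for Chroma)
persist_dir = Path(tempfile.mkdtemp())
(persist_dir / "store.json").write_text(json.dumps(records))

print(len(json.loads((persist_dir / "store.json").read_text())))  # 2 chunks persisted
```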
I am able to follow the above sequence.
Now I want to start by retrieving the saved embeddings from disk and then move on to the question answering, rather than repeating the first four steps every time I run the program.
I’ve been struggling with this same issue for the last week. I’ve tried nearly everything, but I can’t get the vector store reconnected after the script shuts down and a new script attempts to reconnect using the same embeddings and persist directory.
I haven’t found much on the web, but from what I can tell a few others are struggling with the same thing, and everybody says to just dig into the LangChain source code to figure it out.
I wish someone would just give an answer others could leverage.
The answer was in the tutorial all along. I had to go through it, and each line of code, multiple times before I noticed it.
Here is what worked for me:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embedding = OpenAIEmbeddings(openai_api_key=api_key)
db = Chroma(persist_directory="embeddings\\", embedding_function=embedding)
The embedding_function parameter accepts the OpenAI embeddings object, and that serves the purpose.
If the embeddings are already saved in the persist directory, then why do we need to mention the embedding again while loading the saved embeddings? Does it use the embedding function again?
db = Chroma(persist_directory="embeddings\\", embedding_function=embedding)
Hi sheena. You are right that the embedding function is used again. However, it is not used to embed the original documents again (they can be loaded from disk, as you already found out).
But when you use the vector store to retrieve data that is relevant to a specific query, the query itself must be embedded using the same embedding function that was used to embed the original documents. That is why the Chroma constructor expects the embedding_function parameter (called embedding, I think, in recent versions) here.
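A toy illustration of why mixing embedding functions breaks retrieval (hash-based stand-ins for real models, purely an assumption): two different "models" assign unrelated vectors to the very same text, so a query embedded by a different model than the one used at indexing time cannot be meaningfully compared against the stored vectors.

```python
import hashlib

def embed_v1(text: str) -> list[float]:
    # Toy "model A": vector from a SHA-256 hash of the text.
    return [b / 255 for b in hashlib.sha256(text.encode()).digest()[:8]]

def embed_v2(text: str) -> list[float]:
    # Toy "model B": vector from an MD5 hash -- a different embedding space.
    return [b / 255 for b in hashlib.md5(text.encode()).digest()[:8]]

doc = "the same sentence"

# Identical text, but embedded by the indexing-time model vs. a different
# query-time model: the vectors do not match, so nearest-neighbour search
# against the stored vectors becomes meaningless.
print(embed_v1(doc) == embed_v1(doc))  # True  -- same model: identical vectors
print(embed_v1(doc) == embed_v2(doc))  # False -- different models: unrelated vectors
```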