Load embedding from disk - Langchain Chroma DB

I am trying to follow the simple example provided by deeplearning.ai in their short course tutorial.
As per the tutorial, the following steps are performed:

  1. load text
  2. split text
  3. Create embedding using OpenAI Embedding API
  4. Load the embedding into Chroma vector DB
  5. Save Chroma DB to disk

I am able to follow the sequence above.
Now I want to start by retrieving the saved embeddings from disk and go straight to the question-answering part, rather than repeat the first four steps every time I run the program.
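For context, the five steps can be sketched roughly as below. This is a hedged sketch only: it assumes langchain, chromadb and an OpenAI API key are available, and the loader choice, chunk sizes and `build_and_persist` name are illustrative, not from the tutorial.

```python
# Hedged sketch of steps 1-5 (loader and chunk sizes are illustrative).

def build_and_persist(text_path, api_key, persist_directory="embeddings"):
    """Load, split, embed and persist documents into a Chroma store."""
    # Imports are local so the sketch reads even without the packages installed.
    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    docs = TextLoader(text_path).load()                              # 1. load text
    splits = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100).split_documents(docs)    # 2. split text
    embedding = OpenAIEmbeddings(openai_api_key=api_key)             # 3. embeddings
    db = Chroma.from_documents(splits, embedding,
                               persist_directory=persist_directory)  # 4. load into Chroma
    db.persist()                                                     # 5. save to disk
    return db
```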

Here are the snippets of code I am using:

vectordb = Chroma(persist_directory="embeddings\\")
print(vectordb._collection.count())

The above code prints 188, which means the data is present, but how do I make use of it? Using the code below

docs = vectordb.similarity_search(question,k=3)

I get the following error:
You must provide embeddings or a function to compute them

Any help on how to define this function, or a pointer to the relevant LangChain embeddings API, would be appreciated.

I’ve been struggling with this same issue for the last week, and I’ve tried nearly everything but can’t get the vector store reconnected after the script shuts down and reconnection is attempted from a new script using the same embeddings and persist directory.
I haven’t found much on the web, but from what I can tell a few others are struggling with the same thing, and everybody says to just go dig into the LangChain source code to figure it out.
Wish someone would just give an answer others could leverage :frowning:

I just gave up on it, no time to solve this unfortunately.

The answer was in the tutorial itself. I had to go through it multiple times, line by line, before I noticed it.
Here is what worked for me:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embedding = OpenAIEmbeddings(openai_api_key=api_key)
db = Chroma(persist_directory="embeddings\\", embedding_function=embedding)

The embedding_function parameter accepts the OpenAI embeddings object, which serves the purpose.
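Putting the reload and the search together, a minimal sketch looks like this. It assumes langchain and an OpenAI API key; the function name `load_and_search` is mine, while the persist directory and k=3 come from the thread.

```python
# Hedged sketch: reload a persisted Chroma store and query it.

def load_and_search(persist_directory, api_key, question, k=3):
    """Reconnect to a persisted Chroma store and run a similarity search."""
    # Imports are local so the sketch reads even without the packages installed.
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    # The same embedding object must be supplied again so that queries
    # are embedded the same way the stored documents were.
    embedding = OpenAIEmbeddings(openai_api_key=api_key)
    db = Chroma(persist_directory=persist_directory,
                embedding_function=embedding)
    return db.similarity_search(question, k=k)
```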

Hope this helps somebody


If the embeddings are already saved in the persist directory, why do we need to pass the embedding again while loading them? Does it run the embedding function again?

db = Chroma(persist_directory="embeddings\\",embedding_function=embedding)

Hi sheena. You are right that the embedding function is used again. However, it is not used to re-embed the original documents (those can be loaded from disk, as you already found out).

Rather, when you use the vector store to retrieve data relevant to a specific query, the query must be embedded with the same embedding function that was used to embed the original documents. That is why the Chroma constructor expects the embedding_function parameter (I think it is called embedding in recent versions).
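To see why in miniature, here is a toy, langchain-free sketch. The two "embedding functions" are made up for illustration: documents stored under one function but queried through a different one return the wrong neighbour.

```python
# Toy illustration of why queries must use the document embedding function.
import math

def embed_a(text):
    # Hypothetical embedding: (vowel count, consonant count).
    vowels = sum(c in "aeiou" for c in text.lower())
    consonants = sum(c.isalpha() and c not in "aeiou" for c in text.lower())
    return (vowels, consonants)

def embed_b(text):
    # A different hypothetical embedding: the same features, swapped.
    v, c = embed_a(text)
    return (c, v)

def nearest(store, query_vec):
    # Return the stored text whose vector is closest (Euclidean) to the query.
    return min(store, key=lambda t: math.dist(store[t], query_vec))

docs = ["aaa", "zzz"]
store = {d: embed_a(d) for d in docs}    # documents embedded with embed_a

print(nearest(store, embed_a("aaae")))   # same function → "aaa"
print(nearest(store, embed_b("aaae")))   # mismatched function → "zzz"
```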
