How do I use ChromaDB to Create Embeddings

ktola · March 22, 2024, 11:40pm

I have a Streamlit app that downloads emails, calendar events, and attachments and then loads them into a ChromaDB instance. Well, the MSWordPArser is not working, but you get the idea…

The problem is that simply sending in a

crc = ConversationalRetrievalChain.from_llm(llm, retriever)

command blows past the token limit. I have read a lot about batch embedding, but I do not understand how to move from what I currently have to creating my own embeddings. Here is the relevant code. Does anybody have any suggestions?

    if documents is not None:
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        chunks = text_splitter.split_documents(documents)

        embeddings = OpenAIEmbeddings()
        vector_store = Chroma.from_documents(chunks, embeddings)

        # initialize OpenAI instance
        llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=aiTemperature)
        retriever = vector_store.as_retriever()

        # TOO MUCH DATA!!!
        crc = ConversationalRetrievalChain.from_llm(llm, retriever)
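
One possible direction for the "batch embedding" part of the question is sketched below. It is not taken from the post and is not a confirmed fix: it only shows the general idea of grouping chunk texts into batches that stay under a token budget before each embedding call. The helper names (`estimate_tokens`, `batch_by_token_budget`, `embed_in_batches`), the rule of thumb of roughly 4 characters per token, and the budget of 8000 are all assumptions made for illustration. The embedder is passed in as a plain callable so the sketch runs without any API key.

```python
from typing import Callable, Iterator, List, Sequence


def estimate_tokens(text: str) -> int:
    """Rough token estimate. Assumption: ~4 characters per token for English text."""
    return max(1, len(text) // 4)


def batch_by_token_budget(
    texts: Sequence[str], max_tokens: int, max_items: int = 100
) -> Iterator[List[str]]:
    """Yield consecutive batches whose estimated token total stays within max_tokens.

    A single text larger than max_tokens is still yielded, alone in its own batch.
    """
    batch: List[str] = []
    used = 0
    for text in texts:
        cost = estimate_tokens(text)
        # Close the current batch if adding this text would exceed either limit.
        if batch and (used + cost > max_tokens or len(batch) >= max_items):
            yield batch
            batch, used = [], 0
        batch.append(text)
        used += cost
    if batch:
        yield batch


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    max_tokens: int = 8000,  # hypothetical budget, not a documented limit
) -> List[List[float]]:
    """Call embed_batch once per batch and collect the vectors in input order."""
    vectors: List[List[float]] = []
    for batch in batch_by_token_budget(texts, max_tokens):
        vectors.extend(embed_batch(batch))
    return vectors


if __name__ == "__main__":
    # Stand-in embedder so the sketch runs offline; swap in a real embedding call.
    def fake_embed(batch: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in batch]

    chunk_texts = ["a" * 1000] * 10  # ten 1000-character chunks
    sizes = [len(b) for b in batch_by_token_budget(chunk_texts, max_tokens=1000)]
    print("batch sizes:", sizes)
    print("vectors:", len(embed_in_batches(chunk_texts, fake_embed, max_tokens=1000)))
```

With the code from the post, `embed_batch` could plausibly be `embeddings.embed_documents`, with the texts taken from `chunk.page_content`. If the limit is actually hit when the chain sends retrieved chunks to the chat model rather than during embedding, capping the number of retrieved chunks, for example with `vector_store.as_retriever(search_kwargs={"k": 3})`, may be the more relevant change. Both suggestions are untested against this app.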
