Need Help with RAG and Embeddings

I wrote a very simple Streamlit app that would upload a document through Chroma and then let students ask questions about that document. The only issue was that any large document ran into token limits. That sent me on a LONG journey away from Chroma and back again, and I think I'm close, but I'm still missing the boat. I know that if I feed my entire vector_store into the prompt I will get a token error, so I need to grab just the top N results instead.
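Grabbing the top N results is exactly what a vector store's similarity search does under the hood: embed the query, score it against every chunk embedding, and keep the best N. A toy illustration of that idea in pure Python (no libraries assumed, 2-dimensional vectors just for readability):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_n(query_vec, chunk_vecs, n=3):
    # Rank chunk indices by similarity to the query, highest first,
    # and keep only the n best -- this is the "retrieval" step.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:n]

# Toy example: chunk 0 points almost the same way as the query.
q = [1.0, 0.0]
chunks = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
print(top_n(q, chunks, n=2))  # -> [0, 2]
```

Only the text of those N chunks goes into the prompt, which is what keeps the request under the token limit.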

I have the following code. Note that the retriever in chain.invoke will not work, and that is where I need any help you might be willing to provide. Thank you!

docs = st.session_state["loader"].load()
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],
    chunk_size=1000,
    chunk_overlap=0,
)
token_splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0, tokens_per_chunk=256)
character_split_texts = text_splitter.split_text("\n\n".join(doc.page_content for doc in docs))
token_split_texts = []
for text in character_split_texts:
    token_split_texts += token_splitter.split_text(text)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
first_time = True
for chunk in token_split_texts:
    if first_time:
        # from_texts, not from_documents: the splitters return plain strings
        vector_store = Chroma.from_texts([chunk], embeddings)
        first_time = False
    else:
        # add to the existing store instead of creating a new one each pass
        vector_store.add_texts([chunk])
    sleep(60)  # crude rate limiting for the embeddings API
vector_store.persist()
retriever = vector_store.as_retriever()
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{query}"),
    ]
)
chain = (qa_prompt | llm)
with get_openai_callback() as cb:
    ai_msg = chain.invoke({"query": question, "context": retriever,
                           "history": st.session_state["history"]})
    st.session_state["costs"].append(cb.total_cost)
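One way to make the retrieval step explicit (a sketch, not the only option: it assumes a recent LangChain where retrievers expose `.invoke()`, and that `qa_system_prompt` contains a `{context}` placeholder) is to fetch the top-k chunks yourself and join their text into a plain string before calling the chain. The `Document` class below is a stand-in for LangChain's, just so the formatting helper is runnable on its own:

```python
from dataclasses import dataclass

# Stand-in for langchain's Document, for illustration only.
@dataclass
class Document:
    page_content: str

def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)

# With the real objects from the post, the call would look like:
#   retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # top 4 chunks
#   top_docs = retriever.invoke(question)  # .get_relevant_documents() on older versions
#   ai_msg = chain.invoke({"query": question,
#                          "context": format_docs(top_docs),
#                          "history": st.session_state["history"]})
print(format_docs([Document("chunk one"), Document("chunk two")]))
```

The key point is that `"context"` receives a string, not the retriever object itself; passing the retriever into `chain.invoke` just stuffs its `repr` into the prompt.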
