Token usage with a LangChain PDF reader

The code below lets me answer questions about a PDF document (33 pages). However, it seems to use up my tokens very quickly, and the responses are not very accurate either. Any advice on how to improve this (e.g. by changing my chunking strategy)? Or is there an alternative to LangChain that would produce better and also more cost-effective results?

import os

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

os.environ['OPENAI_API_KEY'] = '___'

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader("data/33.pdf")
data = loader.load()

# Split the pages into ~500-character chunks with no overlap between chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Embed every chunk with OpenAI and store the vectors in an in-memory Chroma collection
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

question = "What were the facts of the decision in the case of McDonald v Chelsea?"

# Check how many chunks the similarity search returns for the question
docs = vectorstore.similarity_search(question)
print(len(docs))

# Answer the question with GPT-3.5, using the retrieved chunks as context
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
result = qa_chain({"query": question})

print(result)
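
A rough way to see where the tokens go in a pipeline like this is a back-of-envelope count. The sketch below is pure Python and only an approximation. It assumes about 4 characters per token (a common rule of thumb for English text with OpenAI tokenizers; `tiktoken` gives exact counts), that the retriever returns `k=4` chunks (LangChain's default for `as_retriever()`, as far as I know), and that `RetrievalQA` uses its default "stuff" chain, which pastes those chunks into a single prompt. Prompt-template overhead and answer tokens are ignored, and the function names are hypothetical.

```python
import math

# Assumption: roughly 4 characters per token for English text with OpenAI
# tokenizers. This is a rule of thumb; tiktoken gives exact counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(num_chars: int) -> int:
    """Approximate the token count of num_chars characters, rounding up."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)


def estimate_pipeline_tokens(doc_chars: int, chunk_size: int, k: int,
                             question_chars: int, runs: int) -> dict:
    """Back-of-envelope token estimate for the PDF question-answering script.

    Assumes nothing is cached, so every run re-embeds the whole document
    (with chunk_overlap=0 the chunks add up to about the document length),
    and that the "stuff" chain puts k full-size chunks plus the question
    into one prompt. Prompt-template and answer tokens are ignored.
    """
    embedding_tokens = estimate_tokens(doc_chars) * runs
    prompt_tokens_per_query = estimate_tokens(k * chunk_size + question_chars)
    return {
        "embedding_tokens": embedding_tokens,
        "prompt_tokens_per_query": prompt_tokens_per_query,
    }
```

For example, with a hypothetical 3,000 characters per page (about 99,000 characters for 33 pages) and 10 runs of the script, `estimate_pipeline_tokens(99_000, 500, 4, len(question), 10)` gives about 247,500 embedding tokens in total but only a little over 500 prompt tokens per question, so under these assumptions most of the token count comes from re-embedding the document on every run rather than from the chat call.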

I am not an expert on this, but I recently used the same code from a tutorial. I think your high token usage comes from embedding the whole document again every time you run the code. You can check that in the “Daily Usage Breakdown”. If your next question is how to save the embeddings to disk and load them from there instead, I am currently trying to learn that as well :smiley:
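
On saving the embeddings: with LangChain's Chroma wrapper, passing `persist_directory` (e.g. a hypothetical `"db"` folder) to `Chroma.from_documents` writes the collection to disk, and a later run can reopen it with `Chroma(persist_directory="db", embedding_function=OpenAIEmbeddings())` instead of embedding everything again (older Chroma versions may also need `vectorstore.persist()`). The same idea can be sketched independently of any vector store: cache each chunk's vector on disk, keyed by a hash of its text, and only call the embedding API for chunks that are not cached yet. The function name `embed_with_disk_cache` and the JSON cache file are hypothetical; `embed_fn` stands for any function that maps a list of strings to a list of vectors, such as `OpenAIEmbeddings().embed_documents`.

```python
import hashlib
import json
import os
from typing import Callable, Dict, List

# Any function mapping a batch of texts to one vector per text,
# e.g. OpenAIEmbeddings().embed_documents (assumption: used that way here).
EmbedFn = Callable[[List[str]], List[List[float]]]


def embed_with_disk_cache(texts: List[str], embed_fn: EmbedFn,
                          cache_path: str) -> List[List[float]]:
    """Return one vector per text, calling embed_fn only for uncached texts."""
    cache: Dict[str, List[float]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    # Key each chunk by a hash of its content, so identical chunks are reused.
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

    # Texts not cached yet (the dict keeps first-seen order and drops duplicates).
    missing = {key: text for key, text in zip(keys, texts) if key not in cache}

    if missing:
        # Only the new chunks are sent to the embedding function.
        vectors = embed_fn(list(missing.values()))
        for key, vector in zip(missing.keys(), vectors):
            cache[key] = vector
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)

    return [cache[key] for key in keys]
```

Keying by content hash means unchanged chunks are never re-embedded, while a new PDF or a different `chunk_size` produces new keys and triggers embedding calls only for the new chunks. A JSON file is fine for a 33-page document; for anything larger, the vector store's own persistence is the better fit.
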
I’d guess improving accuracy is what everyone is currently working on, so there might not be an “easy” solution yet. Just experiment with prompting, chunking, cleaning up your PDF better, etc. Also consider using GPT-4, shorter (more precise) questions, or more “chains” in between that split your long (complex) question into shorter ones. That helps in my experience.
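
To illustrate what `chunk_overlap` buys you, here is a deliberately simple fixed-width splitter in pure Python. It is not how `RecursiveCharacterTextSplitter` works (that one prefers to break on paragraph and line boundaries, then spaces); it only shows that with overlap, any passage of up to `chunk_overlap` characters that straddles a chunk boundary still appears intact in at least one chunk. With `chunk_overlap=0`, as in the question, a key sentence can be cut in half, and each half may match the question less well. The function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

In the original code, the equivalent experiment is something like `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)`; those numbers are only a starting point, and more overlap also means more text to embed.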