Issue: ChromaDB document limits and OpenAI token limitations

Currently, I'm using OpenAI GPT-3.5 as the model and ChromaDB to store the vectors.
I have some PDF documents totalling about 2,000 pages, and this grows by roughly 100 pages every day.
In the end, the total number of pages that must be ingested is around 20,000.

I have 2 questions:

  1. I'm worried about the token limitations of GPT-3.5. Any idea / suggestion for working within the token limit when there are this many source documents?
  2. Does ChromaDB have any limitations on how many vectors it can store?

I'm already using a text splitter:

from langchain.text_splitter import RecursiveCharacterTextSplitter
textSplitter = RecursiveCharacterTextSplitter(chunk_size=1536, chunk_overlap=200, separators=["#####", "\n\n", "\n", "====="])
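
For reference, a rough sketch of my ingestion step (the PDF path, collection name, and persist directory below are just placeholders, not my real ones):

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load one PDF and split it into chunks with the splitter above
docs = PyPDFLoader("manual.pdf").load()
chunks = textSplitter.split_documents(docs)

# Embed the chunks and persist them to a local Chroma collection
vectordb = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    collection_name="pdf_docs",
    persist_directory="./chroma_db",
)
vectordb.persist()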

I'm open to any ideas and suggestions, thanks for the advice.
Please note: I can't make a summary of every document.


Hi,

You can always use the GPT-3.5-16k model if you wish to have a larger context to fit your retrievals in.
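
As a rough sketch (assuming you are using LangChain; the persist directory, k value, and query below are just placeholders), switching to the 16k model while only sending the top-k retrieved chunks would look something like this:

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Assumption: the chunks were already embedded with OpenAIEmbeddings and persisted to ./chroma_db
vectordb = Chroma(persist_directory="./chroma_db", embedding_function=OpenAIEmbeddings())

# Only the k most relevant chunks go into the prompt, and the 16k context gives extra headroom
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0),
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 6}),
)

print(qa.run("What changed in the latest batch of documents?"))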

You can ask the nice folks at ChromaDB about storage limits via their Discord here: discord.gg/chromadb