Issue: ChromaDB document limits and OpenAI token limitations

Currently, I'm using OpenAI GPT-3.5 as the model and ChromaDB to store the vectors.
I have some PDF documents totalling about 2,000 pages, and this grows by roughly 100 pages every day.
In the end, the total number of pages that must be ingested is around 20,000.

I have 2 questions:

  1. I'm worried about the token limitations of GPT-3.5. Any idea / suggestion for working within the token limit when there are this many source documents?
  2. Does ChromaDB have any limitations on how many vectors it can store?

I'm already using a text splitter:

from langchain.text_splitter import RecursiveCharacterTextSplitter
textSplitter = RecursiveCharacterTextSplitter(chunk_size=1536, chunk_overlap=200, separators=["#####", "\n\n", "\n", "====="])
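
For reference, a rough sketch of my ingestion step (the PDF path, collection name, and persist directory below are just placeholders, not my real ones):

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load one PDF and split it into chunks with the splitter above
docs = PyPDFLoader("manual.pdf").load()
chunks = textSplitter.split_documents(docs)

# Embed the chunks and persist them to a local Chroma collection
vectordb = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    collection_name="pdf_docs",
    persist_directory="./chroma_db",
)
vectordb.persist()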

I'm open to any ideas and suggestions, thanks for the advice.
Please note: I can't make a summary of every document.


Hi,

You can always use the GPT-3.5-16k model if you wish to have a larger context to fit your retrievals in.
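
As a rough sketch (assuming you are using LangChain; the persist directory, k value, and query below are just placeholders), switching to the 16k model while only sending the top-k retrieved chunks would look something like this:

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Assumption: the chunks were already embedded with OpenAIEmbeddings and persisted to ./chroma_db
vectordb = Chroma(persist_directory="./chroma_db", embedding_function=OpenAIEmbeddings())

# Only the k most relevant chunks go into the prompt, and the 16k context gives extra headroom
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0),
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 6}),
)

print(qa.run("What changed in the latest batch of documents?"))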

You can ask the nice folks at ChromaDB about storage limits via their Discord here: discord.gg/chromadb