Currently, I'm using OpenAI GPT-3.5 as the model and Chroma DB to store the vectors.
I have some PDF documents totaling about 2,000 pages, and that number grows by roughly 100 pages every day. In the end, the total number of pages that must be ingested is around 20,000.
I have two questions:
- I'm worried about GPT-3.5's token limit. Any ideas or suggestions for working within that limit when there are many source documents?
- Does Chroma DB have any limit on how many vectors it can store?
I'm already using a text splitter:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# split documents into ~1536-character chunks with 200 characters of overlap
textSplitter = RecursiveCharacterTextSplitter(
    chunk_size=1536,
    chunk_overlap=200,
    separators=["#####", "\n\n", "\n", "====="],
)
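For context, here is a rough sketch of the pipeline I have in mind (the loader, file path, persist directory, and k value are just placeholders, not my exact setup):

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# load and split one PDF (placeholder path)
docs = PyPDFLoader("docs/example.pdf").load()
chunks = textSplitter.split_documents(docs)

# embed the chunks and persist them in Chroma (placeholder directory)
vectordb = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# answer questions by retrieving only the top-k chunks,
# so only a handful of chunks go into the GPT-3.5 context window
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("What does the document say about X?"))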
I'm open to any ideas and suggestions, thanks for the advice.
Please note: I can't make a summary of every document.