I came across the website chatpdf.com and noticed that it lets you upload PDF files so a GPT model can analyze the content, retain it, and answer questions about the documents.
In my attempt to replicate this functionality, I transcribed PDFs and sent the transcriptions to the GPT-3 API using both the Chat Completions and Completions endpoints. I split the documents into segments to avoid exceeding the per-request token limit (with Chat Completions). However, when I tried to upload a substantial book, I ran into an error.
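For reference, the splitting step I used was roughly like this (a simplified sketch: it splits by character count with a small overlap, whereas in practice you would count actual tokens with a tokenizer such as tiktoken; the sizes below are arbitrary):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character-based chunks.

    A real implementation would count tokens (e.g. with tiktoken)
    rather than characters; chunk_size and overlap are arbitrary here.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` to preserve context
    return chunks
```

The overlap keeps a sentence that straddles a boundary from being cut off in both chunks.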
As I am posting this question from the forum, I don't have access to my code right now and can't recall its exact state. After some research, I realized I was probably sending too much text at once: there is a hard limit on the number of tokens per context window. That led me to wonder how the site above manages to feed in so much information without exceeding the token limit within a single conversation.
I have been grappling with this issue for quite some time without finding a definitive solution. Would fine-tuning be the right approach? Honestly, I'm unsure. Could someone offer guidance? I would greatly appreciate any help.
“Chat over documents” is generally handled by splitting the document into chunks, creating embeddings of the chunks, and storing them in a vector database. Then, on each user query, you create an embedding of the query, run a similarity search against the vector database to retrieve the most relevant chunks, and put those chunks into the prompt. There are plenty of tutorials on YouTube and elsewhere on how to do this. LangChain is probably the easiest way to get started with a proof of concept (though I wouldn't recommend it for a large-scale production app).
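The retrieval step boils down to a cosine-similarity ranking. Here is a minimal self-contained sketch: the vectors below are made-up placeholders standing in for real embeddings (which you would get from an embeddings API, e.g. OpenAI's), and the search is a brute-force loop instead of a vector database:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_vec, chunk_vecs, chunks, k=2):
    # Rank every chunk by similarity to the query; a vector DB does this at scale.
    scored = sorted(zip(chunk_vecs, chunks),
                    key=lambda p: cosine_similarity(query_vec, p[0]),
                    reverse=True)
    return [text for _, text in scored[:k]]

chunks = [
    "Refunds are issued within 14 days.",
    "Shipping takes 3-5 business days.",
    "Our office is in Berlin.",
]
chunk_vecs = [[1.0, 0.1], [0.1, 1.0], [0.5, 0.5]]  # placeholder embeddings
query_vec = [0.9, 0.2]  # pretend embedding of "How long do refunds take?"

relevant = top_chunks(query_vec, chunk_vecs, chunks)
prompt = ("Answer using only this context:\n"
          + "\n".join(relevant)
          + "\n\nQuestion: How long do refunds take?")
```

The `prompt` string is what you would then send to the model, so only the retrieved chunks (not the whole book) count against the token limit.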
I totally understand this logic. However, what if a company wants to chat with its documents but does not want to store the raw text alongside the embeddings in a vector database like Pinecone? So basically, only the vectors of the text would be stored in Pinecone.
What is the proper approach for this use case, given that the text can no longer be pulled from the vector database into the prompt?
Well, the text needs to live “somewhere”. I'm assuming you want to keep the text itself in your internally secured databases. In that case, build an index (lookup table) for your data and store that index's ID alongside each vector; on retrieval, you look up the ID locally and recover the text block that produced that vector.
So in the end, in any case, the LLM must receive the text of the relevant documents in the prompt in order to form its answer, right?
Yes, the models are “stateless”: they start every API call with no knowledge of the prior conversation. Think of it as starting a conversation with a new stranger every time you speak; the only way to get them to understand things you've said in the past is to tell them everything again each time. The same goes for the GPT language models.
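Concretely, the client keeps the message history and resends all of it on every call. A minimal sketch, with `fake_model` standing in for the real Chat Completions endpoint:

```python
# Because the model is stateless, the client owns the history and
# resends the whole thing on every request.
def fake_model(messages):
    # Placeholder for the Chat Completions call; it only ever sees
    # what is in `messages` for this one request.
    return {"role": "assistant", "content": f"(reply to {len(messages)} messages)"}

history = [{"role": "system", "content": "You are a helpful assistant."}]

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)  # the FULL history goes out each time
    history.append(reply)
    return reply["content"]

send("Hello")        # the model sees 2 messages
send("What's new?")  # the model sees 4: it only "remembers" because we resent them
```

This is also why long conversations eventually hit the token limit: the history itself keeps growing.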
You can also run a private instance of the Weaviate vector database on your own servers to store the data securely.