I have a very large PDF document. When I extract its contents using the PyPDF library and send them over the ChatCompletion API to GPT-4, I get a 'tokens exceeded' error. I believe the maximum context size is approximately 8,000 tokens for the GPT-4 variant I am using.
Is there a way to break this document into multiple chunks and send them in sequence through the API, so that the model remembers the complete document at the end and I can then ask a question about the whole document through the ChatCompletion API?
P.S. I am new to this technology and these APIs, so please excuse me if this is a basic question.
Hi @shripati007 - welcome! Before looking at more complex solutions, is anything holding you back from using the GPT-4-turbo model, which has a context window of 128k (equivalent to 300+ pages)? Given you are already using GPT-4, this would likely be the easiest solution.
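If you do switch, it is mostly a matter of changing the model name in your ChatCompletion call. Here is a rough sketch using the openai Python package (v1.x) together with pypdf - the file name, the question, and the exact turbo model name are placeholders for whatever you have access to:

```python
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Extract the full text of the PDF (file name is just an example)
reader = PdfReader("my_document.pdf")
document_text = "\n".join(page.extract_text() or "" for page in reader.pages)

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # placeholder: use whichever 128k GPT-4-turbo snapshot you have access to
    messages=[
        {"role": "system", "content": "Answer questions using only the provided document."},
        {"role": "user", "content": f"Document:\n{document_text}\n\nQuestion: <your question here>"},
    ],
)
print(response.choices[0].message.content)
```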
As for your idea with chunks: the API is stateless, so each call is treated independently and the model won't remember the pieces you sent earlier. However, there are (likely) still ways for you to generate an answer through an iterative mechanism in the event that you can't switch to GPT-4-turbo.
Hi @jr.2509 - Thanks for your useful response. The thing is, we should soon receive access to the GPT-4-turbo model with the 128k-token context you mentioned, and that would largely cover the current size of my document. The challenge is that the document is expected to grow every quarter as new content is added incrementally, so I was looking for a viable long-term solution that can scale with this growing document over time. If you have any such solution in mind, please let me know. Thanks again!
Using embeddings: you could chunk the document, create an embedding for each chunk, and then use embeddings-based search to retrieve the most relevant chunks and answer the question from those. If the base document stays the same, you only need to create embeddings for the incremental additions and add them to your existing store of embeddings. If the whole document changes, you'd have to re-create the embeddings each time.
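A rough sketch of what that could look like with the openai Python package and numpy - the chunk size, embedding model, number of retrieved chunks, file name, and question are all assumptions you would adapt to your document:

```python
import numpy as np
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a list of texts with the OpenAI embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Extract and chunk the document (naive fixed-size split; ~3000 characters is just an example)
document_text = "\n".join(page.extract_text() or "" for page in PdfReader("my_document.pdf").pages)
chunk_size = 3000
chunks = [document_text[i:i + chunk_size] for i in range(0, len(document_text), chunk_size)]

# 2. Embed all chunks once and store the vectors; for incremental additions,
#    embed only the new chunks and append them to this store.
chunk_vectors = embed(chunks)

# 3. At question time, embed the question and pick the most similar chunks (cosine similarity)
question = "What does the report say about Q3 revenue?"  # example question
q_vec = embed([question])[0]
similarities = chunk_vectors @ q_vec / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec)
)
top_k = similarities.argsort()[-3:][::-1]  # indices of the 3 best-matching chunks
context = "\n\n".join(chunks[i] for i in top_k)

# 4. Ask the model using only the retrieved excerpts
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer using only the provided excerpts."},
        {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```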
The chunking approach: you feed a chunk of the document as part of the prompt along with the question. You keep the answer in a variable and feed it into the next API call as part of the prompt, asking the model to refine the answer based on the new information (i.e. your next chunk). You repeat that process for all your chunks, and at the end you should have an answer that has considered all of the information. There's a risk that you may lose some context along the way, but that depends somewhat on the nature of your information and questions.
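And a rough sketch of that refine loop - again, the file name, chunk size, question, and prompt wording are just illustrative:

```python
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()

# Extract and chunk the document (file name and chunk size are examples)
document_text = "\n".join(page.extract_text() or "" for page in PdfReader("my_document.pdf").pages)
chunk_size = 3000
chunks = [document_text[i:i + chunk_size] for i in range(0, len(document_text), chunk_size)]

question = "What does the report say about Q3 revenue?"  # example question
answer = "No answer yet."

# Feed the chunks one by one, asking the model to refine the running answer each time
for chunk in chunks:
    prompt = (
        f"Question: {question}\n\n"
        f"Current answer: {answer}\n\n"
        f"New excerpt from the document:\n{chunk}\n\n"
        "Refine the current answer using the new excerpt. "
        "If the excerpt adds nothing relevant, return the current answer unchanged."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content

print(answer)
```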