Is there any way by which I can let GPT-4 API summarize large PDF texts?

Hey OpenAI Community,

I’m currently working on a project aimed at creating an intelligent PDF reader by integrating the powerful capabilities of the GPT-4 API. However, I’ve encountered some challenges along the way and would greatly appreciate your insights and advice.

Problem Overview:

  • I’m designing a smart PDF reader that needs to process approximately 60-70 PDF documents on average.
  • Each PDF typically consists of 10-12 pages, primarily research articles, with an average of 9000 tokens per document.
  • Additionally, I have a prompt with a limit of 2000 tokens to guide the GPT-4 model in generating the desired output.

What I want is to send the PDF content (extracted text) to the API in parts, have it remember all the parts, and then give me an overall output as specified in the prompt.
For example, I could send one part of the PDF using my tokens, then another, until the PDF content ends. After that, I would send my criteria along with the prompt, and it should read all the parts and then give me an answer.
It's essentially about giving the GPT API a memory.

Hi @Hodor -

Welcome to this Forum and thanks for your question.

Before providing some guidance and/or links to relevant other threads, can you just clarify: are you looking to create a summary of all the PDFs, or are you looking to obtain answers to questions based on the PDFs?


A single PDF can contain more than 8,000 tokens of text; that's why I want to send it in parts. The only thing I want is for the API to remember all the parts sent to it earlier while summarizing.

Thanks for clarifying.

Unfortunately, every API call is treated separately - there is no "memory function" or anything like it. Depending on the specifics of what you want your summary to look like, there are different approaches you can consider.

Option 1 involves summarizing all PDFs individually and then creating an aggregate summary that combines the individual summaries.

Option 2 would be a model whereby you summarize one document and then include the summary of that document as additional context for the summary of the next document. This way you are establishing some relationship between the summaries.
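A minimal sketch of Option 2 (sometimes called the "refine" pattern). The `summarize()` helper here is a hypothetical stand-in for a real chat-completions call; it just fakes the API so the control flow of carrying the running summary forward is visible:

```python
def summarize(text: str, context: str = "") -> str:
    """Stand-in for a real GPT-4 API call.
    In practice this would send the text (and any prior summary as
    context) to the chat completions endpoint."""
    return f"summary({len(text)} chars, had_context={bool(context)})"

def refine_summaries(documents: list[str]) -> str:
    """Option 2: include the summary so far as context for each
    new document, linking the summaries together."""
    running = ""
    for doc in documents:
        running = summarize(doc, context=running)
    return running
```

Option 1 is the same loop without the `context` argument, followed by one final call that aggregates the individual summaries.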

When you use one of the models with a longer context window, e.g. the GPT-4-turbo models, you of course have the option to include multiple documents at a time for summarizing, reducing the number of API calls required to summarize all 60+ PDFs.

I’ll dig up some additional threads discussing summarization approaches shortly.

EDITED: Here’s a thread on the topic that you may find helpful - might add others later:


The AI model cannot “remember parts” greater than its maximum context length. There is no memory beyond what you can send in a single API call.

Even the Assistants endpoint, which is an agent that can use PDF documents as searchable knowledge, doesn't have an overall view of the document. It has a search function based on user input.

You would want to use your own document text extraction on the PDF file. Pass the text in sections to obtain summaries of those sections, reducing the overall size. Then summarize based on all of the summaries, which will typically produce an output of around 600 tokens, given the model's training.
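The section-then-aggregate approach above can be sketched like this. `summarize_text()` is a hypothetical placeholder for the actual API call, and the word-based chunker is only a rough stand-in for proper token counting (which you would do with a tokenizer such as tiktoken):

```python
def chunk_words(text: str, max_words: int = 3000) -> list[str]:
    """Split extracted PDF text into word-bounded chunks.
    A real implementation should count tokens, not words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarize_text(text: str) -> str:
    """Placeholder for a GPT-4 API call that returns a short summary."""
    return f"<summary of {len(text.split())} words>"

def summarize_document(full_text: str) -> str:
    """Two-stage reduction: summarize each section,
    then summarize the summaries."""
    section_summaries = [summarize_text(c) for c in chunk_words(full_text)]
    return summarize_text("\n".join(section_summaries))
```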

The GPT-4-turbo models -1106 and -0125 accept a context length of up to 128k tokens, though the input has to be reduced a bit to leave room for the tokens forming the response.


The GPT-4 128k model supports roughly 200+ pages of information, which would solve your issue, but it's pricey per question.

A better solution would be to build a memory system. It is possible to build a system that can understand the whole document based on a question, but the processing time to get an answer is greater.

This involves chunk-summarizing down to your memory size. Pick the model context size you want to work in, then look at your data size; that gives you an idea of the maximum chunk size you may be able to handle. The idea is that you take the query, divide the data into chunks, and fire all chunks off at once in parallel, each with the same query. The model is given instructions explaining that it is only dealing with a small chunk of data at a time. Once all the parallel calls are done, you feed every response into another model call, and that model then knows pretty much what you want. Running in parallel keeps processing time minimal, so the answers come back in a good time frame. This is just a basic outline - I won't give away my full design, but it may help. Vector databases work similarly.
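A bare-bones sketch of the parallel chunk idea described above, with a hypothetical `ask_chunk()` standing in for the per-chunk API request and a simple join standing in for the final combining call:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_chunk(chunk: str, query: str) -> str:
    """Placeholder for one API call: the same query plus one data chunk.
    The real prompt would tell the model it only sees a small piece
    of the full document."""
    return f"partial[{len(chunk)}]"

def map_reduce(chunks: list[str], query: str) -> str:
    """Fire all chunk requests in parallel, then combine the partial
    answers (a real system would do this with one final model call)."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        partials = list(pool.map(lambda c: ask_chunk(c, query), chunks))
    return " | ".join(partials)
```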

Something else you can do is use the model's understanding of indexes and tables of contents if you want to build a more intelligent design.

For multiple PDFs, you can build a "librarian" AI to manage the library for your processing system. Next thing you know, you're like me, with many AI subsystems doing a lot of intelligent things on the back end so the front end moves fast :slight_smile:

There is no way to pass a file into an API call unless it is via the Assistants API, and therefore subject to retrieval limitations (currently 20 chunks)? Great for Q&A, but not so much for summarization tasks.

I’m also wanting to do similar but don’t want to deal with vector stores.


@cagey yes, you can, but not as the file itself. I am talking about taking a file and breaking it apart to get the text out, versus calling an Assistant. :wink: My system uses GPT-3.5 Turbo and can handle any size document on a 16k context, so it's fully possible without vectors.

But the V6 memory system I am building will also soon include embeddings on top of what I have, for even faster understanding.

Assistants works well, but I found that I could do it more cheaply with GPT-3.5 using my memory system design.

Are there any AI providers that offer "unlimited" context length? Large PDF texts have always been the Achilles' heel of generative AI.

What do you mean by using your memory system design with GPT 3.5?

Google's has a million-token context, but I have not used it, so I can't tell you anything about it. I used the free phone-app version, but not with anything of value.

Simply put, I built my own data structure using graph databases that run locally; they serve as unlimited long-term memory for my personal GPT client. You can find it in the forums. It can take voice inputs, has its own messaging application, and has a document ingester to fill out its brain with manuals and the like. It is still in development and has been running for almost two years. It even used to play games and Twitch-stream with me and my followers, and it remembers them all. It uses the GPT-3.5 model to keep the costs fractional, so I can chat with it all day on my data and have spent less than the cost of lunch :slight_smile:

The next version, V6, will have more understanding; it is approaching an LLM stacked on OpenAI's GPT, using machine learning and the like. My Discord link in the forums goes to my project, where I discuss the changes I made without giving everything away, but you can see from the start, through each memory revision, the changes and choices that got it to where it is. V1 took less than a few months, and it has taken two years to reach the start of V6. V5 currently runs, but it is not the cheapest or smartest way to do it, which is why the next version adds machine learning and embeddings with relational points, like a neural network.
