How to send long articles for summarization?

sunco007 · May 12, 2023, 7:29pm

Hello, I work at a newspaper and we are testing the OpenAI API.

Currently, we are using both the text-davinci-003 and the gpt-3.5-turbo models and they are working for us.

We are using the API to generate summaries of our articles. The problem we’re having is that sometimes the articles are too long and we hit the token limit, which happens about 1 out of every 100 articles.

I read that it’s possible to send pieces of text to the API. I also saw some libraries that can split the text, but I don’t think I need them. I found a library that counts the number of tokens, which seems useful. However, its documentation doesn’t mention how to send the tokens. I also found some websites that can split the text.
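As a rough illustration of the token-counting step mentioned here, the sketch below estimates whether an article fits in one request. It assumes about 4 characters per token (a rule of thumb for English text, not an exact count), a 4,096-token context window, and 300 tokens reserved for the reply; the names `estimate_tokens` and `fits_in_context` are hypothetical. A real tokenizer library such as tiktoken would give exact counts.

```python
import math

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate; swap in a real tokenizer for exact counts."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(article: str, context_limit: int = 4096,
                    reserved_for_reply: int = 300) -> bool:
    """Check whether the article plus the reserved reply budget fits the window.

    context_limit and reserved_for_reply are illustrative defaults, not
    values confirmed by the thread for any particular model.
    """
    return estimate_tokens(article) + reserved_for_reply <= context_limit
```

Only the articles for which `fits_in_context` returns False (about 1 in 100 here) would need any chunking at all.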

I saw in one place that we can use the chat endpoint and in another that we can use the embed endpoint. In yet another place, I saw that we need to obtain a context ID.

Is there a guide on how to do this? I am working in PHP, but I can read/understand code in any language.

riyan.arkademi · May 12, 2023, 8:06pm

I have the same problem: the token limit. How do I resolve this?

SomebodySysop · May 12, 2023, 8:14pm

Here are some notes I have made on the issue. I have used the first strategy, Map Reduce, with success.

How to Summarize a Large Text with GPT-3
How to Summarize a PDF file with ChatGPT (70 000+ Words)
State of the Art GPT-3 Summarizer For Any Size Document or Format | Width.ai
- Smaller chunks allow for more understanding per chunk but increase the risk of split contextual information. Let’s say you split a dialog or topic in half when chunking to summarize. If the contextual information from that dialog or topic is small or hard to decipher per chunk, the model might not include it at all in the summary for either chunk. You’ve now taken an important part of the overall text and split the contextual information about it in half, reducing the model’s likelihood of considering it important. On the other hand, you might produce two summaries of the two chunks dominated by that dialog or topic.
Building a Summarization System with LangChain and GPT-3 - Part 2 - YouTube
- “Extract the key facts out of this text. Don’t include opinions. Give each fact a number and keep them in short sentences.”
- Fact check summaries.
Building a Summarization System with LangChain and GPT-3 - Part 1 - YouTube
- Summarization Methodologies
  - Map Reduce
    - Chunk document. Summarize each chunk, then summarize all the chunk summaries. Using this currently in embed_solr_index01.php.
  - Stuffing
    - Summarize entire document all at once, if it will fit into prompt.
  - Refine
    - Chunk document. Summarize first chunk. Summarize 2nd chunk + 1st chunk summary. Summarize 3rd chunk + 1st and 2nd chunk summary. And so on…
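To make the Map Reduce and Stuffing notes above concrete, here is a minimal sketch. It assumes the model call is wrapped in a `summarize(text) -> str` function that you supply (hypothetical, so no particular API is implied), and it measures chunk size in characters for simplicity rather than tokens.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk
            current = para          # start a new one with this paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def map_reduce_summary(text: str, summarize: Callable[[str], str],
                       max_chars: int = 8000) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    chunks = split_into_chunks(text, max_chars)
    if len(chunks) <= 1:
        return summarize(text)  # "Stuffing": the whole document fits in one call
    partial = [summarize(chunk) for chunk in chunks]
    # Assumption: the joined chunk summaries fit into a single prompt.
    return summarize("\n\n".join(partial))
```

For very long documents the joined summaries may themselves exceed the limit, in which case the reduce step would need to be repeated.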

codie · May 12, 2023, 8:24pm

To add on to @SomebodySysop’s list you can check out this one.

LangChain101: Question A 300 Page Book (w/ OpenAI + Pinecone) - YouTube

That YouTuber does a lot of stuff with LangChain specifically.

arturkre · May 12, 2023, 10:10pm

I was not happy with LangChain’s map reduce function: a summary of a summary just loses too much detail, and the overall output is still too short because of the capped output length.

Try out this method: GitHub - emmethalm/infiniteGPT: InfiniteGPT is a Python script that lets you input an unlimited size text into the OpenAI API. No more tedious copy & pasting. Long live multithreading!
It creates a summary for each smaller chunk and then just appends them all together. This way your output is much longer and you keep more detail.
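The idea described in this post can be sketched as follows. This is not the infiniteGPT code itself, only an illustration of "summarize each chunk, then append"; `summarize` is a hypothetical function wrapping your model call, and chunking by word count is an assumption made for brevity.

```python
from typing import Callable, List


def chunk_words(text: str, words_per_chunk: int) -> List[str]:
    """Split text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def summarize_and_append(text: str, summarize: Callable[[str], str],
                         words_per_chunk: int = 1500) -> str:
    """Summarize every chunk independently and join the results in order.

    There is no final "summary of summaries" call, so the output grows with
    the input and keeps more detail.
    """
    return "\n\n".join(summarize(chunk)
                       for chunk in chunk_words(text, words_per_chunk))
```

Because the chunks are independent, the calls could also be issued in parallel, at the cost of each chunk being summarized without knowledge of the others.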

SomebodySysop · May 12, 2023, 11:11pm

That’s interesting. So, instead of a summary of summaries, you submit a list of summaries. Apparently, it works!

In my case, Map Reduce works because I am attaching the summary of the parent doc to its chunks in order to give them more relative context.

If I were looking for another way to chunk large texts, this does sound like a pretty good approach.

sunco007 · May 13, 2023, 8:00pm

Thank you for your responses, I am reviewing them to see which one fits my needs best.

I asked ChatGPT, and it told me that I can use the append method and then the generate method from the following endpoints:

https://api.openai.com/v1/engines/davinci-codex/completions/append
https://api.openai.com/v1/engines/davinci-codex/completions/generate

Is this correct or are they no longer available?

jwatte · May 13, 2023, 10:20pm

None of those functions would help with trying to extend the token limit. The limit of “size of input” plus “size of output” is a hard limit.

What I do is set aside a number of tokens (say 300) for the response, then go through each input document, adding one paragraph at a time to a “current document piece” until adding the next paragraph would go over the limit. At that point I don’t add that paragraph, but instead send the piece so far in for summarization. Then I replace the current piece with its summary, and keep adding more paragraphs from the input. If I hit the limit again, or reach the end of the input document, I send what I have in for summarization again. Yes, this means that you get “a summary of a summary” in the end, but it handles documents of any size.
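A minimal sketch of the procedure described in this post, under stated assumptions: `summarize` is a hypothetical wrapper around the model call, `rough_token_count` is a crude stand-in for a real tokenizer (about 4 characters per token), and the 4,096 limit is only an illustrative default.

```python
import math
from typing import Callable, Iterable


def rough_token_count(text: str) -> int:
    # Assumption: ~4 characters per token; use a real tokenizer for accuracy.
    return math.ceil(len(text) / 4)


def rolling_summary(paragraphs: Iterable[str],
                    summarize: Callable[[str], str],
                    count_tokens: Callable[[str], int] = rough_token_count,
                    context_limit: int = 4096,
                    reserved_for_reply: int = 300) -> str:
    """Accumulate paragraphs until the budget is hit, then fold into a summary."""
    budget = context_limit - reserved_for_reply
    current = ""  # the "current document piece"
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and count_tokens(candidate) > budget:
            # The next paragraph would overflow: replace the piece with its
            # summary, then continue from summary + paragraph.
            current = summarize(current)
            candidate = f"{current}\n\n{para}"
        # Note: a single oversized paragraph is not split by this sketch.
        current = candidate
    return summarize(current) if current else ""
```

In practice the instruction text of the prompt also consumes tokens, so the reserved amount would need to cover both the prompt wording and the reply.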

codie · May 14, 2023, 3:57am

There is a 1 million token limit method for LLMs. I don’t know how compatible it is with OpenAI’s but I suspect there will be a race to test this out. At the rate things are advancing, I’d say give it a couple of months and you won’t have to worry about that limit.

SomebodySysop · May 16, 2023, 12:35am

Yes, it sounds great. I’ve got 8K now with GPT-4 and am looking forward to 32K. But then, realistically, if 90% of the query responses I am looking for can be found in 1 or 2 paragraphs, is it really helpful to feed the LLM 50 pages of text for each query? And isn’t that going to get prohibitively expensive? I mean, a million-token context window will be great for summarizing a book, but how great will it be for finding the paragraph where Huckleberry Finn and Tom Sawyer first encounter Jim in Mark Twain’s “Huckleberry Finn”?

sunco007 · May 16, 2023, 1:10am

Hello and thank you for your responses.

In the end, I used the idea from this image: https://www.allabtai.com/wp-content/uploads/2022/12/big-file-summerize-gpt3.jpg

The difference is that the image suggests sending the paragraphs and then combining all the summaries at the end.

Here’s what I did:

1. Combine the paragraphs based on the number of tokens.
2. Send them to create a summary.
3. Combine the previous summary with the following paragraphs based on the number of tokens.
4. Send them to create a summary.
5. Go back to step 3.

This way is better because in each generation, I send the context rather than just isolated blocks of paragraphs that may not make sense with the complete text at some point.

SomebodySysop · May 16, 2023, 2:03am

In other words, the “Refine” method.

Building a Summarization System with LangChain and GPT-3 - Part 1 - YouTube
- Summarization Methodologies
  - Refine
    - Chunk document. Summarize first chunk. Summarize 2nd chunk + 1st chunk summary. Summarize 3rd chunk + 1st and 2nd chunk summary. And so on…

wolfgeppert · May 16, 2023, 4:19am

I am already running that big-text system and splitting the text into chunks. That is the easy part… The problem is API cost and time for content like 500 pages or 100,000 words. Summarizing the summaries of all chunks works well. You have to calculate the maximum size of a chunk summary so that all of them fit into maxTokens for the final summary. But there is a second option that I didn’t test on the API, only on ChatGPT. When you upload the chunks you get a requestID. In fact, the model holds all content in memory for your session. I will try to ask for “Summarize requestID 1 to n.” I will let you know.
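The calculation mentioned in this post (how large each chunk summary may be so that all of them fit into the final request) might look like the sketch below. The function name, the 4,096 limit, and the overhead figures are illustrative assumptions, not values from the post.

```python
def max_chunk_summary_tokens(num_chunks: int,
                             context_limit: int = 4096,
                             final_summary_tokens: int = 500,
                             prompt_overhead: int = 100) -> int:
    """Largest per-chunk summary size that still lets all summaries fit.

    context_limit: total tokens (input + output) allowed in the final call.
    final_summary_tokens: tokens reserved for the final summary itself.
    prompt_overhead: tokens used by the instruction text in the final prompt.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    available = context_limit - final_summary_tokens - prompt_overhead
    return max(available // num_chunks, 0)
```

With these defaults, 10 chunks leave 349 tokens per chunk summary and 40 chunks leave 87. When the per-chunk figure gets too small to be useful, an intermediate summarization pass is probably needed.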

codie · May 16, 2023, 7:54pm

While I don’t know how well it performs, I think it will do really well for complex tasks. If you think about it, right now you have to go through a series of chained prompts to get those human-like, really powerful complex responses. A 1 million token limit would greatly reduce the complexity of the agent you have to build to respond in a certain way.

I think the real cutting-edge stuff is going to be expert help and tasking. In order to do that effectively, the agents need to know quite a bit about your problem and current situation. For example, if I need help researching circuit designs or whatever, it would be nice to stuff the prompt with a bunch of contextually relevant data from papers, books, videos, and possibly the work I have been doing up until now. You can do that now to an extent, but in a very limited manner.

One last point: I think highly efficient handling of large token counts will actually drop the price pretty drastically.

SomebodySysop · May 16, 2023, 8:05pm

Yes, I am beginning to see how this could be a tremendous leap forward. Just the fact that the larger token window will let you submit MORE documents (not just larger documents) to evaluate.

However, for this to be truly useful for the general public, the pricing is going to have to go WAY down. I mean, 8K is 6 cents per 1K and the 32K window is 12 cents per 1K tokens.

I have GPT4 8K access now, but just reverted back to GPT-3.5-turbo 4K window for testing because of the cost.

codie · May 16, 2023, 8:13pm

Yeah, 100%. I’m working on a project right now in which I have built a ModelPromptObject, based on the DataAccessObject programming pattern. The idea is that I can easily switch between models and APIs, and hopefully at some point to an in-house computer running a model. I suspect both Microsoft and OpenAI are greatly overcharging with this token count paradigm they have set up.

Anyone in AI knows that inference is the cheapest part of the whole thing, and I think they are probably making quite a bit off of it. The fact that Azure didn’t raise the price compared to OpenAI’s kind of confirms that suspicion. But I have no real data to back up the claim.

But I will definitely be using whatever the cheapest method is and I’m baking in a method for all of my projects to easily swap out the backend.

yashasviisawal · June 10, 2023, 9:24pm

You can use the Merlin browser extension. It makes reading long articles much easier, since it automatically summarises every article I open!

kekren · July 15, 2023, 1:17pm

Thanks for suggesting this method. But can’t we increase the context length of the final summary so that we don’t lose much detail, since LangChain seems to retain the document context, unlike this method? Don’t you think so?

gabrielmoraesp · October 4, 2023, 3:52pm

Hey guys.
Any news about the challenge?
I am developing a project with the aim of summarizing legal documents.
As a test, I’m trying to summarize a legal case in pdf, which has 291,358 tokens. Approximately 650 pages.

Does anyone know if it’s possible?
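As a back-of-the-envelope check for a document of this size, the sketch below estimates how many model calls a repeated summarize-the-summaries approach would need. The chunk size (3,000 tokens) and per-chunk summary size (300 tokens) are assumptions chosen for illustration, and `plan_passes` is a hypothetical helper name.

```python
from typing import List


def plan_passes(total_tokens: int, chunk_tokens: int,
                summary_tokens: int) -> List[int]:
    """Return the number of model calls in each summarization pass.

    Each pass splits the current text into chunks of chunk_tokens and
    replaces every chunk with a summary of at most summary_tokens. Passes
    repeat until the text fits into one chunk, which gets one final call.
    """
    if not 0 < summary_tokens < chunk_tokens:
        raise ValueError("summary_tokens must be positive and below chunk_tokens")
    passes: List[int] = []
    tokens = total_tokens
    while tokens > chunk_tokens:
        calls = -(-tokens // chunk_tokens)  # ceiling division
        passes.append(calls)
        tokens = calls * summary_tokens     # worst-case size after this pass
    passes.append(1)                        # final call on the remaining text
    return passes
```

Under these assumptions, `plan_passes(291358, 3000, 300)` returns `[98, 10, 1]`, or 109 calls in total, so the size alone does not rule the task out; how much detail survives three rounds of summarization is a separate question.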

SomebodySysop · October 4, 2023, 4:37pm

What challenge?

See the links here for options: How to send long articles for summarization? - #3 by SomebodySysop
