Summarizing information using the API

I have a lot of information and I want to use the GPT-4 Turbo model to summarize it.
I have searched Google and read the documentation, and I found a way to use chat completions by giving the prompt “Summarize ${information}”, but I think there might be a problem with tokens. Is there another approach to doing this summarization?


That sounds like a great use of the programmatic API, although the models that can summarize the most information at once also cost more, depending on how much data you send.

There are two API models that have a quite large context length (the total area for input and forming output, measured in “tokens”):

gpt-4-1106-preview | 128k context | $0.01 per 1k input tokens
gpt-3.5-turbo-1106 | 16k context | $0.001 per 1k input tokens

(outputs have a separate cost, but that smaller summary won’t be the primary consideration)

Yes, that’s close to $1.28 per summary at the maximum input, but I wouldn’t encourage approaching the limit if you wish to maintain quality.

Tokens are an internal encoding that accumulate just a bit faster than actual words. You can use an online tokenizer to get an idea of the size.
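To ballpark sizes and costs before sending anything, a crude character-based estimate is often enough. The 4-characters-per-token ratio below is a rule of thumb for English text, not an exact count:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token for
    typical English text. For exact counts, use OpenAI's tiktoken
    library or the online tokenizer mentioned above."""
    return max(1, len(text) // 4)

def estimate_input_cost_usd(text: str, usd_per_1k_tokens: float = 0.01) -> float:
    """Approximate input cost, defaulting to gpt-4-1106-preview's
    $0.01 per 1,000 input tokens."""
    return estimate_tokens(text) / 1000 * usd_per_1k_tokens
```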

A technique that can be employed is chunking - making summaries of smaller sections, and then having the AI summarize all the summaries.

If you have more questions, there are often people on the forum to help!


Chunking seems to work best in my experience. Split the text into chunks (with some overlap), summarize each chunk, then have the summaries themselves summarized.
If you are planning to summarize a document, it may be worth first asking the model to check whether the file already contains an executive summary, as that could be used instead.
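A minimal chunk splitter with overlap might look like this. The chunk and overlap sizes are arbitrary illustrations; in practice you would likely size chunks in tokens rather than characters:

```python
def split_with_overlap(text: str, chunk_size: int = 8000, overlap: int = 500) -> list[str]:
    """Split text into chunks of at most chunk_size characters, with
    each chunk repeating the last `overlap` characters of the previous
    one, so context is not lost at the boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```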


The method you choose for summarizing text can depend on several factors:
the length of the original text you want to summarize, the quality of the summary you’re aiming for, and the amount of cost and effort you’re willing to invest.

As mentioned in the earlier reply, if you’re looking to summarize a document of about 124K tokens down to 4K tokens, or a document of about 12K tokens down to 4K tokens, you might be successful by simply passing the text to be summarized to models like gpt-4-1106-preview or gpt-3.5-turbo-1106 along with an appropriate system message.

However, summarization can be more complex than it seems. Even if you provide the original text to a language model and instruct it to summarize with a system message, you may not always get the quality of summary you desire.

Here, we’ll discuss methods for both scenarios: when the original text fits within the input size of the language model and when it does not.

There are three well-known methods: the “Stuff” method, the “MapReduce” method, and the “Refine” method.

“Stuff” method:

  • Pass the entire original text at once and send a system message to the language model to summarize. This only works when the length of the text to be summarized fits within the model’s input size.

Pros:

  • Easy to execute.

Cons:

  • Directly constrained by the model’s ability to summarize appropriately.
  • Limited by the model’s context length.
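As a sketch, the “Stuff” method is just a single chat-completion call. This uses the v1 openai Python SDK; the model name, system message wording, and max_tokens value are illustrative choices, not fixed requirements:

```python
def stuff_messages(document: str) -> list[dict]:
    """Build the chat messages for single-shot ('Stuff') summarization."""
    return [
        {"role": "system",
         "content": "Summarize the user's document in a few concise paragraphs."},
        {"role": "user", "content": document},
    ]

def stuff_summarize(document: str, model: str = "gpt-4-1106-preview") -> str:
    """One API call; only works when the document plus instructions
    fit within the model's context window."""
    from openai import OpenAI  # deferred import: only needed for the actual call
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=stuff_messages(document),
        max_tokens=1024,  # cap on the summary's length
    )
    return response.choices[0].message.content
```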

On the other hand, the “MapReduce” and “Refine” methods can summarize longer texts that exceed the context length the language model can accept.

The major differences between these two methods are as follows:

“MapReduce” method:

  1. Split the document.
  2. Send a system message to the language model to summarize each piece (map).
  3. Send a system message to the language model to integrate the individual summaries (reduce).

Pros:

  • Can smoothly process very long text data.
  • Can produce a summary quickly, since the “map” calls are independent and can run in parallel.

Cons:

  • May lose the overall context, because the relationships between the split pieces are not considered.
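The map and reduce steps can be sketched independently of any particular SDK. Here `summarize` stands in for whatever single-text summarizer you use — in practice, a wrapper around a chat-completion call:

```python
from typing import Callable

def map_reduce_summary(chunks: list[str],
                       summarize: Callable[[str], str]) -> str:
    """Summarize each chunk independently (map), then summarize the
    concatenation of those partial summaries (reduce)."""
    partial = [summarize(chunk) for chunk in chunks]   # map step
    return summarize("\n\n".join(partial))             # reduce step
```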

“Refine” method:

  1. Split the document.
  2. Summarize the first piece.
  3. For each subsequent piece, pass the summary so far together with the next piece, and ask the model to refine the summary in light of the new text.

Pros:

  • Higher likelihood of maintaining the overall context compared to the “MapReduce” method.

Cons:

  • The detailed summarization process can be complex, and the quality of each summarization step depends on the system message given to the language model.
  • Because it refines the text through multiple sequential rounds of summarization, it can be slow and costly.
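The “Refine” loop can be sketched the same way; `refine(summary_so_far, chunk)` stands in for a chat-completion call whose system message asks the model to revise the running summary in light of the new chunk:

```python
from typing import Callable

def refine_summary(chunks: list[str],
                   refine: Callable[[str, str], str]) -> str:
    """Carry a running summary through the chunks, letting the model
    update it once per chunk. Sequential by nature: each step waits
    on the output of the previous one."""
    summary = ""
    for chunk in chunks:
        summary = refine(summary, chunk)
    return summary
```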

If you’re considering a task like summarizing a single website, which involves reducing a document of about 124K tokens down to 4K tokens, the “Stuff” method, which involves passing the text to the language model and sending a system message to summarize, may yield reasonably good results.

However, if you need to summarize longer texts that don’t fit within the 128K context of gpt-4-1106-preview, or if you’re not satisfied with the summary produced by the “Stuff” method, you may need to use the “MapReduce” or “Refine” methods.

As referenced in the above reply, both the ‘MapReduce’ and ‘Refine’ methods require you to chunk the text, and you can employ strategies such as overlapping characters when chunking to avoid losing context.

The terms ‘Stuff,’ ‘MapReduce,’ and ‘Refine’ are not necessarily common parlance and may be specific to certain Python modules (LangChain, for example, uses these names for its summarization chains), but the pattern of system messages (or prompts) and the steps for invoking API calls do not depend on any specific Python module, which is why I have introduced them here.

I realize this is a lengthy and complex discussion, but I hope it serves as a useful reference for someone!