How can I write Python code such that both input prompt and output results are in the same conversation thread?

jason123 · April 25, 2024, 11:03pm

Hello everyone,
Currently, I want to perform summary analysis on some long texts I have, which are about 10,000 to 20,000 tokens in length.

Previously, I tried segmenting the long text and feeding it to ChatGPT in batches, but this often led to memory confusion and hallucination contamination.

My current idea is to first segment the long text into a list, then use a for loop to send these segments to AI to generate summaries, and finally, integrate these into a final version of the summary for analysis.

However, the only programming I know involves sending a single question and receiving a single answer, which might make the for loop generate independent summaries without continuity.
It seems like these are separate conversations, not part of the same context, which might not trigger the AI’s memory function.

Does anyone know how to program it so that the prompts I input and the results I receive are part of the same thread, similar to the way it works in the ChatGPT interface where there is contextual continuity?

Thank you, everyone.

_j · April 25, 2024, 11:34pm

A conversation history of prior inputs is typically used as a chatbot.

The AI sees what you wrote, the AI observes its output. That informs the context of the latest question.

(you also pay every time for the AI being provided those extra tokens of memory)

You describe a task that simply has many independent steps. Metacode…

for part in all_parts:
– summaries.append(Input to AI: “summarize this part of a longer document: {part}”)

Input to AI: summarize all these summaries together {summaries}

The AI may benefit from some more context of the part it is currently summarizing, but the cost escalates:

for index, part in enumerate(all_parts):
– summaries.append(Input to AI: “Overview so far {total_summary}. Summarize this part of a longer document: {part}”)
– total_summary = Input to AI: " Here’s a summary {total_summary}. Include this additional info in a new total summary {summaries[index]}"

Input to AI: summarize all these summaries together {summaries}

Retaining step two is going to give a better quality than the total_summary made of continued iterations, where the first chunk has been overwritten many times.

None of these really need a multi-message conversational context. Just programming flow for the task.

jason123 · April 26, 2024, 12:25am

Thank you for your quick response to my query, but I must admit, I’m quite new to programming and didn’t fully grasp your explanation.

Are you suggesting that if I split a long text into subsets, say 1 to 5, I should first summarize subset 1 using AI to create a sub-summary, and then use that summary as a context for the next subset to generate another summary, and so on?

However, I’m concerned that combining the summary of the previous subset with the next one, plus the input prompt, might exceed the token limit close to 4096, thereby compromising the output.

Also, this method might disproportionately emphasize the importance of the first subset. If the beginning is just an introduction or irrelevant discussion, this might not be the best approach.

Thank you.

_j · April 26, 2024, 1:02am

gpt-3.5-turbo-1106 and gpt-3.5-turbo-0125 API models have a 16k context, so this can improve the convenience of passing larger inputs at once to the AI. They unfortunately lack as complete a comprehension of the input as that which you place into gpt-3.5-turbo-0613 with 4k, where you pay more for the input per token also.

gpt-4-turbo can turn 124k of input into 1k of output all at once - for $1.27 a call at such a maximum, compared with 15k → 1k for gpt-3.5-turbo = $0.01.

So you can examine how often you would perform this task vs your coding time and see if dozens of calls has savings value beyond the easy to perform gpt-4 usage.

Your concern is that “summarize this” on a single chunk alone doesn’t seem to produce high quality. You theorize that if the AI knew a bit more about the context of the chunk, that summary could be better.

We take it for granted that the final product will be a summary of all the summaries.

So then the only question is how to provide that extra information about the document that frames the current chunk. Some options:

include the chunk summary that immediately proceeded: cheaper, simple
include the first summary, supposing it contains an overview or abstract: cheaper, simple
include a summary of all the chunk summaries up to now: an extra call on large input
revise a summary of all the chunks up to now with the last summary: an extra call on a smaller input

The final product being a summary of all summaries.

Then optionally do the work again in a different way with the highest quality extra information:

include the total document summary from the first pass when summarizing the chunks again for highest quality chunk summary.

We make what seems to be a lot of API calls for a single product because they can be cheap on gpt-3.5, and we expect that the less we give the AI at once, the more it will be focused on its work.

Then you can look at the chunking technique: whether it splits at logical document sections, whether it is better with some overlapping info…

jason123 · April 26, 2024, 2:41am

_j:

Your concern is that “summarize this” on a single chunk alone doesn’t seem to produce high quality. You theorize that if the AI knew a bit more about the context of the chunk, that summary could be better.

We take it for granted that the final product will be a summary of all the summaries.

So then the only question is how to provide that extra information about the document that frames the current chunk. Some options:

include the chunk summary that immediately proceeded: cheaper, simple

include the first summary, supposing it contains an overview or abstract: cheaper, simple

include a summary of all the chunk summaries up to now: an extra call on large input

revise a summary of all the chunks up to now with the last summary: an extra call on a smaller input

The final product being a summary of all summaries.

Then optionally do the work again in a different way with the highest quality extra information:

include the total document summary from the first pass when summarizing the chunks again for highest quality chunk summary.

We make what seems to be a lot of API calls for a single product because they can be cheap on gpt-3.5, and we expect that the less we give the AI at once, the more it will be focused on its work.

Then you can look at the chunking technique: whether it splits at logical document sections, whether it is better with some overlapping info…

Thank you for the clarification. It seems I’ll need to invest more in using the GPT-4 model, since GPT-3.5 only supports up to 16K tokens for context.

However, I’m still contemplating how to write the code. It appears there is another method using the assistant mode that can handle specific tasks.

Topic		Replies	Views
How should a program be written to summarize a long text using an API, and what are the considerations regarding the maximum number of tokens allowed? API	2	2202	April 19, 2024
What's the best API method/model for this use-case scenario? API	3	1722	December 1, 2023
Information summary by using API API	3	7131	January 9, 2024
Trying to use gpt-4 for correcting, but it summarises API gpt-4 , whisper	9	2425	December 30, 2023
Chained Prompt to complete text larger than 4000 tokens? API	14	6120	December 25, 2023

How can I write Python code such that both input prompt and output results are in the same conversation thread?

Related topics