How can I write Python code such that both input prompt and output results are in the same conversation thread?

Hello everyone,
Currently, I want to perform summary analysis on some long texts I have, which are about 10,000 to 20,000 tokens in length.

Previously, I tried segmenting the long text and feeding it to ChatGPT in batches, but this often led to memory confusion and hallucination contamination.

My current idea is to first segment the long text into a list, then use a for loop to send these segments to AI to generate summaries, and finally, integrate these into a final version of the summary for analysis.

However, the only programming I know involves sending a single question and receiving a single answer, which might make the for loop generate independent summaries without continuity.
It seems like these are separate conversations, not part of the same context, which might not trigger the AI’s memory function.

Does anyone know how to program it so that the prompts I input and the results I receive are part of the same thread, similar to the way it works in the ChatGPT interface where there is contextual continuity?

Thank you, everyone.

A conversation history of prior inputs and outputs is what typically makes a chatbot.

The AI sees what you wrote and sees what it answered before; that history informs the context of the latest question.

(you also pay every time for the AI being provided those extra tokens of memory)
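
For reference, that memory is nothing more than resending a growing list of messages on every call. A minimal sketch, assuming the `openai` Python package (v1.x) and an `OPENAI_API_KEY` environment variable; the model name is just a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=messages,      # the entire history is resent (and billed) each call
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})  # this list is the "memory"
    print("AI:", reply)
```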

You describe a task that simply has many independent steps. Metacode…

  1. for part in all_parts:
    – summaries.append(Input to AI: “summarize this part of a longer document: {part}”)

  2. Input to AI: summarize all these summaries together {summaries}
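
A minimal runnable sketch of that metacode, assuming the `openai` Python package (v1.x) and that `all_parts` is your pre-split list of text chunks; the prompts and model name are placeholders, not tuned wording:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """One independent API call: no conversation history is carried over."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# 1. summarize each chunk on its own
summaries = []
for part in all_parts:  # all_parts: your pre-split list of text chunks
    summaries.append(ask(f"Summarize this part of a longer document:\n\n{part}"))

# 2. summarize the summaries
final_summary = ask("Summarize all these summaries together:\n\n" + "\n\n".join(summaries))
```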

The AI may benefit from some more context of the part it is currently summarizing, but the cost escalates:

  1. for index, part in enumerate(all_parts):
    – summaries.append(Input to AI: “Overview so far: {total_summary}. Summarize this part of a longer document: {part}”)
    – total_summary = Input to AI: “Here’s a summary: {total_summary}. Include this additional info in a new total summary: {summaries[index]}”

  2. Input to AI: summarize all these summaries together {summaries}
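
The same escalated version as code, reusing the hypothetical `ask()` helper from the sketch above:

```python
# 1. summarize each chunk, giving the AI a running overview as extra context
total_summary = ""
summaries = []
for index, part in enumerate(all_parts):
    summaries.append(ask(
        f"Overview so far: {total_summary}\n\n"
        f"Summarize this part of a longer document:\n\n{part}"
    ))
    total_summary = ask(
        f"Here's a summary: {total_summary}\n\n"
        f"Include this additional info in a new total summary:\n\n{summaries[index]}"
    )

# 2. still finish with a summary of all the per-chunk summaries
final_summary = ask("Summarize all these summaries together:\n\n" + "\n\n".join(summaries))
```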

Retaining step two is going to give better quality than using the total_summary built by continued iteration, where the first chunk’s content has been overwritten many times.

None of these really need a multi-message conversational context. Just programming flow for the task.

Thank you for your quick response to my query, but I must admit, I’m quite new to programming and didn’t fully grasp your explanation.

Are you suggesting that if I split a long text into subsets, say 1 to 5, I should first summarize subset 1 using AI to create a sub-summary, and then use that summary as a context for the next subset to generate another summary, and so on?

However, I’m concerned that combining the summary of the previous subset with the next subset, plus the input prompt, might exceed the token limit of about 4,096, thereby compromising the output.

Also, this method might disproportionately emphasize the importance of the first subset. If the beginning is just an introduction or irrelevant discussion, this might not be the best approach.

Thank you.

The gpt-3.5-turbo-1106 and gpt-3.5-turbo-0125 API models have a 16k context, which makes it more convenient to pass larger inputs at once to the AI. Unfortunately, they don’t comprehend the input as thoroughly as gpt-3.5-turbo-0613 with its 4k context, which also costs more per input token.

gpt-4-turbo can turn 124k of input into 1k of output all at once - for $1.27 a call at such a maximum, compared with 15k → 1k for gpt-3.5-turbo = $0.01.
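
For reference, the arithmetic behind those figures, assuming the per-1k-token rates published at the time (an assumption; check current pricing): roughly $0.01 in / $0.03 out for gpt-4-turbo and $0.0005 in / $0.0015 out for gpt-3.5-turbo-0125.

```python
# Back-of-envelope cost per call; the rates are assumptions, check current pricing
def call_cost(input_tokens, output_tokens, in_per_1k, out_per_1k):
    return input_tokens / 1000 * in_per_1k + output_tokens / 1000 * out_per_1k

print(call_cost(124_000, 1_000, 0.01, 0.03))     # gpt-4-turbo:   ~$1.27
print(call_cost(15_000, 1_000, 0.0005, 0.0015))  # gpt-3.5-turbo: ~$0.01
```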

So you can weigh how often you would perform this task against your coding time, and see whether dozens of cheap calls offer savings worth more than the easy-to-perform gpt-4 usage.


Your concern is that “summarize this” on a single chunk alone doesn’t seem to produce high quality. You theorize that if the AI knew a bit more about the context of the chunk, that summary could be better.

We take it for granted that the final product will be a summary of all the summaries.

So then the only question is how to provide that extra information about the document that frames the current chunk. Some options:

  • include the chunk summary that immediately preceded: cheaper, simple
  • include the first summary, supposing it contains an overview or abstract: cheaper, simple
  • include a summary of all the chunk summaries up to now: an extra call on large input
  • revise a summary of all the chunks up to now with the last summary: an extra call on a smaller input

The final product is still a summary of all the summaries.

Then optionally do the work again in a different way with the highest quality extra information:

  • include the total document summary from the first pass when summarizing the chunks again for highest quality chunk summary.
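
A sketch of that optional second pass, again reusing the hypothetical `ask()`, `all_parts`, and first-pass `final_summary` from the earlier sketches:

```python
# Second pass: re-summarize each chunk with the whole-document summary as framing
better_summaries = [
    ask(
        f"Overview of the whole document: {final_summary}\n\n"
        f"With that context, summarize this part of the document:\n\n{part}"
    )
    for part in all_parts
]
final_summary = ask("Summarize all these summaries together:\n\n" + "\n\n".join(better_summaries))
```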

We make what seems to be a lot of API calls for a single product because they can be cheap on gpt-3.5, and we expect that the less we give the AI at once, the more it will be focused on its work.

Then you can look at the chunking technique: whether it splits at logical document sections, whether it is better with some overlapping info…
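
As an illustration of the chunking side, a naive splitter with overlap (character counts for simplicity; a real version would count tokens, e.g. with `tiktoken`, and would prefer splitting at section or paragraph boundaries):

```python
def chunk_text(text: str, chunk_size: int = 8000, overlap: int = 500) -> list[str]:
    """Naive character-based splitter with overlap between consecutive chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # re-include the tail of the previous chunk
    return chunks

all_parts = chunk_text(long_document)  # long_document: your full input text
```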


Thank you for the clarification. It seems I’ll need to invest more in using the GPT-4 model, since GPT-3.5 only supports up to 16K tokens for context.

However, I’m still contemplating how to write the code. It appears there is another method using the assistant mode that can handle specific tasks.