Strategy for chat history, context window, and summaries

I’m creating an app using the API to teach users a foreign language, by having them chat with GPT.

I’m trying to replicate, to some extent, the ChatGPT web approach, so that the user feels like the AI remembers what they’ve been talking about.

I’m using GPT3.5 to keep costs down and also for response speed.

The limiting factor is that I want to make the app scalable which means that I’d like to keep the number of tokens used for each interaction reasonable, and not keep the full history for each request.

So the standard approach is to:

  1. Include the previous interactions in the current requests. I’ve done this before and am familiar with the process of using the openai Chat API.
  2. When reaching a certain threshold,
  • get GPT to summarize the thread
  • Feed GPT with the summary (as part of the prompt?), and “start a new thread”

i"m interested in hearing other folks experiences with this approach and experimenting with different number of interaction, summarization approach, and other insights.

1 Like

@raul_pablo_garrido_c , can you provide details on how the MS service helps?

I think you are on the right track. Summarizing older parts of the conversation, managing tokens, etc.

But there are other strategies (sorry, haven’t tried them all myself).

But having an AI categorizer that detects new topics (so trash most of the older history and “reboot”)

And embeddings, so relate current topic to past related topics through embeddings.

These are the other things you might try to keep the conversation flowing beyond summarization.


@curt.kennedy this is helpful feedback. I was aware of using embeddings, but the idea of an AI categorizer that detects new topics sounds like a really good suggestion.

1 Like