I’m creating an app using the API to teach users a foreign language, by having them chat with GPT.
I’m trying to replicate, to some extent, the ChatGPT web approach, so that the user feels like the AI remembers what they’ve been talking about.
I’m using GPT-3.5 to keep costs down and for response speed.
The limiting factor is that I want the app to be scalable, which means I’d like to keep the number of tokens used per interaction reasonable and not send the full history with every request.
So the standard approach is to:
- Include the previous interactions in the current request. I’ve done this before and am familiar with the process of using the openai Chat API.
- When reaching a certain threshold:
  - have GPT summarize the thread so far
  - feed GPT the summary (as part of the prompt?) and “start a new thread” (rough sketch below)
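Here’s a rough sketch of what I have in mind, assuming the official Python openai client and tiktoken for approximate token counting; the threshold value and the summarize_history helper are just placeholders I made up, not anything from the library:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"
SUMMARY_TOKEN_THRESHOLD = 1500  # illustrative threshold, to be tuned experimentally

enc = tiktoken.encoding_for_model(MODEL)


def count_tokens(messages):
    # Rough estimate: token count of the concatenated message contents.
    return sum(len(enc.encode(m["content"])) for m in messages)


def summarize_history(messages):
    # Ask the model to compress the conversation so far into a short summary.
    summary_request = messages + [{
        "role": "user",
        "content": ("Summarize our conversation so far in a few sentences, "
                    "keeping the learner's level, recent vocabulary, and topics."),
    }]
    resp = client.chat.completions.create(model=MODEL, messages=summary_request)
    return resp.choices[0].message.content


def chat_turn(history, system_prompt, user_message):
    # When the rolling history gets too large, replace it with a summary
    # and effectively "start a new thread" seeded by that summary.
    if count_tokens(history) > SUMMARY_TOKEN_THRESHOLD:
        summary = summarize_history(history)
        history = [{"role": "system",
                    "content": f"Summary of the conversation so far: {summary}"}]

    messages = ([{"role": "system", "content": system_prompt}]
                + history
                + [{"role": "user", "content": user_message}])
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    reply = resp.choices[0].message.content

    history += [{"role": "user", "content": user_message},
                {"role": "assistant", "content": reply}]
    return reply, history
```

The open questions for me are where the summary should live (a system message as above, or folded into the main prompt) and how aggressive the threshold should be.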
i"m interested in hearing other folks experiences with this approach and experimenting with different number of interaction, summarization approach, and other insights.