I’m using the OpenAI API to build a custom chat system and have a few ideas for handling conversation history:
- Send the entire message history to OpenAI and rely on prompt caching for optimization.
- Truncate the middle of the conversation, keeping only the first two and last two messages.
- Update user expectations based on the latest response.
- Use a mini RAG system to manage the context of the truncated middle messages.
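To make ideas 2 and 4 concrete, here's a rough sketch of what I have in mind: keep the first and last messages verbatim, and pull back the most relevant truncated-middle messages for the current query. The bag-of-words cosine similarity is just a stand-in for real embeddings, and all the names (`build_context`, `head`, `tail`, `k`) are my own placeholders, not anything from the API:

```python
from collections import Counter
import math

def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over token counts -- a toy proxy for embeddings.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_context(messages, query, head=2, tail=2, k=1):
    """Keep the first `head` and last `tail` messages; from the truncated
    middle, recall the `k` messages most similar to `query`."""
    if len(messages) <= head + tail:
        return list(messages)
    middle = messages[head:len(messages) - tail]
    qv = Counter(query.lower().split())
    scored = sorted(
        middle,
        key=lambda m: _cosine(Counter(m["content"].lower().split()), qv),
        reverse=True,
    )
    note = {"role": "system",
            "content": "[earlier messages omitted; most relevant recalled below]"}
    return messages[:head] + [note] + scored[:k] + messages[-tail:]

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello, how can I help?"},
    {"role": "user", "content": "my order number is 12345"},
    {"role": "assistant", "content": "thanks, noted your order"},
    {"role": "user", "content": "also what's the weather like"},
    {"role": "assistant", "content": "sunny today"},
    {"role": "user", "content": "what was my order number again?"},
]
context = build_context(history, query=history[-1]["content"])
```

Here `context` would contain the first two messages, a system note, the recalled order-number message, and the last two messages, which is what would actually get sent to the API instead of the full history.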
Any other ideas?
Would these approaches be effective, or are there better ways to handle context efficiently?