Passing in large amounts of repeated tokens

Is there some way to mitigate passing in large amounts of repeated tokens for long conversations? When a conversation gets long, passing in something like 8k tokens per message is a bit much. Is there a feature in the API to help with this, or is it just how it has to be?

Short answer: when you use chat completions and keep your own history, managing what you send to the AI, including how long the conversation grows, is entirely up to you.
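To make "you keep your own history" concrete, here is a minimal sketch of a sliding-window trimmer. All names are illustrative, and the token estimate is a crude stand-in for a real tokenizer, not an official API:

```python
# Sketch: with chat completions you hold the history yourself, so you decide
# what to resend each turn. Hypothetical helper names; the token count below
# is a rough character-based estimate, not a real tokenizer.

def estimate_tokens(message: dict) -> int:
    # Crude stand-in for a real tokenizer (roughly 4 characters per token).
    return max(1, len(message["content"]) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept: list[dict] = []
    used = sum(estimate_tokens(m) for m in system)
    for m in reversed(rest):  # walk newest-first, stop when the budget is hit
        cost = estimate_tokens(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

You would then pass `trim_history(history, 3000)` as the `messages` parameter instead of the full transcript, so each request stays under a token budget you choose.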

With the assistants endpoint, there is a thread that keeps the chat history for you, but it offers no controls or limitations.

So you get to choose: budget vs. quality. If you go for budget, you get symptoms like "ChatGPT forgot what I was just talking about."

There are various techniques to extend the illusion of memory without sending everything. You can expire assistant responses earlier than user messages, since user input tends to matter more for context. You can summarize the oldest part of the chat with another AI call every few turns, or ask a cheaper model for individual summaries when the AI writes at length, swapping the shorter version in after a while. Or you can use a database that recalls old questions when a topic comes up again (semantic search). With chat completions you are in control of the messages, so you can use your imagination.
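The "summarize the oldest part of the chat" idea can be sketched as follows. This is only an illustration under my own assumptions: the summarizer here is a placeholder string-builder, where a real version would call a cheaper model; none of these names come from the API itself:

```python
# Sketch of rolling summarization: once the transcript grows past a limit,
# collapse the oldest turns into one short system note. The default
# `summarize` is a placeholder; in practice it would be another (cheaper)
# model call. All names are illustrative, not an official API.

def compact_history(messages, max_messages=8, summarize=None):
    """Collapse the oldest overflow into a single summary message.

    `messages` is a chat-completions-style list of dicts; the first entry
    is assumed to be the system prompt and is always kept.
    """
    if len(messages) <= max_messages:
        return messages
    if summarize is None:
        # Placeholder summarizer; a real one would be a cheap model call.
        def summarize(msgs):
            return ("Earlier discussion covered: "
                    + "; ".join(m["content"][:40] for m in msgs))
    system, rest = messages[0], messages[1:]
    overflow = len(messages) - max_messages + 1  # +1 frees a slot for the summary
    old, recent = rest[:overflow], rest[overflow:]
    note = {"role": "system", "content": summarize(old)}
    return [system, note] + recent
```

Running this every few turns keeps the request size bounded while the summary note preserves some long-range context; the semantic-search approach is the same idea, except the old turns go into a store you query instead of a summary you resend.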

Thank you, this was very helpful.