Here’s an overview. A reminder first: Assistants is a cute add-on, but you’d really want to develop against Chat Completions and manage the user conversations yourself. Assistants is slated to go away in under a year.
Threads will grow and grow as you and your user keep chatting on them. That is the expected operation: a thread maintains the chat history of a session with a user. If the user presses “new chat” in your application, you’d start a new thread for them.
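For example, a “new chat” handler can just create a fresh thread and remember its ID for that user. A rough sketch with the official Python SDK (the `session_store` dict here is a hypothetical stand-in for your own persistence):

```python
from openai import OpenAI

client = OpenAI()

def start_new_chat(user_id: str, session_store: dict) -> str:
    """Create a fresh thread when the user presses "new chat"."""
    thread = client.beta.threads.create()
    # Map the user's session to the new thread ID (session_store is a
    # placeholder for whatever storage your application actually uses).
    session_store[user_id] = thread.id
    return thread.id
```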
Assistants at least has a working cost management mechanism, unlike the Responses endpoint, where reusing a previous response lets the input grow without limit. Here’s what you’d do:
When you invoke a “run” to process the newest message you’ve added, you use the truncation_strategy run parameter:
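A minimal sketch of such a run call with the Python SDK (the thread and assistant IDs are placeholders, and 8 is just an example count):

```python
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.create(
    thread_id="thread_abc123",      # placeholder thread ID
    assistant_id="asst_abc123",     # placeholder assistant ID
    # Only send the 8 most recent messages of the thread to the model;
    # older turns are dropped from the model input, not from the thread.
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 8,
    },
)
```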
With truncation_strategy in effect and last_messages set to an integer value you send, the oldest chat turns beyond that message count are automatically dropped from the input to the AI model. It is not exact cost control (you might want to budget a particular number of tokens, which you’d have to build yourself on Chat Completions), but it’s something instead of nothing. It also breaks the prompt caching discount whenever it kicks in, which is another reason to program chat history management yourself, intelligently.
The threads themselves don’t cost you anything to store, even as they fill up OpenAI’s storage. They don’t actually expire after 30 days, or even a year, so you can resume a conversation long afterwards. It’s up to you to delete the thread with the API’s delete method if you don’t want the data persisted forever, either when the user deletes a chat or when you decide to expire old ones.
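Deleting a thread is a single call; a sketch (the thread ID is again a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Permanently remove a thread and its stored messages, e.g. when the user
# deletes a chat or as part of your own expiry policy.
deleted = client.beta.threads.delete("thread_abc123")
print(deleted.deleted)  # True if the deletion went through
```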