How to limit the number of messages or tokens that are persisted in a thread to maintain context in Open AI Assistants?

I read the documentation and it says there is no limit, the only limits is the model’s context. But I want to control how many messages or tokens are persisted in the thread to maintain context, so instead of the thread processing all the chat history for context just use the previous 2 or 3 messages only. This is to reduce token usage on assistants api. Is there any way to do it ?


Controlling the chat length and amount of data loaded into the AI model is not possible.

It is also not possible to create a new thread that has just some of the user/assistant exchanges.

The assistant system is apparently designed for maximum expense, not sanity.


There’s hope somewhere in these lines. But am wondering what the user is doing with the Assistant whose sole purpose is to bring the anger out of devs. Perhaps I’d be better off waiting for the stable version. Hoping the previous GPT-3.5-turbo-0613 doesn’t go into the bin before we can have a stable 1106

“Uses what they learned in ChatGPT”? Perhaps what they learned is that giving the AI the minimum amount of passable conversation to reduce OpenAI’s costs in ChatGPT makes for frustrated users who go here and other forums with “it’s even dumber now, can’t remember what I just said”, so OpenAI does the opposite when chat history is billable…

Once the size of the Messages exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages.

Someone at first glance will see it as an opportunity for the heavy lifting. Get hands-on to implement and see themselves opting to rather pass contexts they stored in DB instead.

There is no hope in it.

The only thing I can think of is just create a new thread at every new request and maybe append some messages from the previous thread in order to add context.

That would be a very logical assumption of what you might be able to do, but OpenAI has blocked placing messages from the AI back into thread chat history as “assistant” to appear as if they came from the AI.

You can try summarizing by AI, similar to what you might type yourself in a new ChatGPT session when you say “here’s what me and another AI were talking about (but I had to abandon the chat because the AI got hung up on the wrong answer…)” That won’t have the believability of the AI transparently continuing where it left off or being able to understand a user instruction “change the previous code you wrote”, but is something.

Enough code workarounds and convoluted code, tracking runs, threads, IDs, steps, checking assistant states, and you find simplicity in just using chat completions and having your answer immediately streamed.

1 Like

In my recent back-fired Assistant implementation I allowed users to do their query and get their response and before my backend code terminates execution, it returns a list of all the messages in the user’s thread ordered by creation time in ascending order and deletes all except the last 10 messages: That’s assistant vs user. Does the trick for me instead of waiting for OpenAi to drink 128K context on my own API usage cost.

Also, if you’re doing the “every time create a new thread after a certain time”, they have a cost for each of the threads created, no? Seems a weird solution when they allow you to delete messages in a thread by providing the message ID.

Also I pointed earlier, Assistant is just good at back-firing, because I think it’s what it’s currently good at.

There is no “delete messages in a thread” API method.

You get delete assistant, delete assistant file, delete thread.

The only modification you can do is to add metadata for your own use.

So all of these things just render this useless right? back to langchain and flowise with barebones open AI api, to train on custom data and maintain context.

Useful at $1 a question, without price limit for a chat. :nerd_face:
For the parts that work as advertised.

There’s no documentation on the delete message yet but the OpenAI PHP Community maintained lib has the feature. I’ll try digging through the lib source to find the endpoint it’s being made to. It’s accepting the threadid and the messageid parameters to be deleted. But no documentation provided in the DOC. No, it doesn’t delete the thread, only the messageid provided

Can you provide the links to repo or something like that.

Check in Thread Resource

The python API library is auto-built from the API spec.

Even going to “next” branch, five commits ahead of main, no such method is discovered.

One could certainly throw some raw API calls at the messages or thread via curl and see what doesn’t get you an error 400 invalid. At least errors are free for now.

This works for the PHP lib. Not sure why not covered in the Python Lib

I could hack it into the python library, which now filters out and then doesn’t pass non-schema calls or parameters, and giving a go, but that would take effort with no personal reward…and altering multiple files in repetitive places because of the auto-generated overloaded bloat.

Thanks for flagging, going to document this now and kick off the process of having this added to the SDK! Stay tuned.