How to limit the number of messages or tokens that are persisted in a thread to maintain context in Open AI Assistants?

arslanjs.dev · November 28, 2023, 3:41pm

I read the documentation and it says there is no limit, the only limits is the model’s context. But I want to control how many messages or tokens are persisted in the thread to maintain context, so instead of the thread processing all the chat history for context just use the previous 2 or 3 messages only. This is to reduce token usage on assistants api. Is there any way to do it ?

_j · November 28, 2023, 3:50pm

Controlling the chat length and amount of data loaded into the AI model is not possible.

It is also not possible to create a new thread that has just some of the user/assistant exchanges.

The assistant system is apparently designed for maximum expense, not sanity.

derrickob · November 28, 2023, 4:11pm

There’s hope somewhere in these lines. But am wondering what the user is doing with the Assistant whose sole purpose is to bring the anger out of devs. Perhaps I’d be better off waiting for the stable version. Hoping the previous GPT-3.5-turbo-0613 doesn’t go into the bin before we can have a stable 1106

_j · November 28, 2023, 4:23pm

“Uses what they learned in ChatGPT”? Perhaps what they learned is that giving the AI the minimum amount of passable conversation to reduce OpenAI’s costs in ChatGPT makes for frustrated users who go here and other forums with “it’s even dumber now, can’t remember what I just said”, so OpenAI does the opposite when chat history is billable…

Once the size of the Messages exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages.

derrickob · November 28, 2023, 4:33pm

Someone at first glance will see it as an opportunity for the heavy lifting. Get hands-on to implement and see themselves opting to rather pass contexts they stored in DB instead.

arslanjs.dev · November 28, 2023, 4:43pm

There is no hope in it.

arslanjs.dev · November 28, 2023, 4:44pm

The only thing I can think of is just create a new thread at every new request and maybe append some messages from the previous thread in order to add context.

_j · November 28, 2023, 4:53pm

That would be a very logical assumption of what you might be able to do, but OpenAI has blocked placing messages from the AI back into thread chat history as “assistant” to appear as if they came from the AI.

You can try summarizing by AI, similar to what you might type yourself in a new ChatGPT session when you say “here’s what me and another AI were talking about (but I had to abandon the chat because the AI got hung up on the wrong answer…)” That won’t have the believability of the AI transparently continuing where it left off or being able to understand a user instruction “change the previous code you wrote”, but is something.

Enough code workarounds and convoluted code, tracking runs, threads, IDs, steps, checking assistant states, and you find simplicity in just using chat completions and having your answer immediately streamed.

derrickob · November 28, 2023, 5:01pm

In my recent back-fired Assistant implementation I allowed users to do their query and get their response and before my backend code terminates execution, it returns a list of all the messages in the user’s thread ordered by creation time in ascending order and deletes all except the last 10 messages: That’s assistant vs user. Does the trick for me instead of waiting for OpenAi to drink 128K context on my own API usage cost.

Also, if you’re doing the “every time create a new thread after a certain time”, they have a cost for each of the threads created, no? Seems a weird solution when they allow you to delete messages in a thread by providing the message ID.

Also I pointed earlier, Assistant is just good at back-firing, because I think it’s what it’s currently good at.

_j · November 28, 2023, 5:05pm

There is no “delete messages in a thread” API method.

https://platform.openai.com/docs/api-reference/messages

You get delete assistant, delete assistant file, delete thread.

The only modification you can do is to add metadata for your own use.

arslanjs.dev · November 28, 2023, 5:08pm

So all of these things just render this useless right? back to langchain and flowise with barebones open AI api, to train on custom data and maintain context.

_j · November 28, 2023, 5:11pm

Useful at $1 a question, without price limit for a chat.
For the parts that work as advertised.

derrickob · November 28, 2023, 5:36pm

There’s no documentation on the delete message yet but the OpenAI PHP Community maintained lib has the feature. I’ll try digging through the lib source to find the endpoint it’s being made to. It’s accepting the threadid and the messageid parameters to be deleted. But no documentation provided in the DOC. No, it doesn’t delete the thread, only the messageid provided

arslanjs.dev · November 28, 2023, 5:39pm

Can you provide the links to repo or something like that.

derrickob · November 28, 2023, 5:44pm

Check in Thread Resource

_j · November 28, 2023, 5:45pm

The python API library is auto-built from the API spec.

Even going to “next” branch, five commits ahead of main, no such method is discovered.

github.com

openai/openai-python/blob/next/src/openai/resources/beta/threads/messages/messages.py

# File generated from our OpenAPI spec by Stainless.

from __future__ import annotations

from typing import TYPE_CHECKING, List, Optional
from typing_extensions import Literal

import httpx

from .files import Files, AsyncFiles, FilesWithRawResponse, AsyncFilesWithRawResponse
from ....._types import NOT_GIVEN, Body, Query, Headers, NotGiven
from ....._utils import maybe_transform
from ....._resource import SyncAPIResource, AsyncAPIResource
from ....._response import to_raw_response_wrapper, async_to_raw_response_wrapper
from .....pagination import SyncCursorPage, AsyncCursorPage
from ....._base_client import AsyncPaginator, make_request_options
from .....types.beta.threads import (
    ThreadMessage,
    message_list_params,
    message_create_params,

This file has been truncated. show original

One could certainly throw some raw API calls at the messages or thread via curl and see what doesn’t get you an error 400 invalid. At least errors are free for now.

derrickob · November 28, 2023, 5:48pm

This works for the PHP lib. Not sure why not covered in the Python Lib

_j · November 28, 2023, 6:14pm

I could hack it into the python library, which now filters out and then doesn’t pass non-schema calls or parameters, and giving a go, but that would take effort with no personal reward…and altering multiple files in repetitive places because of the auto-generated overloaded bloat.

ihubanov · December 4, 2023, 12:18pm

github.com/openai/openai-python

Delete thread message... It seems the API supports delete message but python library doesn't have it implemented

opened 12:17PM - 04 Dec 23 UTC

ihubanov

### Confirm this is a feature request for the Python library and not the underly…ing OpenAI API. - [X] This is a feature request for the Python library ### Describe the feature or improvement you're requesting The need for this feature came because with the newer model where input is 120k tokens, when the conversation gets longer and longer, one message might cost up to $1.2+ So users might want to cleanup the thread conversation and leave only desired number of messages in history... I've already implemented this feature in my fork below, and would be very happy if we also have it in the official repo as well. Link to my fork commit: [https://github.com/openai/openai-python/commit/6c408960577d22604f1086edc913cc6500752521](url) ### Additional context Also some related discussion here: [https://community.openai.com/t/how-to-limit-the-number-of-messages-or-tokens-that-are-persisted-in-a-thread-to-maintain-context-in-open-ai-assistants/531814](url)

logankilpatrick · December 4, 2023, 5:39pm

Thanks for flagging, going to document this now and kick off the process of having this added to the SDK! Stay tuned.

Topic		Replies	Views
Reducing Context Tokens in Assistant Threads API assistants	21	9841	July 8, 2024
Assistant API Max input context size API	5	2111	April 16, 2024
How exactly do you get charged for using the API for assistants? API assistants-api	33	7662	November 27, 2023
OpenAI Assistant maximum token per Thread API gpt-4-turbo	11	11718	May 28, 2024
Assistants API context window? API gpt-4-turbo , assistants-api	2	3177	November 26, 2023

How to limit the number of messages or tokens that are persisted in a thread to maintain context in Open AI Assistants?

Related topics