Request: make it possible to specify the upper limit of history

I’m using the Assistants API with the messages.list function, but it is designed to return all of the past history. Up to 100 messages are said to be retained, which adds up to a huge number of tokens and makes the cost too high. Also, the result of the list function does not include the input/output token counts, which makes the Assistants API hard to evaluate for business use.
Could you please make it possible to specify an upper limit on the history, and to return the read/write token counts? Is anyone else having the same problem?

get message

from openai import OpenAI

def _get_message(gpt_thread_id: str, secret: str):
    client = OpenAI(api_key=secret)
    return client.beta.threads.messages.list(
        thread_id=gpt_thread_id,
    )

Maybe it can be controlled with the limit parameter?
def list(
    self,
    thread_id: str,
    *,
    after: str | NotGiven = NOT_GIVEN,
    before: str | NotGiven = NOT_GIVEN,
    limit: int | NotGiven = NOT_GIVEN,
    order: Literal["asc", "desc"] | NotGiven = NOT_GIVEN,
    # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.
    # The extra values given here take precedence over values defined on the client or passed to this method.
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> SyncCursorPage[ThreadMessage]:

limit: A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.

Even if limit=1 is specified, the response history keeps growing.

def _get_message(gpt_thread_id: str, secret: str):
    client = OpenAI(api_key=secret)
    return client.beta.threads.messages.list(
        thread_id=gpt_thread_id,
        limit=1,
    )

With Assistants, you give up control of how much thread history is placed into the AI model's context window. There are no controls available, and the promise is that the context will be loaded with as many of the available tokens as possible rather than fewer, including knowledge from files whether relevant or not.

If you wish to control expense, and even keep the AI's understanding free of distraction, it is better to build with Chat Completions.
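A minimal sketch of that approach, assuming you keep the history yourself and trim it before each call. The window size here is an arbitrary placeholder, and the actual Chat Completions call is shown in comments (with a placeholder model name) so the snippet stays self-contained:

```python
def trim_history(messages, max_messages=6):
    """Keep the system prompt (if any) plus only the most recent messages,
    so each request's prompt-token count stays bounded."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

# With the official client, each call then only pays for the trimmed window,
# and resp.usage reports prompt_tokens / completion_tokens per request:
#
#   client = OpenAI(api_key=secret)
#   resp = client.chat.completions.create(
#       model="gpt-4-turbo-preview",   # placeholder model name
#       messages=trim_history(history),
#   )
```

Unlike Assistants, the response object here also carries a usage field per request, which answers the token-accounting half of the question.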

Thank you for your reply.
However, since there is a limit parameter in the Assistants API interface, I think OpenAI intended for it to be controllable on their side as well. Even if I specify limit=1, the behavior does not change.

Cost is an important factor in business, so it cannot be ignored; otherwise, the Assistants API is an attractive API.
I understand.

The limit parameter is just how much you want to retrieve back at once from listing the contents of a thread or listing the assistants. It doesn’t control the operation of runs. Hope that helps.

Even if I specify limit=1, the past history is still returned. Is this how the API is specified?

https://platform.openai.com/docs/api-reference/messages/listMessages

Returns a list of messages for a given thread. Returns the messages TO YOU to look at.

limit – integer, Optional
Defaults to 20

A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.

Doesn’t alter the content or change any setting about how much chat history from a thread will be placed into the AI when it runs.
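For what the retrieval side can do: combined with the order parameter, limit=1 does let you fetch only the newest message instead of pulling the whole list back. A sketch of that (the import is deferred inside the function so the definition stands alone):

```python
def latest_message(gpt_thread_id: str, secret: str):
    from openai import OpenAI  # deferred import so the definition needs nothing installed
    client = OpenAI(api_key=secret)
    page = client.beta.threads.messages.list(
        thread_id=gpt_thread_id,
        limit=1,       # page size: return one object per page...
        order="desc",  # ...sorted newest-first, so that one object is the latest message
    )
    return page.data[0] if page.data else None
```

This trims what you download, but, as said above, it changes nothing about how much history a run feeds to the model.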

I see.
So every conversation turn re-sends the historical data, and the token count keeps compounding.
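That growth is easy to quantify. If every run re-sends the full thread, the prompt for turn n contains all n-1 earlier turns, so cumulative prompt tokens grow roughly quadratically with the number of turns. A back-of-the-envelope sketch (the per-turn token counts are illustrative):

```python
def cumulative_prompt_tokens(turn_tokens):
    """turn_tokens[i] = tokens added by turn i (user message + reply).
    Returns the total prompt tokens billed across all turns when each
    run re-sends the entire prior history."""
    total, history = 0, 0
    for t in turn_tokens:
        total += history + t  # this run's prompt = all prior turns + the new input
        history += t          # the new turn joins the history for the next run
    return total

print(cumulative_prompt_tokens([500] * 10))  # 27500, vs 5000 if history were not re-sent
```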

Yes, it is so. The week this came out, I noted the extreme expenditures, and others were also getting hit with big bills, with a single run potentially using many dollars' worth of tokens:

I hope the OpenAI engineers do something about it. The limit parameter is also a part of the specification I don't understand.