Request: make it possible to specify the upper limit of history

I’m using the Assistants API with the messages.list function, but it is designed to return all of the past history. Up to 100 messages are said to be retained, which adds up to a huge number of tokens and makes the cost too high. Also, the result of the list function does not include the input/output token counts, which makes the Assistants API hard to evaluate for business use.
Could you please make it possible to specify an upper limit on the history, and to return the read/write token counts? Is anyone else having the same problem?

get message

from openai import OpenAI

def _get_message(gpt_thread_id: str, secret: str):
    client = OpenAI(api_key=secret)
    return client.beta.threads.messages.list(
        thread_id=gpt_thread_id,
    )

Maybe it can be controlled with the limit parameter?
def list(
    self,
    thread_id: str,
    *,
    after: str | NotGiven = NOT_GIVEN,
    before: str | NotGiven = NOT_GIVEN,
    limit: int | NotGiven = NOT_GIVEN,
    order: Literal["asc", "desc"] | NotGiven = NOT_GIVEN,
    # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.
    # The extra values given here take precedence over values defined on the client or passed to this method.
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> SyncCursorPage[ThreadMessage]:

limit: A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.

Even if limit=1 is specified, the response history keeps growing.

def _get_message(gpt_thread_id: str, secret: str):
    client = OpenAI(api_key=secret)
    return client.beta.threads.messages.list(
        thread_id=gpt_thread_id,
        limit=1,
    )

With Assistants, you give up control of how much thread history is placed into the AI model's context window. There are no controls available, and the promise is that the context will be loaded with as many of the available tokens as possible rather than fewer, including knowledge from files whether relevant or not.

If you wish to control expense, and even keep the AI's understanding free of distraction, it is better to build with Chat Completions.
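A minimal sketch of that approach, assuming you keep the history yourself and trim it before each call. The window size here is an arbitrary placeholder, and the actual Chat Completions call is shown in comments (with a placeholder model name) so the snippet stays self-contained:

```python
def trim_history(messages, max_messages=6):
    """Keep the system prompt (if any) plus only the most recent messages,
    so each request's prompt-token count stays bounded."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

# With the official client, each call then only pays for the trimmed window,
# and resp.usage reports prompt_tokens / completion_tokens per request:
#
#   client = OpenAI(api_key=secret)
#   resp = client.chat.completions.create(
#       model="gpt-4-turbo-preview",   # placeholder model name
#       messages=trim_history(history),
#   )
```

Unlike Assistants, the response object here also carries a usage field per request, which answers the token-accounting half of the question.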

Thank you for your reply.
However, since there is a limit parameter in the Assistants API interface, I think OpenAI intended for it to be controllable on their side as well. Even if I specify limit=1, the behavior does not change.

Cost is an important factor in business, so it cannot be ignored; otherwise, the Assistants API is an attractive API.
I understand.

The limit parameter is just how much you want to retrieve back at once from listing the contents of a thread or listing the assistants. It doesn’t control the operation of runs. Hope that helps.

Even if I specify limit=1, the past history is still returned. Is this how the API is specified?

https://platform.openai.com/docs/api-reference/messages/listMessages

Returns a list of messages for a given thread. Returns the messages TO YOU to look at.

limit – integer, Optional
Defaults to 20

A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.

Doesn’t alter the content or change any setting about how much chat history from a thread will be placed into the AI when it runs.
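For what the retrieval side can do: combined with the order parameter, limit=1 does let you fetch only the newest message instead of pulling the whole list back. A sketch of that (the import is deferred inside the function so the definition stands alone):

```python
def latest_message(gpt_thread_id: str, secret: str):
    from openai import OpenAI  # deferred import so the definition needs nothing installed
    client = OpenAI(api_key=secret)
    page = client.beta.threads.messages.list(
        thread_id=gpt_thread_id,
        limit=1,       # page size: return one object per page...
        order="desc",  # ...sorted newest-first, so that one object is the latest message
    )
    return page.data[0] if page.data else None
```

This trims what you download, but, as said above, it changes nothing about how much history a run feeds to the model.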

I see.
So every conversation turn re-sends the historical data, and the token count keeps compounding.
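That growth is easy to quantify. If every run re-sends the full thread, the prompt for turn n contains all n-1 earlier turns, so cumulative prompt tokens grow roughly quadratically with the number of turns. A back-of-the-envelope sketch (the per-turn token counts are illustrative):

```python
def cumulative_prompt_tokens(turn_tokens):
    """turn_tokens[i] = tokens added by turn i (user message + reply).
    Returns the total prompt tokens billed across all turns when each
    run re-sends the entire prior history."""
    total, history = 0, 0
    for t in turn_tokens:
        total += history + t  # this run's prompt = all prior turns + the new input
        history += t          # the new turn joins the history for the next run
    return total

print(cumulative_prompt_tokens([500] * 10))  # 27500, vs 5000 if history were not re-sent
```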

Yes, it is so. The week this came out, I noted the extreme expenditures, and others were also getting hit with big bills, with a single run potentially using many dollars' worth of tokens:

I hope the OpenAI engineers do something about it. The limit parameter is also a part of the specification I don't understand.