I’m using the Assistant API. I am using the messages.list function, but it was designed to return all past historical data. It is said that up to 100 histories will be retained, but this would result in a huge number of tokens and the cost would be too high. Also, the result of the list function does not include the number of input/output tokens, making it difficult to understand the assistant API as a business.
Could you please make it possible to specify the upper limit of history and return the number of read/write tokens? Is there anyone having the same problem?
Maybe it can be controlled with the limit parameter?
def list(
self,
thread_id: str,
*,
after: str | NotGiven = NOT_GIVEN,
before: str | NotGiven = NOT_GIVEN,
limit: int | NotGiven = NOT_GIVEN,
order: Literal[“asc”, “desc”] | NotGiven = NOT_GIVEN,
# Use the following arguments if you need to pass additional parameters to the API that aren’t available via kwargs.
# The extra values given here take precedence over values defined on the client or passed to this method.
extra_headers: Headers | None = None,
extra_query: Query | None = None,
extra_body: Body | None = None,
timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) → SyncCursorPage[ThreadMessage]:
limit: A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
You give up control of how much history is placed from threads into the AI model context window length. There are no controls available, and the promise is that more of the model will be loaded with any available tokens rather than less, including knowledge from files whether relevant or not.
If you wish to control expense, and even control the AI understanding without distraction, it is better to build with chat completions.
Thank you for your reply.
However, since there is a limit parameter in the assistant api IF, I think there is an intention to control it on the Openai side as well. Even if I specify limit=1, the behavior does not change.
Cost is an important factor in business, so it cannot be ignored, and the assistant API is an attractive API.
I understand.
The limit parameter is just how much you want to retrieve back at once from listing the contents of a thread or listing the assistants. It doesn’t control the operation of runs. Hope that helps.
Yes, it is so. I make note of the extreme expenditures, and others also getting hit with big bills, one run potentially using many dollars of tokens, the week this came out:
I am going to try the following to control the amount of messages (and thus limit the cost). You can start a new thread when conversation length becomes too long. (This would obviously require a bit of custom logic to make sure information doesn’t go missing by taking messages from the previous thread and adding it to the new thread. ) For example, if the thread has more than 10 messages, retrieve those messages and add it to the new thread. I’m not sure if I misunderstood the topic, but here is is anyway.