Assistants API - Thread Tokens vs Thread Management

Hello,

What is the optimal way to manage large threads in the Assistants API? I am trying to create an assistant that generates stories, but I keep encountering issues when my thread exceeds the token limit.

I understand that I need to manage this, but I’m unsure of the best approach. I know I can shorten the message context and create summaries. After that, should I delete messages from the current thread, or should I start a new thread while using the same assistant?
Are there any tried-and-tested solutions, ideally with a short description of how they work?

Currently, I’m receiving the following error message: “Request too large for gpt-4o in organization org-vl10PQxe0NxrwCqvMYDWWyeC on tokens per min (TPM): Limit 30000, Requested 30564.” This indicates that I need to reduce the input or output tokens to run successfully.
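The numbers in that error suggest a simple pre-flight check: estimate the request's token count before submitting it and trim the context until it fits. A rough sketch of that idea (the ~4 characters per token heuristic and the helper names here are my own assumptions, not anything the API provides):

```python
# Rough pre-flight token budget check before submitting a run.
# Heuristic: ~4 characters per token for English text (an approximation,
# not the model's real tokenizer).
TPM_LIMIT = 30_000          # from the error message above
RESPONSE_RESERVE = 2_000    # leave headroom for the model's output

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["chapter one " * 12_000, "a short note", "latest instruction"]
trimmed = trim_to_budget(history, TPM_LIMIT - RESPONSE_RESERVE)
```

For an accurate count you would use the model's actual tokenizer (e.g. the `tiktoken` library) instead of the character heuristic.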

Thank you for your guidance!

Written some time ago… see if it helps

The problem is that the Assistants API ignores your account’s rate limits when it builds model context from a thread, and its internal iterative processes (tool calls, retrieval passes) consume even more of a minute’s token budget.

The only solution that OpenAI offers (purposefully) is to raise your usage tier through additional payment history: pay $50+ in total, after waiting 7+ days before further payments count toward the next tier.

You can indirectly improve the situation: limit the number of messages (not the number of tokens) taken from a thread per run, or reduce the chunk size of vector store files; see the API documentation.
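The per-run message limit mentioned above is set via the `truncation_strategy` parameter when creating a run. A minimal sketch with the `openai` Python SDK (the IDs are placeholders, and since the Assistants API is a beta surface, check the current reference before relying on it):

```python
# Limit how many thread messages are pulled into each run's model context.
# "last_messages" tells the API to include only the N most recent messages.
run_params = {
    "assistant_id": "asst_placeholder",   # placeholder ID
    "truncation_strategy": {
        "type": "last_messages",
        "last_messages": 10,              # cap the context at 10 messages
    },
}

# With the SDK this would be submitted as (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# run = client.beta.threads.runs.create(
#     thread_id="thread_placeholder", **run_params
# )
```

This bounds context growth per run, but long individual messages can still blow past a low TPM limit, so it complements rather than replaces summarization.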

Thanks, but overall I will need very long stories.
My plan was to summarize the story for the agent, and then probably the best option is some kind of shrinking of the messages in the thread. I don’t want to open a new thread, because I think I would end up with lots of threads and my story would look like:

Story :
Thread 1
Thread 2
Thread 3
Thread n

And I was wondering if I can do that within one thread.
I’m looking for the best solution that somebody has hopefully already implemented :smiley:
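The single-thread shrinking you describe can be sketched as a compaction step: once the thread grows past a threshold, collapse everything except the recent tail into one summary message. The function names and the pluggable `summarize` callback below are hypothetical, not part of the SDK; with the real API you would delete the old messages (the threads messages delete endpoint) and append the summary as a new message in the same thread.

```python
from typing import Callable

def compact_thread(messages: list[str],
                   keep_last: int,
                   summarize: Callable[[list[str]], str]) -> list[str]:
    """Collapse all but the last `keep_last` messages into one summary entry."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(old)   # in practice, an LLM call over the old messages
    return [f"[Story so far] {summary}"] + recent

# Stand-in summarizer for this sketch (a real one would call the model):
def toy_summarize(msgs: list[str]) -> str:
    return f"{len(msgs)} earlier messages condensed"

history = [f"scene {i}" for i in range(1, 8)]
compacted = compact_thread(history, keep_last=3, summarize=toy_summarize)
```

Applied to an actual thread, you would delete the summarized messages and insert the `[Story so far]` message, so the thread itself stays small while remaining the single home of the story.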