Assistant Thread limitations

Hi,

I'm trying to understand how the Assistants API works. I was under the impression that by creating an assistant and a thread I could manage a session.

My use case is that I need to process a large amount of text and convert it to a specific format. With a thread, I thought the first few messages would hold my instructions, and that I could write some code to send requests to that thread with the data I need processed and the AI would process it.

However, I soon ran into this error:

 Request too large for gpt-4o in organization org-abc on tokens per min (TPM): Limit 30000, Requested 30502. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.

From my research, I think that each time I send a new query, all the history in the thread is added to that query; otherwise I can't explain how a query of 300 tokens plus a response surpasses 1000 tokens.
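That is consistent with how thread context would grow if every run re-sends the accumulated history. A pure-Python illustration of the growth (the token counts below are made-up illustrative numbers, not real API figures):

```python
# Illustrative only: how re-sending thread history inflates input tokens.
# Each new run's input = instructions + ALL prior messages + the new query.

INSTRUCTION_TOKENS = 500   # hypothetical size of the assistant's instructions
QUERY_TOKENS = 300         # hypothetical size of each new query
REPLY_TOKENS = 400         # hypothetical size of each model reply

def input_tokens_for_run(turn: int) -> int:
    """Tokens sent on the Nth run (1-indexed) if nothing is truncated."""
    history = (turn - 1) * (QUERY_TOKENS + REPLY_TOKENS)
    return INSTRUCTION_TOKENS + history + QUERY_TOKENS

growth = [input_tokens_for_run(t) for t in range(1, 6)]
print(growth)  # grows by QUERY_TOKENS + REPLY_TOKENS every turn
```

With these numbers, every additional turn adds 700 input tokens to the next request, so the cost per query climbs linearly until the rate limit is hit.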

Please let me know if this assumption is correct.

I found that I might be able to use truncation_strategy to control this, but what I need is to keep the original X messages (say, 20) and keep deleting the new ones I send, replacing them with data from the file I'm reading. Otherwise it would be simpler to use the Chat API and send the instructions plus the query every time I interact with the AI.
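For reference, truncation_strategy is passed when creating a run. A minimal sketch, assuming the openai Python SDK v1.x (the thread and assistant IDs are placeholders):

```python
# Sketch: limit how much thread history each run re-sends.
# Assumes the openai Python SDK v1.x; IDs are caller-supplied placeholders.

truncation = {
    "type": "last_messages",   # keep only the most recent N messages
    "last_messages": 20,
}

def create_run(thread_id: str, assistant_id: str):
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        truncation_strategy=truncation,
    )
```

Note that this keeps the *last* N messages, not the first N, so it does not cover the "keep the original instructions, drop the newer data messages" case described above.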

Check the token count first:
https://platform.openai.com/tokenizer

Review your code with print/log statements to see what and how much text you are sending.

If you are sure it is not a coding error, I suggest reporting it as a bug:

https://community.openai.com/t/how-to-properly-report-a-bug-to-openai

I think my mistake was to assume that a session was being created, while in actuality it just stores all the previous messages and appends them to the assistant's context on every query I make. That means all the previous messages are included, and that is why I exceed the token limit per conversation.

I'm not sure how it works in the web version, but based on my tests I think it just truncates older messages silently, so the user keeps thinking the model remembers the whole conversation when in fact only the last messages are remembered.

I'm just going to use the Chat API with my custom prompt instructions and process each query separately with that.

It's going to cost a bit more, but at least it will work.
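That stateless, per-query approach can be sketched like this, assuming the openai Python SDK v1.x (the instructions string is a placeholder for your actual prompt):

```python
# Sketch: stateless Chat Completions — instructions + one query per call,
# no accumulated history, so the cost per query stays flat.
# Assumes the openai Python SDK v1.x.

INSTRUCTIONS = "Convert the input text to the target format."  # placeholder

def build_messages(query: str) -> list:
    # A fresh message list every call: no thread history is re-sent.
    return [
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": query},
    ]

def process(query: str) -> str:
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=build_messages(query),
    )
    return resp.choices[0].message.content
```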

When you increase your tier, your token limits increase as well. Right now you are hitting the Tier 1 limit.

Another solution is to create a new thread each time. With a long-running thread, the token count keeps increasing because the thread's accumulated context is sent with every run.

https://platform.openai.com/docs/guides/rate-limits/usage-tiers
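Whichever API is used, the input can also be chunked so each request stays under the per-minute budget. A pure-Python sketch using the rough 4-characters-per-token estimate (the limit is the Tier 1 value from the error message above; the headroom figure is an arbitrary example):

```python
# Sketch: split a large text into chunks that each fit under the TPM budget,
# leaving headroom for the instructions and the model's reply.

TPM_LIMIT = 30_000         # Tier 1 limit from the error message
HEADROOM = 5_000           # example reserve for instructions + response
CHARS_PER_TOKEN = 4        # rough English-text approximation

def chunk_text(text: str) -> list:
    max_chars = (TPM_LIMIT - HEADROOM) * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each chunk can then be sent as its own query (with the Chat API, or a fresh thread) without any single request exceeding the limit.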