GPT-4o Assistant Thread Length Limit?

I’m not sure 4o assistants are only limited by number of tokens per minute/day. In interacting with a 4o assistant of mine yesterday, I hit a hard limit on thread length 3 times. While I was under the impression that the limit was based on tokens, I noticed this morning that all 3 times this happened when the thread reached 100 messages.

I think it’s fair to say 100 is a suspiciously round number.

When hitting this limit I received the error message:

"Run failed Request too large for gpt-4o in organization org-[id] on tokens per min (TPM): Limit 30000, Requested [>30000]. The input or output tokens must be reduced in order to run successfully. Visit to learn more."

That URL leads to a page-not-found error, but it suggests there should only be limits per minute and per day. Unfortunately, the threads in which this occurred remain unresponsive (except for that same error message) 24 hours later.

This is a bit frustrating. Dumping the old messages into a new thread rapidly blows up its token usage - and cost - so that’s not a solution. And curating a dump of old messages by hand to trim it down to a workable context for continuing a conversation gets onerous very quickly.
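For anyone else stuck here, the hand-curation step can at least be automated. Below is a rough sketch (not an official workaround) that keeps only the most recent messages that fit under a token budget before seeding a new thread. The `estimate_tokens` heuristic (~4 characters per token) is an assumption; swap in a real tokenizer like `tiktoken` for accurate counts.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # Replace with a real tokenizer (e.g. tiktoken) for accuracy.
    return max(1, len(text) // 4)

def trim_to_budget(messages, budget=20000):
    """Walk backwards from the newest message, keeping messages until
    the token budget is exhausted; return the kept messages in
    original (oldest-first) order."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

You can then post `trim_to_budget(old_messages)` into a fresh thread instead of the whole dump, which keeps the per-request token count under the TPM ceiling.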

I imagine what’s happening is that there’s no sliding context window in the playground, so the entire thread’s messages get used to generate each new message. But I’m reminded of Monty Python’s “Argument Clinic” … I could be arguing in my spare time :wink:


Exactly the same issue over here. Once my thread has 100 messages I’m getting an error for running an assistant on the thread.

I sent a simple message in the thread saying “Hey Andy” (see pic below), and the token count is apparently above 30000.

So suspicious that it happens as soon as the thread message count hits 100… threads should have unlimited length! Seems like a gpt-4o issue?

EDIT: It’s not just gpt4o, now getting this error for gpt4-turbo-preview too!

Account limit level is 1.


I’m handling it by greatly expanding the System Prompt in a 3D-Kanban arrangement of comonads for XSCALE-format Epics, Story-form, Features, and BDD Scenarios, plus an explanation of YAGNI and Whole-Board Thinking. This both minimizes the Assistant’s use of tokens and constrains their behavior to minimize hallucination.

While that takes up more tokens in prompting than I’d like, it seems to give us enough room to work together happily without costing a fortune or losing context.

Yes, your huge thread conversation history and other input, which would cost $0.70 just to send to gpt-4-turbo, is not being limited by a cap on message count; it is being limited because your tier 1 account can’t make a single request that large due to the rate limit.

You can use the API parameter truncation_strategy to reduce the number of past turns included in each run. Just file_search alone can use nearly 30000 tokens on a single request, though, with no control over how many chunks are returned or how relevant the results are.
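A minimal sketch of using truncation_strategy when creating a run. An actual call needs the `openai` package and a valid API key, so only the parameter construction is shown and tested here; the thread and assistant IDs are placeholders.

```python
def build_run_kwargs(thread_id: str, assistant_id: str, last_messages: int = 20):
    """Build kwargs for runs.create that limit the run's context to the
    most recent turns instead of the whole thread, which is what blows
    past the per-request TPM limit."""
    return {
        "thread_id": thread_id,
        "assistant_id": assistant_id,
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": last_messages,
        },
    }

# Usage (not executed here; IDs are placeholders):
# from openai import OpenAI
# client = OpenAI()
# run = client.beta.threads.runs.create(**build_run_kwargs("thread_abc", "asst_abc"))
```

Lowering `last_messages` trades conversation memory for staying under the tier 1 30000 TPM ceiling.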