Hello, I’m using the API and I’m being hit with the error “Request too large for gpt-4o”.
My rate limit is 30K tokens per minute, but I’m getting this error in threads with only 4-5 messages, and I don’t understand where those 30K tokens could be coming from.
Even counting the prompt and all the documents I’ve given it, the total only comes to 8,777 tokens.
So where are the 30K coming from? Is it just writing a really, really long answer?
In the prompt I tell it multiple times to keep its answers brief, and based on other times it’s answered what I’m asking, the response should be around 100 tokens.
Every document search is another model call made internally. The backend then calls the model again after adding the document results to the thread, and each of those calls carries everything: all your past chat messages, the assistant’s tool calls and its responses to you, the tool definitions, the instructions and additional_instructions, and the results of past document retrievals. That’s a lot of tokens, and the AI may still decide it wants to call another search.
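To see how the numbers add up, here’s some back-of-the-envelope arithmetic for a run like the one described above. The per-item counts are hypothetical placeholders, except the chunk figures: file_search defaults to returning up to 20 chunks of roughly 800 tokens each.

```python
# Illustrative arithmetic (per-item counts are hypothetical): how a small
# thread balloons into tens of thousands of billed input tokens across the
# internal calls of a single Assistants run.

instructions = 1500            # instructions + additional_instructions
tool_definitions = 500         # file_search tool schema, sent on every call
thread_history = 2000          # your 4-5 messages plus prior assistant replies
search_results = 20 * 800      # default: up to 20 chunks of ~800 tokens each

base = instructions + tool_definitions + thread_history

call_1 = base                       # model reads thread, decides to search
call_2 = base + search_results      # model re-reads thread + search results
call_3 = base + 2 * search_results  # second search, results accumulate

total_input = call_1 + call_2 + call_3
print(total_input)  # 60000 tokens billed, well past a 30K TPM limit
```

So even though the visible content is under 9K tokens, the repeated re-sending of context plus retrieved chunks can multiply it several times over within one run.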
This is what burns through your 30k token-per-minute rate limit at tier 1, and you pay for every one of those internal calls even when the run fails before you get a final response.
gpt-4o-mini is given a higher rate limit, which makes it better suited to this kind of testing. You can also reduce the chunk size of your vector store documents, reduce the number of chunks returned, and set a similarity score threshold on file search.
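For reference, these mitigations map onto API parameters roughly as sketched below. The shapes follow the Assistants API; the specific values are example choices, not defaults.

```python
# Hedged sketch of the three mitigations above, as parameter dictionaries
# accepted by the Assistants API (values are example choices, not defaults).

# 1. Smaller chunks when adding files to a vector store
#    (passed as chunking_strategy when creating vector store files):
chunking_strategy = {
    "type": "static",
    "static": {
        "max_chunk_size_tokens": 400,  # default is 800
        "chunk_overlap_tokens": 100,   # must be at most half the chunk size
    },
}

# 2 & 3. Fewer chunks returned, plus a similarity threshold, set on the
#        file_search tool of the assistant or run:
file_search_tool = {
    "type": "file_search",
    "file_search": {
        "max_num_results": 5,  # default is up to 20 for gpt-4* models
        "ranking_options": {"score_threshold": 0.5},
    },
}
```

With settings like these, each search injects at most 5 × 400 = 2,000 tokens of retrieved text instead of the default 20 × 800 = 16,000.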
There will simply be threads that grow too large to interact with any more, though, especially without the controls that are only available through direct API calls (such as a run’s truncation_strategy). To get useful rate limits you need to have paid $50+ ahead of time and then waited out the holding period for tier elevation. There’s a link you can click.