Token per minute rate limit for GPT4 issues

Hello! I am using the GPT4 API on Google Sheets, and I constantly get this error: “You have reached your token per minute rate limit…”. I checked the documentation and it seems that I have 10,000 Tokens Per Minute limit, and a 200 Requests Per Minute Limit.
Is it my idea or is the 10,000 token per minute limitation very strict? Do you know how to increase that, or at minimum, control it in a more efficient way so it doesn’t break my entire workflow?

It would take gpt-4 far over a minute to generate 10000 output tokens, so the issue is likely how much input you are providing that counts towards the token per minute count.

Consider: if you send 6000 tokens of input (and even get a quick short answer), you can’t do that again in the same minute.

Rate increase requests can be made, but approval probably needs a company, a desired application of AI, and payment history (and gpt-4 capacity).

1 Like

Thanks, that makes sense. Do you know if the max tokens in the input are calculated towards this limit or if only the actual token input size matters? For example, if my max token input is 2k tokens, but my actually input is only 100 tokens, which of the two numbers gets accounted for in the TPM ?

1 Like

Yes, max tokens are also counted and a single input denied if it comes to over the limit. You can get a rate limit without any generation just by specifying max_tokens = 5000 and n=100 (500,000 of 180,000 for 3.5-16k).

The rate limit endpoint calculation is also just a guess based on characters; it doesn’t actually tokenize the input.

1 Like

Wow, that is very useful to know. This knowledge alone saved me a lot of money and pain. Thanks!!

You can just omit max_tokens as a parameter, and it then can’t count them upon submissions. All the remaining model context length after the input can then be used for writing output.