Hey, I'm using Chatrace to build an AI chatbot and I've noticed that it sometimes skips questions and won't answer them. Then I saw this error:
“Rate limit reached for 10KTPM-200RPM in organization on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.”
Can anybody suggest what I should do?
Is the prompt too long? It's currently at about 16K characters.
You are encountering that error because gpt-4 has rather low tokens-per-minute limits.
Tokens are the AI's internal encoding, representing words and parts of words as pieces.
The rate limiter doesn't actually count the tokens, though: it estimates them from the characters you input. However, it does count the value you specify in max_tokens against the rate limit, in tokens.
If you specify a large max_tokens, you may be blocking yourself even though you only get a small response from that call. You can reduce the value of that parameter or, more effectively, remove it entirely so it doesn't count against you before you've even used the AI.
Improving the performance of the AI solution you've written comes down to your instructions and to sending the model well-formed messages.
The prompt's cost is estimated from the characters it contains, so trimming it can help too.
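As a rough sketch of how that estimate plays out, here is a budget calculation assuming the roughly-four-characters-per-token heuristic from OpenAI's help article and that max_tokens is counted in full; the helper name and constants are mine, for illustration only:

```typescript
// Rough per-request budget: the rate limiter is assumed to charge roughly
// (input characters / 4) plus the full max_tokens value up front.
// The divisor of 4 is the heuristic from OpenAI's help article, not a real tokenizer.
function estimateRequestTokens(promptChars: number, maxTokens?: number): number {
  const inputEstimate = Math.ceil(promptChars / 4);
  const outputReservation = maxTokens ?? 0; // omitting max_tokens reserves nothing up front
  return inputEstimate + outputReservation;
}

// Example: a 16,000-character prompt with max_tokens: 2000 reserves ~6,000 tokens
// of a 10,000 TPM budget before the model has produced a single word.
console.log(estimateRequestTokens(16_000, 2_000)); // 6000
```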
Instead, I would just use a software solution that holds back the next gpt-4 request until the following minute once you have sent over 15,000 characters, or whatever value you find eliminates the error.
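A minimal sketch of that idea, assuming a budget of roughly 15,000 characters per minute (the constant and helper name are mine; tune the budget to whatever makes the error disappear):

```typescript
// Per-minute character budget: once it is spent, hold the next request
// until the current one-minute window rolls over.
const CHAR_BUDGET_PER_MINUTE = 15_000; // tune to whatever eliminates the error

let windowStart = Date.now();
let charsSentThisWindow = 0;

async function waitForBudget(promptChars: number): Promise<void> {
  const elapsed = Date.now() - windowStart;
  if (elapsed >= 60_000) {
    // a new minute has started: reset the window
    windowStart = Date.now();
    charsSentThisWindow = 0;
  } else if (charsSentThisWindow + promptChars > CHAR_BUDGET_PER_MINUTE) {
    // budget exhausted: sleep until the next minute, then reset
    await new Promise((resolve) => setTimeout(resolve, 60_000 - elapsed));
    windowStart = Date.now();
    charsSentThisWindow = 0;
  }
  charsSentThisWindow += promptChars;
}

// usage: await waitForBudget(prompt.length); then send the gpt-4 request
```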
A more advanced approach is to use the rate-limit-remaining value returned in the HTTP headers, but that won't tell you how long to wait before sending another request unless your usage pattern is bursts of requests each minute.
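If you do go the header route, OpenAI's responses carry x-ratelimit-* headers you can read directly; the names below are the documented ones at the time of writing, so verify them against your own responses. A minimal sketch with fetch:

```typescript
// Make one chat completions call and read the rate-limit headers off the response.
// Header names (x-ratelimit-remaining-tokens, x-ratelimit-reset-tokens) should be
// verified against your own responses; the model and prompt are placeholders.
async function checkRateLimitHeaders(): Promise<void> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: "Hello" }],
    }),
  });

  const remainingTokens = response.headers.get("x-ratelimit-remaining-tokens");
  const resetTokens = response.headers.get("x-ratelimit-reset-tokens"); // e.g. "6ms" or "1s"
  console.log({ remainingTokens, resetTokens });
}
```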
Thank you!
Do you have an estimate of how many characters the prompt should be to avoid this error?
Currently I'm at about 17k characters and get this often.
What I do is build a simple model of the token system. It's a variable that starts at the tokens-per-minute rate limit; every second I add the tokens-per-minute limit divided by 60 and subtract any tokens sent that second. If the value in the variable goes above the maximum tokens-per-minute limit I cap it at that limit, and if the value is close to 0 I wait until it is greater than the number of tokens I need to send that second. It's a basic rate limiter that matches what OpenAI are doing on their side, so you never bump into the limits.
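Here is a sketch of that token-bucket model, using the 10,000 TPM figure from the error message; the names and the one-second sleep are my own choices:

```typescript
// Token bucket mirroring the TPM limit: it refills at limit/60 tokens per second,
// is capped at the full per-minute limit, and a request has to drain its estimated
// cost from the bucket before it is sent.
const TOKENS_PER_MINUTE = 10_000; // from the error message; adjust for your account
const REFILL_PER_SECOND = TOKENS_PER_MINUTE / 60;

let bucket = TOKENS_PER_MINUTE;
let lastRefill = Date.now();

function refill(): void {
  const now = Date.now();
  const elapsedSeconds = (now - lastRefill) / 1000;
  bucket = Math.min(TOKENS_PER_MINUTE, bucket + elapsedSeconds * REFILL_PER_SECOND);
  lastRefill = now;
}

async function take(estimatedTokens: number): Promise<void> {
  refill();
  while (bucket < estimatedTokens) {
    // not enough budget yet: wait a second, refill, and check again
    await new Promise((resolve) => setTimeout(resolve, 1000));
    refill();
  }
  bucket -= estimatedTokens;
}

// usage: await take(estimatedCostOfRequest); then call the API
```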
…I will pay you to put this in my app. I have a simple Express backend, and a chatbot that's hitting the rate limit more and more. Looking for help with this if you're down @Foxalabs Foxabilo
You can do an exact rate limiter: FIFO message token entries count against the rate until they expire after a minute. It works like the “25 GPT-4 messages every three hours” limit, where you have to wait for the first of your 25 to drop out before you can send the 26th. Store the token metadata along with the quota remaining from the headers; with the headers you can better align and adapt to the rollover second within that minute. You also have to understand that the prompt input that could get you blocked is only estimated, while max_tokens is counted exactly.
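A rough sketch of that FIFO window, assuming each sent request is recorded with its estimated token cost and expires after 60 seconds (the names and the 10,000 TPM figure are assumptions):

```typescript
// Sliding-window limiter: every sent request is logged with its estimated token
// cost and timestamp; entries older than one minute drop out of the FIFO, and a
// new request waits until enough old entries have expired.
type SentEntry = { sentAt: number; tokens: number };

const TOKENS_PER_MINUTE = 10_000; // replace with your account's limit
const sentLog: SentEntry[] = [];

function tokensInWindow(now: number): number {
  // expire entries older than 60 seconds from the front of the FIFO
  while (sentLog.length > 0 && now - sentLog[0].sentAt > 60_000) {
    sentLog.shift();
  }
  return sentLog.reduce((sum, entry) => sum + entry.tokens, 0);
}

async function reserve(tokens: number): Promise<void> {
  while (tokensInWindow(Date.now()) + tokens > TOKENS_PER_MINUTE) {
    if (sentLog.length === 0) break; // this request alone exceeds the limit
    // wait for the oldest entry to fall out of the window, then re-check
    const waitMs = 60_000 - (Date.now() - sentLog[0].sentAt) + 10;
    await new Promise((resolve) => setTimeout(resolve, Math.max(waitMs, 0)));
  }
  sentLog.push({ sentAt: Date.now(), tokens });
}
```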
The oddity is how the rate limit blocks requests and when they count against you. You can dash off 100 short “write Shakespeare, 1000 words” prompts in a second without a max_tokens value to count against you; then for two minutes you won't be able to send anything.
We are using a client-side rate limiter to limit and prioritize gpt-4 requests, but we are still struggling to tune it properly. How are the tokens estimated from the character count?
This is what we have in the code -
estimated_tokens: Math.max(
tokens + TOKEN_MARGIN,
ALL_MODELS[modelVariant].maxModelTokens -
ALL_MODELS[modelVariant].requestTokens,
// sometimes OpenAI does character count / 4
// see - https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
(message.length + this.systemMessage.length) / 4,
).toString(),
Note that in the above snippet, the tokens variable holds the token count estimated via tiktoken.
Our scheduler only works because we send requests back to the scheduler a few more times when we hit the rate limits.
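For what it's worth, a bare-bones version of that re-queue pattern, retrying on HTTP 429 with a growing delay (the retry count and delays are arbitrary):

```typescript
// Retry a request a few times when the API answers 429 (rate limited),
// backing off a little longer on each attempt: 1s, 2s, 4s, ...
async function sendWithRetry(
  doRequest: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) {
      return res;
    }
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
}
```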
If you specify a large max_tokens, you may be blocking yourself even though you only get a small response from that call. You can reduce the value of that parameter or, more effectively, remove it entirely so it doesn't count against you before you've even used the AI.
That was EXACTLY what fixed the issue for me. Thank you!