Does a Failed Request Eat up $$?

I have a program that uses GPT4 Vision. The rate-limit for this is fairly low, so I often run into

Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-vision-preview in organization org-EffWqRFp1wI8rg0uHMEKqXmH on tokens per min (TPM): Limit 20000, Used 16317, Requested 4063. Please try again in 1.14s. Visit to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

The simplest solution for me would be to retry the request until it works. That would likely lead to several failed requests before the cooldown completes.

Do these failed requests cost any $$?

I don’t think it eats your tokens because nothing was analyzed or generated

Some users think rate limiting happens somewhere near the load balancer (see Client-side rate limiting - #10 by harjot.gill), and that the token calculation doesn't even use a proper tokenizer.

As for how to deal with this: you're supposed to use exponential backoff.
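As a minimal sketch of exponential backoff (the `with_backoff` helper, delay values, and the string check for `rate_limit_exceeded` are my own assumptions, not an official pattern):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as err:
            if "rate_limit_exceeded" not in str(err):
                raise  # only retry rate-limit errors
            # 1s, 2s, 4s, 8s, ... plus up to 1s of random jitter
            delay = base_delay * (2 ** attempt) + random.random()
            time.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```

The jitter is there so that many clients hitting the limit at once don't all retry at the same instant.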

Just naively retrying will probably get you banned or throttled by Cloudflare, so I wouldn't recommend it.

You're just blocked with an error until enough rate limit has freed up for your request.

You could also follow the advice of the message and wait the given amount of time before proceeding…that way OpenAI doesn’t have to take further action.

from openai import OpenAI
import re

def get_context_error(err_msg):
    """Extract the wait time from the error message (assumes a single number: seconds)."""
    match = re.search(r'Please try again in ([\d.]+)', err_msg)
    if not match:
        raise ValueError("No try again time found in error message")
    return float(match.group(1))

def chat_call(modelparam):
    """talk to AI"""
    cl = OpenAI()
    try:
        response = cl.chat.completions.create(
            model=modelparam,
            max_tokens=25,
            messages=[{"role": "system", "content": "hello"}],
        )
        return response.choices[0].message
    except Exception as err:
        # print(f"Error: {err}")
        if getattr(err, "code", None) == 'rate_limit_exceeded':
            raise ValueError(err)
        raise

if __name__ == "__main__":
    model = "gpt-4-1106-preview"  # just chat completion models
    message = chat_call(model)  # use model