Does a Failed Request Eat up $$

I have a program that uses GPT4 Vision. The rate-limit for this is fairly low, so I often run into

Error code: 429 - {‘error’: {‘message’: ‘Rate limit reached for gpt-4-vision-preview in organization org-EffWqRFp1wI8rg0uHMEKqXmH on tokens per min (TPM): Limit 20000, Used 16317, Requested 4063. Please try again in 1.14s. Visit https://platform.openai.com/account/rate-limits to learn more.’, ‘type’: ‘tokens’, ‘param’: None, ‘code’: ‘rate_limit_exceeded’}}

The simplest solution for me would be to retry the request until it works. That would likely lead to several failed requests before the cooldown completes.

Do these failed requests cost any $$?

I don’t think it eats your tokens because nothing was analyzed or generated

some users think rate limiting happens somewhere near the load balancer Client-side rate limiting - #10 by harjot.gill, and that the calculation doesn’t even use a proper tokenizer.

As how to deal with this: you’re supposed to use exponential backoff https://platform.openai.com/docs/guides/rate-limits/error-mitigation

just naively retrying will probably get you banned or throttled by cloudflare, so I wouldn’t recommend it.

You are just blocked with an error until you have rate freed for the request.

You could also follow the advice of the message and wait the given amount of time before proceeding…that way OpenAI doesn’t have to take further action.

from openai import OpenAI
import re

def get_context_error(err_msg):
    """Search for the time value (assumes it is a single number: seconds)"""
    match = re.search(r'Please try again in ([\d.]+)', err_msg)
    if match:
        return match.group(1)
    else:
        raise ValueError("No try again time found in error message")

def chat_call(modelparam):
    """talk to AI"""
    cl = OpenAI()
    try:
        response = cl.chat.completions.create(
            model=modelparam, max_tokens=25,
            messages=[{"role": "system", "content": "hello"}]
        )
        return response.choices[0].message
    except Exception as e:
        err = e
        # print(f"Error: {err}")
        if err.code == 'rate_limit_exceeded':
            time.sleep(get_context_error(err.body['message']))
        else:
            raise ValueError(err)


if __name__ == "__main__":
    model = "gpt-4-1106-preview"  # just chat completion models
    message = chat_call(model)  # use model
    print(message.model_dump())
1 Like