I have a program that uses GPT-4 Vision. The rate limit for this model is fairly low, so I often run into
Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-vision-preview in organization org-EffWqRFp1wI8rg0uHMEKqXmH on tokens per min (TPM): Limit 20000, Used 16317, Requested 4063. Please try again in 1.14s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
The simplest solution for me would be to retry the request until it works. That would likely lead to several failed requests before the cooldown completes.
You are simply blocked with an error until enough rate-limit capacity has freed up for the request.
You could also follow the advice in the message and wait the stated amount of time before proceeding; that way OpenAI doesn't have to take further action.
import re
import time

from openai import OpenAI


def get_wait_time(err_msg):
    """Extract the suggested wait time in seconds from the error message."""
    match = re.search(r'Please try again in ([\d.]+)s', err_msg)
    if match:
        return float(match.group(1))
    raise ValueError("No retry time found in error message")


def chat_call(model):
    """Send a chat completion request, waiting and retrying on a rate limit."""
    cl = OpenAI()
    try:
        response = cl.chat.completions.create(
            model=model, max_tokens=25,
            messages=[{"role": "system", "content": "hello"}]
        )
        return response.choices[0].message
    except Exception as err:
        # print(f"Error: {err}")
        if getattr(err, "code", None) == 'rate_limit_exceeded':
            time.sleep(get_wait_time(err.body['message']))
            return chat_call(model)  # retry after the suggested wait
        raise


if __name__ == "__main__":
    model = "gpt-4-1106-preview"  # chat completion models only
    message = chat_call(model)
    print(message.model_dump())
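If you'd rather not depend on parsing the error message text (its format isn't guaranteed), a common alternative is exponential backoff with jitter. Here's a minimal sketch of a generic helper; `retry_with_backoff` and its parameters are my own names, not part of the OpenAI library, and you'd wrap a call like `chat_call` from the snippet above in it:

```python
import random
import time


def retry_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying on any exception with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the last error
            # delay doubles on each attempt, randomized, capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))


# usage (assuming chat_call from above):
# message = retry_with_backoff(lambda: chat_call("gpt-4-vision-preview"))
```

The jitter spreads retries out so multiple clients don't all hammer the API at the same instant after a shared cooldown.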