That's a decent idea, thank you. I have implemented wrapt_timeout_decorator, but there really should be a configurable timeout in the API request, with a retry and error handler wrapping the call.
Could you imagine asking your chef to tell you if things are too busy?
He’d probably be too busy to tell you.
Hi @tdenton8772
Normally, managing the timeout for client-server transactions, such as OpenAI API calls, happens in the client-side code.
This is true of just about every API call (not just OpenAI).
For example, when the network is congested, the server might never see the API call, so the timeout has to be managed by the client.
Extending this to OpenAI models: when a model is congested, it might not even return an error message (as we have been seeing today), so it remains the responsibility of the client side to manage the timeout, which is a subtask of overall error and exception handling.
HTH

Appendix: Example Tutorials for “Timeouts in Python Requests”
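For instance, setting a client-side timeout on a raw HTTP request to the API might look like this (a minimal sketch using the requests library; the payload is illustrative):

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-3.5-turbo",
          "messages": [{"role": "user", "content": "Hello"}]},
    timeout=(5, 30),  # (connect timeout, read timeout) in seconds
)
resp.raise_for_status()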
I call the API from AWS Lambda in production, which has a configurable timeout value for each function you write. It has built-in retries too.
For high-priority stuff, I would invoke the function and also write the invocation event to a database. Then check the database every minute to see if it fired successfully, and retry up to N times before giving up.
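A rough sketch of that pattern, assuming boto3 with a hypothetical DynamoDB table and worker function (not the poster's actual setup):

import json
import boto3

lambda_client = boto3.client("lambda")
table = boto3.resource("dynamodb").Table("invocation_log")  # hypothetical table

def invoke_tracked(event):
    # Record the invocation so a scheduled checker can retry it later.
    table.put_item(Item={"id": event["id"], "status": "pending", "attempts": 0})
    lambda_client.invoke(
        FunctionName="openai-worker",  # hypothetical worker function
        InvocationType="Event",        # asynchronous invocation
        Payload=json.dumps(event),
    )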
AgusPG
In addition to this: I'd also recommend having some sort of fallback strategy. Sometimes a model is completely down while the other ones are working seamlessly. Retrying the same model over and over will not help, but falling back to a different (usually weaker) model will.
In my case, a generic service checks the models' health by pinging them every minute and updates each model's health in a database. When any of the other services wants to call OpenAI's API with a particular model, it first retrieves the model's health from the database. If it's OK, it calls the model with a retry mechanism. If not, it calls the best one available at that precise moment (also with a retry mechanism).
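Sketched out, that fallback logic might look something like this (the preference order and the health store are hypothetical):

import openai

FALLBACK_ORDER = ["gpt-4", "gpt-3.5-turbo"]  # hypothetical preference order

def best_available_model(health: dict) -> str:
    # health maps model name -> bool, refreshed every minute by the health service
    for model in FALLBACK_ORDER:
        if health.get(model):
            return model
    raise RuntimeError("no healthy model available")

def ask_with_fallback(messages, health):
    model = best_available_model(health)
    return openai.ChatCompletion.create(model=model, messages=messages)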
An ICMP echo ping only checks the network.
You need to call the API to “ping” the model.
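In practice, that "ping" can be the cheapest possible completion request, something like this (a minimal sketch; the one-token call is illustrative):

import openai

def model_is_healthy(model: str) -> bool:
    try:
        openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,  # keep the health check as cheap as possible
        )
        return True
    except Exception:
        return False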

AgusPG
You’re 100% right. Imprecise term here. I do not ping, but actually do a dumb API call. Thanks for the correction!
Haha welcome!
I was confident, knowing you @AgusPG, that you were using a bare-bones API call to "ping" the model!
Thanks for the precision.

The actual answer here is to use the undocumented parameter request_timeout.
@tdenton8772 , can you show an example? Thanks
Lord, please do not use undocumented parameters. They are undocumented for a reason.
Use your own timeout feature with a library such as Retry. It’s so simple it is insane that it’s not already being done. retry · PyPI
from retry import retry

@retry(delay=1, backoff=2, max_delay=120)
def func():
    pass
You can read more about handling rate limits and inconsistent connections here:
Once the parameters are documented I would totally use both retry and the timeout parameter.
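For reference, combining the two would look something like this (a sketch; request_timeout is the undocumented parameter discussed above):

from retry import retry
import openai

@retry(delay=1, backoff=2, max_delay=120)
def chat(messages):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        request_timeout=30,  # undocumented client-side timeout per attempt
    )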
from wrapt_timeout_decorator import timeout
import openai

@timeout(35)  # hard timeout on the whole function, slightly longer than the request timeout
def engine_abstraction(model, prompt, max_tokens=2046, n=1, stop=None, temperature=1, job=None):
    if model == "gpt-3.5-turbo":
        messages = [{"role": "user", "content": prompt}]  # build messages from the prompt
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=max_tokens,
            request_timeout=30,  # undocumented per-request timeout
            n=n,
            stop=stop,
            temperature=temperature,
        )
        return response
This is not the entire code or even a reproducible example. I'm only showing the implementation of the decorator to time out the entire function, plus the timeout parameter request_timeout.
FYI, I can't seem to get retry or tenacity to work with a timeout. It seems like openai.ChatCompletion.create is blocking it? Any other suggestions?
The retry library doesn't interact with OpenAI at all; it's entirely client-side. What's your code snippet?
This does not break after 10 sec…

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    try:
        return = openai.ChatCompletion.create(**kwargs)
    except Exception as e:
        print(e)
        raise e

def ask(prompt, choices=None, temperature=0.6, max_tokens=10, presence_penalty=0, system_message=''):
    try:
        returnval = completion_with_backoff(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": prompt}
            ]
        )
        response_str = returnval['choices'][0]['message']['content']
    except:
        print('here')
Ah, I don't use tenacity but the retry library that I linked, where the decorator would look like this:
@retry(delay=1, backoff=2, max_delay=120)
If I had to guess, your try/except is preventing it from working properly. Try removing that and see if it works.
Thanks - no go with retry either. I'm a noob so I don't really know what's happening, but it appears that blocking functions can prevent these decorators from working. For example, if you wrap a time.sleep call, it won't raise an exception until the time.sleep actually ends.
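That observation matches how these libraries behave: the stop condition is only evaluated between attempts, so it cannot interrupt a call that is already blocking. A toy sketch that shows this (not OpenAI-specific):

import time
import tenacity

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def slow():
    # The stop condition is checked between attempts,
    # not while this sleep is in progress.
    time.sleep(60)
    raise RuntimeError("force a retry")

slow()  # the first attempt alone runs the full 60 seconds, then tenacity raises RetryError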
The part where you have try: and except - did you try taking that out, so that all the function holds is the API call? The retry library uses its own method of catching exceptions.
Here are two example functions:

from retry import retry
from retry.api import retry_call
import openai

@retry(delay=1, backoff=2, max_delay=120)
def failsModeration(prompt: str) -> bool:
    return openai.Moderation.create(
        input=prompt
    )["results"][0]["flagged"]

response = retry_call(
    openai.ChatCompletion.create,
    fkwargs={
        "model": "gpt-3.5-turbo",
        "messages": conversation,
        "max_tokens": 2000,
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty
    },
    delay=1,
    backoff=2,
    max_delay=120
)
I believe your function should look like so:
@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)
Wait - I was just looking over my comment again and I noticed you have
return = openai.ChatCompletion.create(**kwargs)
You need to remove the = sign as well, @drfalken.
You can use this client, implemented with asyncio and httpx.
It supports fine-grained connect/read timeout settings and connection reuse.
import os

from httpx import Timeout
from openai_async_client import (AsyncCreate, Message, ChatCompletionRequest,
                                 TextCompletionRequest, SystemMessage, OpenAIParams)

create = AsyncCreate(api_key=os.environ["OPENAI_API_KEY"])
messages = [
    Message(
        role="user",
        content="ChatGPT, Give a brief overview of Pride and Prejudice by Jane Austen.",
    )
]
response = create.completion(
    ChatCompletionRequest(prompt=messages),
    client_timeout=Timeout(1.0, read=10.0),  # 1s connect timeout, 10s read timeout
    retries=3,
)

create = AsyncCreate()
response = create.completion(
    TextCompletionRequest(prompt="DaVinci, Give a brief overview of Moby Dick by Herman Melville.")
)
Granular timeouts are something I can completely recommend in principle, but for a general use case this makes absolutely no sense, especially in the context of an OpenAI request.
I’d say it’s like recommending sport car parts to someone who just wants to fix their Honda Civic (fantastic car btw).