Configuring a timeout for ChatCompletion in Python

I, like many of the rest of us, am running into the random 600 s read_timeout error with the ChatCompletion API using gpt-3.5-turbo. I have looked through the Python library code and see that there is a timeout parameter that doesn't seem to be documented in the API docs.

I have tried setting this timeout, and while it is accepted by OpenAI without error, it does not seem to have any effect. The timeout still fires at 600 s, even though most of my requests take between 2 and 5 seconds.
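For reference, this is roughly what I am doing (a trimmed-down sketch, not my full code):

import openai

# Sketch of the call: passing timeout= is accepted without complaint,
# but the read timeout still fires at 600 s.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    timeout=30,
)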

Has anyone come up with a way to set the timeout, or configured a server-side timeout, without dropping down to the requests library?

3 Likes

Timeouts are usually handled outside of the actual request.

That is a decent idea, thank you. I have implemented wrapt_timeout_decorator, but there really should be a configurable timeout in the API request, with a retry and error handler wrapping the call.
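For anyone curious, the shape of that wrapping is roughly this (a minimal sketch, not my production code; the retry settings and prompt handling are just examples):

import openai
from retry import retry
from wrapt_timeout_decorator import timeout

# Outer decorator retries on any exception (including the timeout);
# inner decorator kills an attempt that hangs well past the usual 2-5 s.
@retry(exceptions=Exception, tries=3, delay=1, backoff=2)
@timeout(35)
def chat(prompt):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )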

1 Like

Could you imagine asking your chef to tell you if things are too busy?
He’d probably be too busy to tell you.

1 Like

Hi @tdenton8772

Actually, managing the timeout for a client-server transaction, like the OpenAI API calls, normally occurs in the client-side code.

This is true of just about every API call (not just OpenAI).

For example, when the network is congested, the server might never see the API call, so the timeout will be managed by the client.

Extending this to OpenAI models, when their model is congested, it might not even return an error message (as we have been seeing today), and so it remains the responsibility of the client side to manage the timeout, which is a subtask of overall error and exception handling.

HTH

:slight_smile:

Appendix: Example Tutorials for “Timeouts in Python Requests”
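For completeness, the kind of client-side timeout those tutorials cover boils down to something like this (a minimal sketch; supply your own API key, and the payload here is just a placeholder):

import requests

# Client-side timeout with the requests library:
# timeout=(connect timeout, read timeout), in seconds.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=(5, 30),
)
resp.raise_for_status()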

2 Likes

I call the API from AWS Lambda in production, which has a configurable timeout value for each function you write. It has built-in retries too.

For high-priority stuff, I would invoke the function and also write the invocation event to a database, then check the database every minute to see if it fired successfully and retry up to N times before giving up.
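A rough sketch of that pattern (the table name, key fields, and retry limit here are hypothetical, not my actual setup):

import time
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("openai_invocations")  # hypothetical table

def record_invocation(event_id, payload):
    # Write the invocation event alongside invoking the function.
    table.put_item(Item={"event_id": event_id, "payload": payload, "status": "pending"})

def check_and_retry(event_id, invoke_fn, max_retries=3):
    # Poll the database every minute; re-invoke until it succeeds or we give up.
    for _ in range(max_retries):
        item = table.get_item(Key={"event_id": event_id}).get("Item", {})
        if item.get("status") == "succeeded":
            return True
        invoke_fn(item.get("payload"))
        time.sleep(60)
    return False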

4 Likes

In addition to this, I'd recommend having some sort of fallback strategy. Sometimes one model is completely down while the other ones are working seamlessly. Retrying the same model over and over will not help, but falling back to a different (usually worse) model will.

In my case, a generic service checks the models' health by pinging them every minute and updates each model's health status in a database. When any of the other services wants to call OpenAI's API with a particular model, it first retrieves that model's health from the database. If it's OK, it calls that model with a retry mechanism. If not, it calls the best one available at that precise moment (also with a retry mechanism).
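A minimal sketch of the consumer side of this (the in-memory health store and the model ranking below are placeholders standing in for the database and the real service):

import time
import openai

MODEL_PREFERENCE = ["gpt-4", "gpt-3.5-turbo"]           # best model first
model_health = {"gpt-4": False, "gpt-3.5-turbo": True}  # normally read from the database

def pick_model(requested):
    # Use the requested model if it is healthy, else the best healthy fallback.
    if model_health.get(requested):
        return requested
    return next(m for m in MODEL_PREFERENCE if model_health.get(m))

def ask_with_fallback(messages, requested_model="gpt-4", retries=3):
    for attempt in range(retries):
        try:
            return openai.ChatCompletion.create(
                model=pick_model(requested_model),
                messages=messages,
            )
        except Exception:
            time.sleep(2 ** attempt)  # simple exponential backoff between retries
    raise RuntimeError("all retries exhausted")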

4 Likes

An ICMP echo ping only checks the network.

You need to call the API to “ping” the model.

:slight_smile:

2 Likes

You’re 100% right. Imprecise term here. I do not ping, but actually do a dumb API call. Thanks for the correction!
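The "dumb" call is nothing fancy; something along these lines (sketch only, with the model and the one-token budget as arbitrary choices):

import openai

def model_is_healthy(model="gpt-3.5-turbo"):
    # A bare-bones one-token call used purely as a health probe.
    try:
        openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,
        )
        return True
    except Exception:
        return False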

3 Likes

Haha welcome!

I was confident, knowing you @AgusPG, that you were using a bare-bones API call to "ping" the model!

Thanks for the clarification.

:slight_smile:

1 Like

The actual answer here is to use the undocumented parameter request_timeout.

1 Like

@tdenton8772 , can you show an example? Thanks

Lord, please do not use undocumented parameters. They are undocumented for a reason.

Use your own timeout feature with a library such as retry. It's so simple that it's insane it's not already being done: retry · PyPI

from retry import retry

# Retries with exponential backoff: 1 s, then 2 s, 4 s, ... capped at 120 s between attempts.
@retry(delay=1, backoff=2, max_delay=120)
def func():
    pass

You can read more about handling rate limits and inconsistent connections here:

Once the parameters are documented I would totally use both retry and the timeout parameter.

3 Likes

from wrapt_timeout_decorator import timeout
import openai

@timeout(35)
def engine_abstraction(model, prompt, max_tokens=2046, n=1, stop=None, temperature=1, job=None):
    # messages is built elsewhere in the full code; a user message from the prompt stands in here
    messages = [{"role": "user", "content": prompt}]
    if model == "gpt-3.5-turbo":
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=max_tokens,
            request_timeout=30,
            n=n,
            stop=stop,
            temperature=temperature,
        )

This is not the entire code or even a reproducible example. I'm only showing the implementation of the decorator that times out the entire function, together with the request_timeout parameter.

2 Likes

FYI, I can't seem to get retry or tenacity to work with a timeout. It seems like openai.ChatCompletion.create is blocking it? Any other suggestions?

The retry library doesn’t interact with OpenAI at all. It’s completely a client-side script. What’s your code snippet?

This does not break after 10 sec…

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    try:
        return = openai.ChatCompletion.create(**kwargs)
    except Exception as e:
        print(e)
        raise e

def ask(prompt, choices=None, temperature=0.6, max_tokens=10, presence_penalty=0, system_message=''):
    try:
        returnval = completion_with_backoff(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": prompt}
            ]
        )
        response_str = returnval['choices'][0]['message']['content']
    except:
        print('here')

Ah, I don't use tenacity but the retry library that I linked, for which the decorator would look like this:

@retry(delay=1, backoff=2, max_delay=120)

If I had to guess, your try/catch is preventing it from working properly. Try removing that and see if it works.

Thanks - no go with retry either. I'm a noob so I don't really know what's happening, but it appears that functions can block these decorators from working. For example, if you wrap a time.sleep call, it won't raise an exception until the time.sleep actually ends.
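For instance, this sketch (with time.sleep standing in for the blocking API call) never gets cut off at 10 seconds:

import time
import tenacity

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def slow_call():
    # The stop condition is only evaluated between attempts, so nothing
    # interrupts this sleep; it runs for the full 60 seconds before
    # tenacity even gets a chance to decide whether to stop retrying.
    time.sleep(60)
    raise RuntimeError("failed after blocking for 60 s")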

The part where you have try: and except. Did you try taking that out, so that all the function holds is the API call? The retry library uses its own method of catching exceptions.

Here are two example functions:

from retry import retry
from retry.api import retry_call
import openai

# Decorator form:
@retry(delay=1, backoff=2, max_delay=120)
def failsModeration(prompt: str) -> bool:
    return openai.Moderation.create(
        input=prompt
    )["results"][0]["flagged"]

# Function form (conversation and the temperature/penalty values are defined elsewhere):
response = retry_call(
    openai.ChatCompletion.create,
    fkwargs={
        "model": "gpt-3.5-turbo",
        "messages": conversation,
        "max_tokens": 2000,
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty
    },
    delay=1,
    backoff=2,
    max_delay=120
)

I believe your function should look like so:

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

Wait, I was just looking over my comment again and I noticed you have

return = openai.ChatCompletion.create(**kwargs)

You need to remove the = sign as well, @drfalken.

2 Likes