Configuring timeout for ChatCompletion Python

tdenton8772 · March 20, 2023, 12:38am

I, like many of us, am running into the random 600s read_timeout error with the ChatCompletion API using gpt-3.5-turbo. I have looked through the Python code and see that there is a timeout parameter that doesn’t seem to be documented in the API docs.

github.com

openai/openai-python/blob/1d6142f376067e401492ca92ff88a08deb47d6ba/openai/api_resources/chat_completion.py#L27


      
        See https://platform.openai.com/docs/api-reference/chat-completions/create
        for a list of valid parameters.
        """
        start = time.time()
        timeout = kwargs.pop("timeout", None)

        while True:
            try:
                return super().create(*args, **kwargs)
            except TryAgain as e:
                if timeout is not None and time.time() > start + timeout:
                    raise

                util.log_info("Waiting for model to warm up", error=e)

    @classmethod
    async def acreate(cls, *args, **kwargs):
        """
        Creates a new chat completion for the provided messages and parameters.

        See https://platform.openai.com/docs/api-reference/chat-completions/create

I have tried setting this timeout, and while it is accepted without error, it does not seem to have any effect. The timeout still occurs at 600s, even though most of my requests take between 2 and 5 seconds.

Has anyone come up with a way to set the timeout, or to configure a server-side timeout, without dropping down to Requests?
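
Going by the excerpt above, the timeout keyword appears to bound only the TryAgain retry loop, not the HTTP read itself, which would explain why it has no effect on a request that hangs. Here is a minimal sketch of that behaviour. TryAgain, create_with_timeout and slow_call are hypothetical stand-ins written for illustration, not the library's code:

```python
import time


class TryAgain(Exception):
    """Stand-in for the library's 'model is warming up' signal."""


def create_with_timeout(call, timeout=None):
    """Mirror the loop in the excerpt: retry on TryAgain until `timeout` elapses."""
    start = time.time()
    while True:
        try:
            # If `call` blocks here, the deadline below is never consulted.
            return call()
        except TryAgain:
            if timeout is not None and time.time() > start + timeout:
                raise


def slow_call():
    """Hypothetical request that blocks for 0.3 s, then succeeds."""
    time.sleep(0.3)
    return "ok"


began = time.time()
result = create_with_timeout(slow_call, timeout=0.05)
elapsed = time.time() - began
print(result, round(elapsed, 2))
```

The slow call runs to completion even though timeout is far shorter, because the deadline is only checked after a TryAgain is raised.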

anon10827405 · March 20, 2023, 12:49am

Timeouts are usually designed outside of the actual request.
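
One way to put the timeout outside the request is to run the call in a worker thread and stop waiting after a deadline. This is only a sketch using the standard library; call_with_deadline and slow_request are hypothetical names, and slow_request stands in for the real API call:

```python
import concurrent.futures
import time


def call_with_deadline(func, seconds, *args, **kwargs):
    """Run func in a worker thread and stop waiting after `seconds`.

    The worker thread is not killed; it is only abandoned, so the
    underlying request may still finish in the background.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    finally:
        # Do not block on the worker when we give up early.
        pool.shutdown(wait=False)


def slow_request():
    """Hypothetical stand-in for an API call that hangs."""
    time.sleep(0.5)
    return "late reply"


try:
    outcome = call_with_deadline(slow_request, 0.1)
except concurrent.futures.TimeoutError:
    outcome = "timed out"
```

The caller gets control back after 0.1 s and can then retry or report an error, whatever the request itself is doing.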

tdenton8772 · March 20, 2023, 1:30am

That is a decent idea, thank you. I have implemented wrapt_timeout_decorator but there really should be a configurable timeout in the api request with a retry and error handler wrapping the call.

anon10827405 · March 20, 2023, 1:53am

Could you imagine asking your chef to tell you if things are too busy?
He’d probably be too busy to tell you.

ruby_coder · March 20, 2023, 4:04am

Hi @tdenton8772

Actually, the timeout for a client-server transaction, such as an OpenAI API call, is normally managed in the client-side code.

This is true of just about every API call (not just OpenAI).

For example when the network is congested, the server might never see the API call, so the timeout will be managed by the client.

Extending this to OpenAI models: when their model is congested, it might not even return an error message (as we have been seeing today), so it remains the responsibility of the client side to manage the timeout, which is a subtask of overall error and exception handling.

HTH

Appendix: Example Tutorials for “Timeouts in Python Requests”

curt.kennedy · March 20, 2023, 4:42am

I call the API from AWS Lambda in production, which has a configurable timeout value for each function you write. It has built-in retries too.

For high priority stuff, I would invoke the function and also write the invocation event to a database. Then check the database every minute to see if it fired successfully, and retry up to N times until giving up.
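
The log-then-check pattern described above could look roughly like the following. This is a sketch under assumptions: it uses an in-memory SQLite table instead of a production database, the table layout, MAX_ATTEMPTS, record_invocation and sweep are hypothetical, and here the periodic sweep re-invokes pending work itself rather than only checking on it:

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical value for "N"

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE invocations ("
    "id INTEGER PRIMARY KEY, payload TEXT, attempts INTEGER, status TEXT)"
)


def record_invocation(payload):
    """Log the request before firing it, so a lost call can be found later."""
    cur = db.execute(
        "INSERT INTO invocations (payload, attempts, status) VALUES (?, 0, 'pending')",
        (payload,),
    )
    db.commit()
    return cur.lastrowid


def sweep(invoke):
    """One pass of the periodic check: retry pending rows, give up after MAX_ATTEMPTS."""
    rows = db.execute(
        "SELECT id, payload, attempts FROM invocations WHERE status = 'pending'"
    ).fetchall()
    for row_id, payload, attempts in rows:
        try:
            invoke(payload)
            status = "done"
        except Exception:
            status = "failed" if attempts + 1 >= MAX_ATTEMPTS else "pending"
        db.execute(
            "UPDATE invocations SET attempts = ?, status = ? WHERE id = ?",
            (attempts + 1, status, row_id),
        )
    db.commit()
```

A scheduler would call sweep once a minute with the function that performs the actual API call.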

AgusPG · March 20, 2023, 9:19am

In addition to this: I’d also recommend having some sort of fallback strategy. Sometimes one model is completely down while the others are working seamlessly. Retrying the same model over and over will not help, but falling back to a different (usually worse) model will.

In my case, a generic service checks each model’s health by pinging it every minute, and updates that model’s health in a database. When any of the other services wants to call OpenAI’s API using a particular model, it first retrieves the model’s health from the database. If it’s OK, the service calls that model with a retry mechanism. If not, it calls the best one available at that precise moment (also with a retry mechanism).
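
A rough sketch of that lookup-then-fallback flow, with a plain dict standing in for the health database. MODEL_PREFERENCE, model_health, pick_model and call_with_fallback are hypothetical names for illustration, and send is whatever function actually performs the API call:

```python
import time

# Hypothetical preference order, best model first.
MODEL_PREFERENCE = ["gpt-4", "gpt-3.5-turbo"]

# Stand-in for the health table that the checker service keeps up to date.
model_health = {"gpt-4": False, "gpt-3.5-turbo": True}


def pick_model(wanted):
    """Return `wanted` if healthy, else the best healthy model, else None."""
    if model_health.get(wanted):
        return wanted
    for name in MODEL_PREFERENCE:
        if model_health.get(name):
            return name
    return None


def call_with_fallback(wanted, send, tries=3, delay=0.0):
    """Call `send(model)` on the chosen model, retrying a few times on errors."""
    model = pick_model(wanted)
    if model is None:
        raise RuntimeError("no healthy model available")
    last_error = None
    for _ in range(tries):
        try:
            return send(model)
        except Exception as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error
```

With the health values above, a request for gpt-4 is routed to gpt-3.5-turbo until the checker marks gpt-4 healthy again.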

ruby_coder · March 20, 2023, 9:25am

An ICMP echo ping only checks the network.

You need to call the API to “ping” the model.

AgusPG · March 20, 2023, 9:26am

You’re 100% right. Imprecise term here. I do not ping, but actually do a dumb API call. Thanks for the correction!

ruby_coder · March 20, 2023, 9:30am

Haha welcome!

I was confident, knowing you @AgusPG, that you were using a bare bones API call to “ping” the model!

Thanks for the precision.

tdenton8772 · March 20, 2023, 3:31pm

The actual answer here is to use the undocumented parameter request_timeout

miguelwon · March 20, 2023, 4:51pm

@tdenton8772 , can you show an example? Thanks

anon10827405 · March 20, 2023, 4:55pm

Lord, please do not use undocumented parameters. They are undocumented for a reason.

Use your own timeout feature with a library such as Retry. It’s so simple it is insane that it’s not already being done. retry · PyPI

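from retry import retry

# Retry on any exception: wait 1 s, double the wait each time, cap it at 120 s.
@retry(delay=1, backoff=2, max_delay=120)
def func():
  pass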

You can read more about handling rate limits, and inconsistent connections here:

github.com

openai/openai-cookbook/blob/main/examples/How_to_handle_rate_limits.ipynb

# How to handle rate limits

When you call the OpenAI API repeatedly, you may encounter error messages that say `429: 'Too Many Requests'` or `RateLimitError`. These error messages come from exceeding the API's rate limits.

This guide shares tips for avoiding and handling rate limit errors.

To see an example script for throttling parallel requests to avoid rate limit errors, see [api_request_parallel_processor.py](api_request_parallel_processor.py).

## Why rate limits exist

Rate limits are a common practice for APIs, and they're put in place for a few different reasons.

- First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.

Once the parameters are documented I would totally use both retry and the timeout parameter.

tdenton8772 · March 20, 2023, 6:33pm

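from wrapt_timeout_decorator import timeout
import openai

# Hard 35 s cap on the whole function, as a backstop to request_timeout below.
@timeout(35)
def engine_abstraction(model, prompt, max_tokens=2046, n=1, stop=None, temperature=1, job=None):
    if model == "gpt-3.5-turbo":
        # Assumption: prompt is a plain string, sent as a single user message.
        messages = [{"role": "user", "content": prompt}]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=max_tokens,
            request_timeout=30,  # per-request timeout in seconds (undocumented)
            n=n,
            stop=stop,
            temperature=temperature,
        )
        return response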

This is not the entire code or even a reproducible example. I’m only showing two things: the decorator that times out the entire function, and the request_timeout parameter.

drfalken · March 22, 2023, 5:15pm

Fyi, I can’t seem to get retry or tenacity to work with a timeout. It seems like openai.ChatCompletion.create is blocking it? Any other suggestions?

anon10827405 · March 22, 2023, 5:16pm

The retry library doesn’t interact with OpenAI at all. It’s completely a client-side script. What’s your code snippet?

drfalken · March 22, 2023, 5:20pm

This does not break after 10 sec…

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    try:
        return = openai.ChatCompletion.create(**kwargs)
    except Exception as e:
        print(e)
        raise e        

def ask(prompt,choices=None,temperature=0.6,max_tokens=10,presence_penalty=0,system_message=''):
    
    try:

        returnval = completion_with_backoff(
        model="gpt-4",
        messages=[
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": prompt}
                ]
            )
        response_str = returnval['choices'][0]['message']['content']

    except:
        print('here')

anon10827405 · March 22, 2023, 5:24pm

Ah, I don’t use tenacity; I use the retry library that I linked, where the decorator would look like this:

@retry(delay=1, backoff=2, max_delay=120)

If I had to guess, your try/except is preventing it from working properly. Try removing that and see if it works.

drfalken · March 22, 2023, 5:33pm

Thanks - no go with retry either. I’m a noob so I don’t really know what’s happening, but it appears that functions can block these decorators from working. For example, if you wrap a time.sleep call, it won’t raise an exception until the time.sleep actually ends.
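
That observation fits how retry decorators generally behave: stop conditions are evaluated between attempts, so a call that is still blocking is never interrupted. One possible workaround is to give each attempt its own timeout, so a hung attempt fails and the retry loop gets a chance to check its deadline. A stdlib-only sketch follows. attempt_with_timeout, retry_with_deadline and hanging_call are hypothetical helpers, not part of tenacity or retry:

```python
import concurrent.futures
import time


def attempt_with_timeout(func, seconds):
    """Wait at most `seconds` for func(); the worker thread is abandoned, not killed."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(func).result(timeout=seconds)
    finally:
        pool.shutdown(wait=False)


def retry_with_deadline(func, per_attempt, overall):
    """Retry func until it succeeds or `overall` seconds have passed.

    The overall deadline is only checked between attempts, so it is the
    per-attempt timeout that keeps a hung call from stalling the loop.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        try:
            return attempt_with_timeout(func, per_attempt), attempts
        except Exception:
            if time.monotonic() - start >= overall:
                raise


def hanging_call():
    """Hypothetical stand-in for a request that blocks for a long time."""
    time.sleep(0.4)
    return "late"


began = time.monotonic()
try:
    retry_with_deadline(hanging_call, per_attempt=0.05, overall=0.15)
    gave_up = False
except concurrent.futures.TimeoutError:
    gave_up = True
spent = time.monotonic() - began
```

Here the loop gives up well before a single hanging_call would have returned. Note that abandoned worker threads keep running in the background until their call finishes.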

anon10827405 · March 22, 2023, 5:39pm

The part where you have try: and except. Did you try taking that out so all that the function holds is the API call? The retry library uses its own method of catching exceptions.

Here are two example functions:

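import openai
from retry import retry
from retry.api import retry_call

# Decorator form: retry with exponential backoff, wait capped at 120 s.
@retry(delay=1, backoff=2, max_delay=120)
def failsModeration(prompt: str) -> bool:
    return openai.Moderation.create(
        input=prompt
    )["results"][0]["flagged"]

# Call form: same retry policy without a decorator. conversation, temperature,
# frequency_penalty and presence_penalty are defined elsewhere in my code.
response = retry_call(
    openai.ChatCompletion.create,
    fkwargs={
        "model": "gpt-3.5-turbo",
        "messages": conversation,
        "max_tokens": 2000,
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty
    },
    delay=1,
    backoff=2,
    max_delay=120
)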

I believe your function should look like so:

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

wait

I was just looking over my comment again and I noticed you have

return = openai.ChatCompletion.create(**kwargs)

You need to remove the = sign as well, @drfalken.
