In addition to this: I’d also recommend having some sort of fallback strategy. Sometimes one model is completely down while the others are working seamlessly. Retrying the same model over and over won’t help, but falling back to a different (usually worse) model will.

In my case, a generic service checks the models’ health by pinging them every minute and updates each model’s health status in a database. When any of the other services wants to call OpenAI’s API with a particular model, it first retrieves that model’s health from the database. If the model is healthy, the service calls it with a retry mechanism. If not, it calls the best model available at that precise moment (also with a retry mechanism).


An ICMP echo ping only checks the network.

You need to call the API to “ping” the model.

🙂


You’re 100% right, imprecise term on my part. I don’t actually ping; I do a dumb API call. Thanks for the correction!
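
For reference, the loop looks roughly like this (a minimal sketch; MODELS, save_health, and load_health are hypothetical stand-ins for my model list and database layer):

import openai

MODELS = ["gpt-4", "gpt-3.5-turbo"]  # preference order, best first

def check_models(save_health):
    # Runs every minute: probe each model with a dumb API call.
    for model in MODELS:
        try:
            openai.ChatCompletion.create(
                model=model,
                messages=[{"role": "user", "content": "ping"}],
                max_tokens=1,
                request_timeout=10,
            )
            save_health(model, healthy=True)
        except Exception:
            save_health(model, healthy=False)

def pick_model(load_health, preferred):
    # Return the preferred model if healthy, else the best healthy fallback.
    if load_health(preferred):
        return preferred
    for model in MODELS:
        if load_health(model):
            return model
    return preferred  # nothing looks healthy: keep the preferred model and rely on retries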


Haha welcome!

I was confident, knowing you @AgusPG, that you were using a bare-bones API call to “ping” the model!

Thanks for the clarification.

🙂


The actual answer here is to use the undocumented request_timeout parameter.


@tdenton8772, can you show an example? Thanks!

Lord, please do not use undocumented parameters. They are undocumented for a reason.

Handle timeouts yourself with a library such as retry. It’s so simple it’s insane that it isn’t already being done. retry · PyPI

from retry import retry

# Retry on any exception: 1 s initial delay, doubling each time, capped at 120 s
@retry(delay=1, backoff=2, max_delay=120)
def func():
    pass

You can read more about handling rate limits and inconsistent connections here:

Once the parameters are documented, I would totally use both retry and the timeout parameter.

from wrapt_timeout_decorator import timeout
import openai

@timeout(35)  # hard-kill the whole function after 35 seconds
def engine_abstraction(model, prompt, max_tokens=2046, n=1, stop=None, temperature=1, job=None):
    if model == "gpt-3.5-turbo":
        messages = [{"role": "user", "content": prompt}]  # built elsewhere in the real code
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=max_tokens,
            request_timeout=30,  # undocumented: timeout for the HTTP request itself
            n=n,
            stop=stop,
            temperature=temperature,
        )
        return response

This is not the entire code, or even a reproducible example. I’m only showing the decorator that times out the entire function, along with the request_timeout parameter.


FYI, I can’t seem to get retry or tenacity to work with a timeout. It seems like openai.ChatCompletion.create is blocking it? Any other suggestions?

The retry library doesn’t interact with OpenAI at all; it’s entirely client-side. What’s your code snippet?

This does not break after 10 sec…

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    try:
        return = openai.ChatCompletion.create(**kwargs)
    except Exception as e:
        print(e)
        raise e

def ask(prompt, choices=None, temperature=0.6, max_tokens=10, presence_penalty=0, system_message=''):
    try:
        returnval = completion_with_backoff(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": prompt}
            ]
        )
        response_str = returnval['choices'][0]['message']['content']
    except:
        print('here')

Ah, I don’t use tenacity but the retry library that I linked, where the decorator would look like this:

@retry(delay=1, backoff=2, max_delay=120)

If I had to guess, your try/except is preventing it from working properly. Try removing that and see if it works.

Thanks, no go with retry either. I’m a noob so I don’t really know what’s happening, but it appears that functions can block these decorators from working. For example, if you wrap a time.sleep call, it won’t raise an exception until the time.sleep actually ends.
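
Here’s a minimal sketch of what I mean (made-up function, just to illustrate):

import time
import tenacity

# stop_after_delay(10) is only evaluated *between* attempts, so a single
# attempt that blocks (like this sleep) always runs to completion first.
@tenacity.retry(stop=tenacity.stop_after_delay(10))
def blocking_call():
    time.sleep(60)  # blocks for a full minute before tenacity can react
    raise RuntimeError("force a retry")

# blocking_call() runs for at least 60 seconds, not 10.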

The part where you have try: and except: did you try taking that out, so that all the function holds is the API call? The retry library uses its own method of catching exceptions.

Here are two example functions:

from retry import retry
from retry.api import retry_call

# Decorator form: retry with exponential backoff
@retry(delay=1, backoff=2, max_delay=120)
def failsModeration(prompt: str) -> bool:
    return openai.Moderation.create(
        input=prompt
    )["results"][0]["flagged"]

# Function form: retry_call wraps an existing callable
# (conversation, temperature, etc. are defined elsewhere in my app)
response = retry_call(
    openai.ChatCompletion.create,
    fkwargs={
        "model": "gpt-3.5-turbo",
        "messages": conversation,
        "max_tokens": 2000,
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty
    },
    delay=1,
    backoff=2,
    max_delay=120
)

I believe your function should look like so:

@tenacity.retry(stop=tenacity.stop_after_delay(10))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

Wait, I was just looking over my comment again and I noticed you have:

return = openai.ChatCompletion.create(**kwargs)
You need to remove the = sign as well, @drfalken.


You can use this client, implemented with asyncio and httpx.
It supports fine-grained connect/read timeout settings and connection reuse.

import os

from httpx import Timeout
from openai_async_client import AsyncCreate, Message, ChatCompletionRequest, TextCompletionRequest, SystemMessage, OpenAIParams

create = AsyncCreate(api_key=os.environ["OPENAI_API_KEY"])
messages = [
    Message(
        role="user",
        content="ChatGPT, give a brief overview of Pride and Prejudice by Jane Austen.",
    )
]
response = create.completion(
    ChatCompletionRequest(prompt=messages),
    client_timeout=Timeout(1.0, read=10.0),
    retries=3,
)

create = AsyncCreate()
response = create.completion(
    TextCompletionRequest(prompt="DaVinci, give a brief overview of Moby Dick by Herman Melville.")
)

I’d happily recommend granular timeouts, but for a general use case they make absolutely no sense. Especially in the context of an OpenAI request.

I’d say it’s like recommending sports car parts to someone who just wants to fix their Honda Civic (fantastic car, btw).


The easiest way is to add the request_timeout parameter; it gets passed through to requests.post(timeout=xxx).

e.g.:

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    request_timeout=60,
)


This is a great way.

I’d just like to add that a retry/backoff library is also a great option. In the event that a timeout or some sort of intermittent error occurs, it will automatically retry using dynamic intervals.

This worked for me along with using Ronald’s suggestion to use the retry with it. Thanks!

Yes, request_timeout is very important. Most people explain how to write a retry decorator, but that alone can’t solve the problem of slow responses; setting this parameter helps a lot.
Meanwhile, a parallel method such as pool.apply_async() helps when you haven’t yet reached your account’s RPM limit.
In conclusion, a retry decorator + the request_timeout parameter + a parallel method will accelerate your ChatGPT application.
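
Putting it together, a minimal sketch of all three (the model name, prompts, and pool size are placeholders):

import openai
from multiprocessing.pool import ThreadPool
from retry import retry

# Retry with exponential backoff; request_timeout fails slow calls fast
# so a retry can kick in instead of the request hanging.
@retry(delay=1, backoff=2, max_delay=120)
def complete(prompt):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        request_timeout=30,
    )

prompts = ["prompt one", "prompt two", "prompt three"]
with ThreadPool(processes=3) as pool:
    # apply_async dispatches each request without blocking the caller
    pending = [pool.apply_async(complete, (p,)) for p in prompts]
    responses = [job.get() for job in pending]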