Timeout for OpenAI chat completion in Python

I’m using the OpenAI API in a Python 3.10 program via the Python client library, with the model ‘gpt-3.5-turbo’. I’ve noticed that the openai.ChatCompletion call periodically takes a very long time to complete. I want to protect my users from having to wait for completion by timing out the API request.

After reviewing various sources (including these forums), I’ve tried a few different ways to implement the timeout:

  1. Setting the API parameter request_timeout=60. This doesn’t seem to be honored by the API; calls occasionally last for thousands of seconds.
  2. Using a variety of decorator functions around the API client call: timeout-decorator, timeout-function-decorator
  3. Directly using signal.alarm in python.

None of these actually times out the function call; the program hangs until the call returns or until there is an eventual OpenAI timeout. I’ve tried using signal.alarm with a test program and it works perfectly. I’m running this on my Mac.

I’m now wondering whether the OpenAI Python client creates its own threads; that would explain why signal doesn’t work. So I’m going to try calling the API directly using the requests library; if that doesn’t work, I’ve run out of ideas to try.
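For what it’s worth, a direct call with requests honors a client-side timeout. Below is a minimal sketch of that approach; the helper names (build_chat_request, chat_with_timeout) are mine, and returning None on timeout is just one possible policy:

```python
import requests

API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Assemble the URL, headers, and JSON payload for a chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": messages}
    return API_URL, headers, payload

def chat_with_timeout(api_key, model, messages, timeout=60):
    """POST to the endpoint; requests raises requests.exceptions.Timeout
    if the connection or read exceeds `timeout` seconds."""
    url, headers, payload = build_chat_request(api_key, model, messages)
    try:
        resp = requests.post(url, headers=headers, json=payload, timeout=timeout)
        resp.raise_for_status()
        return resp.json()
    except requests.exceptions.Timeout:
        return None  # caller decides how to report the timeout

# Usage (with a real key):
# reply = chat_with_timeout("sk-...", "gpt-3.5-turbo",
#                           [{"role": "user", "content": "Hello"}], timeout=60)
```

Note that requests’ `timeout` is a per-read limit, not a cap on total request duration, but in practice it is enough to break out of a hung connection.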

Has anyone else run into this problem? Any feedback much appreciated!

Hi and welcome to the Developer Forum!

For any application that makes use of external APIs, you must assume that the endpoint may never respond.

To handle that, you need to make non-blocking calls, or execute blocking calls in their own threads and monitor them for responses. If a valid reply is not returned within a time limit, issue a retry with exponential backoff. You can then keep your user informed of the current status.

Here is an example:

import random
import time
import openai

# Define a retry decorator
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.error.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    def wrapper(*args, **kwargs):
        num_retries = 0
        delay = initial_delay

        while True:
            try:
                return func(*args, **kwargs)

            except errors as e:
                num_retries += 1

                if num_retries > max_retries:
                    raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")

                # Increase the delay (with optional jitter) and wait before retrying
                delay *= exponential_base * (1 + jitter * random.random())
                time.sleep(delay)

            except Exception as e:
                raise e

    return wrapper

@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)

completions_with_backoff(model="text-davinci-003", prompt="Once upon a time,")



Here is the commit that made the change, no timeout supported:

This still works to throw an error:
openai.api_requestor.TIMEOUT_SECS = 2

openai.error.Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=2)

Here’s threading a non-stream chat completion (the response is a data structure, not an unhandled generator) that I simplified from my other hacks, with a timeout parameter.

(You can make methods throw an error instead of just reporting a message.)

import openai
import threading
openai.api_key = key

def api_call(result, api_parameters):
    # Worker: store the API response in the shared list
    api_response = openai.ChatCompletion.create(**api_parameters)
    result[0] = api_response

def chat_c_threaded(api_parameters):
    timeout = api_parameters.pop("timeout", None)
    result = [None]

    api_thread = threading.Thread(target=api_call, args=(result, api_parameters))
    api_thread.start()
    api_thread.join(timeout=timeout)  # Wait up to `timeout` seconds for the API call

    if api_thread.is_alive():
        # The API call is still running after the timeout; wait once more
        # (the thread itself cannot be killed, only abandoned)
        print("API call timeout, retrying...")

        api_thread.join(timeout=timeout + 1)  # Retry the wait
        if api_thread.is_alive():
            print("API call still hanging, retry failed.")
            return {
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": "API Timeout",
                        },
                        "finish_reason": "timeout",
                    }
                ]
            }

    # The API call finished within the timeout or the retried wait succeeded
    return result[0]

# Usage ------------
if __name__ == "__main__":
    print("Threaded timeout example")
    for maxtoken in [10, 100, 500]:
        # set as a dict (not using equal signs), now with a working 'timeout'
        chat_properties = {
            "model": "gpt-3.5-turbo", "max_tokens": maxtoken, "top_p": 0.1, "timeout": 2.5,
            "messages": [
                {"role": "system", "content": "You are an AI assistant"},
                {"role": "user", "content": "Write a leprechaun story"},
            ],
        }
        response = chat_c_threaded(chat_properties)
        print(response["choices"][0]["message"]["content"])
The usage example runs at three max_tokens values so you can see both successful calls and timeouts.

Thanks @kevin6. I’ve already implemented tenacity with retries, but I don’t think that solves this particular problem. If the call to OpenAI hangs, tenacity by itself cannot interrupt it; it can only retry once the call returns. To Foxabilo’s point, if the call is in a separate thread then you can abandon that thread and retry in another one.
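The "abandon the thread and retry" pattern can also be done with concurrent.futures instead of raw threads. A sketch, using a stand-in slow function rather than a real OpenAI call (call_with_timeout and slow_api_call are hypothetical names):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def slow_api_call():
    # Stand-in for openai.ChatCompletion.create(...); pretend it hangs
    time.sleep(1)
    return {"choices": [{"message": {"content": "done"}}]}

def call_with_timeout(fn, timeout, retries=2):
    """Run fn in a worker thread; stop waiting and retry if it exceeds `timeout`.
    Note: the abandoned thread keeps running in the background -- Python
    threads cannot be killed, you can only stop waiting for them."""
    with ThreadPoolExecutor(max_workers=retries) as pool:
        for attempt in range(retries):
            future = pool.submit(fn)
            try:
                return future.result(timeout=timeout)
            except FutureTimeout:
                print(f"Attempt {attempt + 1} timed out after {timeout}s")
    return None

result = call_with_timeout(slow_api_call, timeout=0.2, retries=2)
print(result)  # None: both attempts timed out
```

One caveat: because the executor joins its workers on shutdown, the function returns None only after the abandoned calls finish, so for a real hung request you would want to skip the context manager and call shutdown(wait=False) instead.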

For my use case, it’s ok to periodically say that the call timed out and not return a valid response, the problem is when it hangs waiting for a response. So simply terminating the call invocation should be sufficient, which is what signal.alarm is supposed to do.
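For reference, the signal.alarm approach looks roughly like this when it does work (Unix only, main thread only); the slow function is a stand-in and the CallTimeout exception name is mine:

```python
import signal
import time

class CallTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise CallTimeout("call exceeded time limit")

def slow_call():
    time.sleep(5)  # stand-in for a hanging API request
    return "response"

# signal.alarm only delivers SIGALRM to the main thread of the main
# interpreter; it will not interrupt work done in other threads, which is
# one theory for why it fails with the client library.
signal.signal(signal.SIGALRM, _on_alarm)
signal.alarm(1)  # deliver SIGALRM in 1 second
try:
    result = slow_call()
except CallTimeout:
    result = None
finally:
    signal.alarm(0)  # cancel any pending alarm

print(result)  # None: the sleep was interrupted after 1 second
```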

Thank you for the sample code, @_j - I may try the threaded option next. But I’m going to see first if I can simply hang up on the call if I invoke the API directly. The only difference from your scenario is that in this case it will be the signal call that is in the alternate thread rather than the application code itself.

If you are already using asyncio, just use the timeout option they have built into it. I’ve been creating summaries of databases and I get the same long pause you are talking about. Sometimes I think it has crashed. I fixed it with the asyncio.wait_for() function.

import asyncio

async def wait_test():
    try:
        # Fake async server call that takes 10 seconds
        await asyncio.wait_for(asyncio.sleep(10), timeout=1)
    except asyncio.TimeoutError:
        print("Timed out after 1 second")

if __name__ == "__main__":
    # You'll need to figure out how to get your functionality in here.
    asyncio.run(wait_test())