Recommended way to limit the amount of time a Python ChatCompletion.create() runs

Maybe I’m just missing a concept or some documentation, but I’ve had situations where a gpt-4 ChatCompletion.create() call takes multiple minutes to respond to the same prompt that usually takes 30 seconds. I’m thinking this could be the result of a bug on the API side that sometimes arises.

I don’t see any obvious way to set a timeout on the Python call. I could wrap the call in a thread or another process, but that seems rather heavy.
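For reference, the thread-based wrapper mentioned above could look roughly like the sketch below. The helper name run_with_deadline is made up for illustration, and it only stops waiting: the worker thread (and the HTTP request inside it) keeps running in the background until it finishes on its own.

```python
import concurrent.futures
import time


def run_with_deadline(fn, seconds, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and wait at most `seconds`.

    Hypothetical helper: raises concurrent.futures.TimeoutError if the
    deadline passes. The worker thread is NOT killed; it is only abandoned.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    finally:
        # Do not block here waiting for a slow call to finish.
        pool.shutdown(wait=False)
```

With the library it would be called as run_with_deadline(openai.ChatCompletion.create, 60, model=..., messages=...), which is exactly the "heavy" approach I’d like to avoid.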

Is there another way to think about this? Or another suggestion?

Either the image here is from a “hallucination”, or their API docs just forgot to mention “timeout”. I’m not sure which it is.

BTW: that’s from GPT-4. GPT-3.5-turbo was unaware and gave some different workaround when I asked.

That looks like what I need, but adding this parameter doesn’t seem to do anything. I added timeout=5 to my call and I’m still getting response times much longer than that.

So it’s great that GPT-4 agrees that it seems like a good idea. But apparently it’s not. Otherwise, I’d think it would be there. I feel like I’m missing something.

Next step I’d do then is just download the Python package itself (like a zip file from GitHub right), then search for that method in it. Should be super trivial to see if there’s any timeout being set, because that method will just be a wrapper around the HTTP call.

It looks like what you need, but it isn’t, because the AI makes up plausible nonsense.

The actual parameter that can be passed to ChatCompletion.create() is request_timeout:

api_params = {
    "model": model,
    "max_tokens": max_tokens,
    "temperature": temperature,
    "messages": all_messages,
    "request_timeout": 3,  # seconds
}

It will get you:

“API Error: Request timed out: HTTPSConnectionPool(host=‘api.openai.com’, port=443): Read timed out. (read timeout=3)”

The connection is terminated and an error is thrown, whether or not the AI is in the middle of answering.

You can also set the value directly in the library, in case the parameter method is ever removed:
openai.api_requestor.TIMEOUT_SECS = 3
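Since a timed-out request just raises, you will probably want to catch it and retry. Here is a minimal sketch of that, not anything from the library itself: call_with_retries is a made-up helper, and the create function and exception class are passed in so it doesn’t depend on the SDK. With the legacy 0.x client I’d assume those are openai.ChatCompletion.create and openai.error.Timeout.

```python
import time


def call_with_retries(create_fn, timeout_exc, *, request_timeout,
                      max_attempts=3, backoff=0.5, **params):
    """Call create_fn with a per-request timeout, retrying on timeout.

    Hypothetical helper. create_fn and timeout_exc are injected; with the
    legacy client they are assumed to be openai.ChatCompletion.create and
    openai.error.Timeout.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return create_fn(request_timeout=request_timeout, **params)
        except timeout_exc:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the timeout
            time.sleep(backoff * attempt)  # simple linear backoff
```

Usage would be something like call_with_retries(openai.ChatCompletion.create, openai.error.Timeout, request_timeout=3, model=model, messages=all_messages). Keep in mind each retry sends the whole prompt again.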

I bet you the property used to be named “timeout”, and they renamed it. Or it could be a hallucination from a parallel universe. :clown_face: Anyway, they probably need to update their docs, to include it.

Thanks for looking into the actual code to set us straight!

I bet you the AI will just make up whatever it wants, and the current model is poor enough to go loopy on further questions.

The timeout parameter is an optional parameter that can be passed to the create method.

Inside the method, the timeout parameter is extracted from the kwargs dictionary using pop(). If timeout isn’t provided, it defaults to None:

class ChatCompletion(EngineAPIResource):
    engine_required = False
    OBJECT_NAME = "chat.completions"

    @classmethod
    def create(cls, *args, **kwargs):
        """
        Creates a new chat completion for the provided messages and parameters.

        See https://platform.openai.com/docs/api-reference/chat/create
        for a list of valid parameters.
        """
        start = time.time()
        timeout = kwargs.pop("timeout", None)

        while True:
            try:
                return super().create(*args, **kwargs)
            except TryAgain as e:
                if timeout is not None and time.time() > start + timeout:
                    raise

                util.log_info("Waiting for model to warm up", error=e)

However, after this line:

timeout = kwargs.pop("timeout", None)

the timeout parameter no longer exists in the kwargs dictionary because pop removes the key from the dictionary.
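To see the effect of pop in isolation, here is a tiny stand-alone sketch. Base and Child are stand-in classes I made up, not the library’s; the parent just reports which keyword arguments reached it.

```python
class Base:
    @classmethod
    def create(cls, **params):
        # Stand-in for the parent create(): report what was received.
        return sorted(params)


class Child(Base):
    @classmethod
    def create(cls, *args, **kwargs):
        # pop() removes "timeout" from kwargs, so it is never forwarded.
        timeout = kwargs.pop("timeout", None)
        return super().create(*args, **kwargs)


received = Child.create(model="gpt-4", timeout=5, request_timeout=3)
print(received)  # ['model', 'request_timeout']
```

Only model and request_timeout make it through to the parent; timeout stays behind in the child method.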

Here's super().create():

    @classmethod
    def create(
        cls,
        api_key=None,
        api_base=None,
        api_type=None,
        request_id=None,
        api_version=None,
        organization=None,
        **params,
    ):
        (
            deployment_id,
            engine,
            timeout,
            stream,
            headers,
            request_timeout,
            typed_api_type,
            requestor,
            url,
            params,
        ) = cls.__prepare_create_request(
            api_key, api_base, api_type, api_version, organization, **params
        )

        response, _, api_key = requestor.request(
            "post",
            url,
            params=params,
            headers=headers,
            stream=stream,
            request_id=request_id,
            request_timeout=request_timeout,
        )

        if stream:
            # must be an iterator
            assert not isinstance(response, OpenAIResponse)
            return (
                util.convert_to_openai_object(
                    line,
                    api_key,
                    api_version,
                    organization,
                    engine=engine,
                    plain_old_data=cls.plain_old_data,
                )
                for line in response
            )
        else:
            obj = util.convert_to_openai_object(
                response,
                api_key,
                api_version,
                organization,
                engine=engine,
                plain_old_data=cls.plain_old_data,
            )

            if timeout is not None:
                obj.wait(timeout=timeout or None)

        return obj

To conclude:

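  • The timeout parameter is popped from kwargs in ChatCompletion.create(), so it won’t be passed to super().create() via **kwargs. There it only limits how long the TryAgain retry loop keeps going.
  • The request_timeout parameter, if provided to ChatCompletion.create(), will be passed to super().create() via **kwargs, and from there on to requestor.request(), so it is the one that limits the HTTP call.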