Timeout not honored in Assistants Python API

Is the Assistants Python API not honoring the timeout argument?

Here’s how I’m calling it for testing (normally I use a longer timeout):

return client.beta.threads.runs.create(
    thread_id=chat_thread.id,
    assistant_id=assistant.id,
    instructions=instructions,
    timeout=1.0,
)

Here type(client) is <class 'openai.OpenAI'>. The call otherwise runs properly, but it does not trigger an APITimeoutError exception even after running for 5+ seconds.

On the other hand, as evidence that this isn’t a PEBCAK error, I do properly see openai.APITimeoutError when using client.chat.completions.create, which honors its timeout argument in tests.
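
For reference, a minimal check of that chat completions behavior looks roughly like this (the model name and prompt are just placeholders, and the absurdly low timeout is only there to force the failure):

import openai

client = openai.OpenAI()
try:
    client.chat.completions.create(
        model="gpt-4",  # placeholder model for the test
        messages=[{"role": "user", "content": "ping"}],
        timeout=0.001,  # far too low to ever succeed
    )
except openai.APITimeoutError:
    print("chat.completions.create honored its timeout")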

You don’t pass it with the call, you set it on the client.

import openai
client = openai.Client(timeout=2)

or you can pass an httpx Timeout object.

import httpx
client = openai.Client(timeout=httpx.Timeout(2.0, connect=1.0))

It’s not respecting that either.

Other operations using that client are timing out as expected when I set a low value, but not Assistants.


There shouldn’t be many delays in the actual network requests of Assistants, since all the data is at the ready, except where the API actually returns nothing over the open connection.

Here’s chat code I verified won’t tolerate a one-second delay between streaming chunks, but which will not time out as long as the tokens keep flowing and the response is immediate. It is right at the edge of breaking (ask it for something copyrighted to get a delay…). You can just drop this httpx timeout spec, with its preposterously low limits, into your code and see if it is still of no effect in Assistants calls.

from openai import OpenAI
import httpx

# Client with aggressive httpx timeouts: 1.0s connect, 0.8s read between
# streamed chunks, 1.5s write, 1.0s pool.
client = OpenAI(max_retries=1,
                timeout=httpx.Timeout(0.1, connect=1.0, pool=1.0,
                                      write=1.5, read=0.8))
example_base64 = 'iVBORw0KGgoAAAANSUhEUgAAAIAAAABACAMAAADlCI9NAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAAZQTFRF////MzMzOFSMkQAAAPJJREFUeNrslm0PwjAIhHv//09rYqZADzOBqMnu+WLTruOGvK0lhBBCCPHH4E7x3pwAfFE4tX9lAUBVwZyAYjwFAeikgH3XYxn88nzKbIZly4/BluUlIG66RVXBcYd9TTQWN+1vWUEqIJQI5nqYP6scl84UqUtEoLNMjoqBzFYrt+IF1FOTfGsqIIlcgAbNZ0Uoxtu6igB+tyBgZhCgAZ8KyI46zYQF/LksQC0L3gigdQBhgGkXou1hF1XebKzKXBxaDsjCOu1Q/LA1U+Joelt/9d2QVm9MjmibO2mGTEy2ZyetsbdLgAQIIYQQQoifcRNgAIfGAzQQHmwIAAAAAElFTkSuQmCC'
system = [{"role": "system", "content": "You are a computer vision assistant"}]
user = [{"role": "user", "content": [{"image": example_base64}, "Write poem based on image."]}]
chat = []
while not user[0]['content'] == "exit":
    response = client.chat.completions.create(
        messages=system + chat[-10:] + user,
        model="gpt-4-vision-preview",
        top_p=0.9, stream=True, max_tokens=1536)
    reply = ""
    # Print streamed tokens as they arrive; a gap longer than read=0.8s
    # between chunks raises openai.APITimeoutError.
    for delta in response:
        if not delta.choices[0].finish_reason:
            word = delta.choices[0].delta.content or ""
            reply += word
            print(word, end="")
    chat += user + [{"role": "assistant", "content": reply}]
    user = [{"role": "user", "content": input("\nPrompt: ")}]
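
To run the same experiment against Assistants, you could reuse that aggressively limited client for a run creation (the thread and assistant IDs below are placeholders) and watch whether openai.APITimeoutError is ever raised:

# Sketch: reuse the client above, with its sub-second httpx limits, on an Assistants call.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",     # placeholder ID
    assistant_id="asst_abc123",    # placeholder ID
)
print(run.status)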

Thanks, this is interesting and looks useful. However, although I can now easily force other things to time out using this technique (such as initializing the client), I’m still unable to force an Assistant created with client.beta.assistants.create to time out during a chat interaction driven by client.beta.threads.runs.create.

@Plutes, did you ever get this to work? I, too, have tried the timeout parameter in the OpenAI() call and in the client.beta.threads.runs.create_and_poll call, but it never accomplishes anything.

A timeout sent as a per-call parameter with the openai library is now captured and overrides the client-set timeout used by the network library, whether for chat completions, runs.create, or other methods.

It accepts a value in seconds.

If you are using a method like create_and_poll, a network timeout won’t affect how long the polling persists.

It is not an API body parameter; the API itself would refuse it.
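
A quick sketch of both places a timeout can go (IDs are placeholders); either way the value only governs the individual network request:

from openai import OpenAI
import httpx

# Client-level default applied to every network request made by this client.
client = OpenAI(timeout=20.0)

# Per-call override, captured by the library and handed to httpx for this
# request only; it is never sent to the API as a body parameter.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",      # placeholder ID
    assistant_id="asst_abc123",     # placeholder ID
    timeout=httpx.Timeout(5.0, connect=2.0),
)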

Thanks for responding so quickly, but I don’t really understand what you are saying.

I tried

openai.OpenAI(api_key='XYZ', timeout=2.0)

and I tried

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread_id,
    assistant_id=assistant_id,
    additional_messages=[msg],
    additional_instructions=added_instructs,
    truncation_strategy=trunc_strategy,
    max_completion_tokens=40000,
    poll_interval_ms=3000,
    timeout=2.0,
)

but neither approach did anything, and the run completed after 6 seconds.

Can you explain what I am supposed to do?

The Assistants run is merely kicked off by the network request.

It runs autonomously on OpenAI’s servers, calling models, perhaps multiple times.

The only controls you have are max_prompt_tokens and max_completion_tokens, which limit your potential expense by terminating the run.

You can also supervise it from your own code, for example cancelling the run if you are getting a stream and are “tired of waiting”:

https://platform.openai.com/docs/api-reference/runs/cancelRun
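
A minimal sketch of those two levers, assuming placeholder thread and assistant IDs: cap the run’s token budget up front, then cancel it from your own code once you stop wanting to wait.

# Cap potential spend when the run is created.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",       # placeholder ID
    assistant_id="asst_abc123",      # placeholder ID
    max_prompt_tokens=2000,
    max_completion_tokens=1000,
)

# Later, when you are "tired of waiting", terminate the run server-side.
client.beta.threads.runs.cancel(run.id, thread_id="thread_abc123")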

Thanks @_j. My question (and this thread) is about the Python openai library, which includes a timeout parameter in client.beta.threads.runs.create_and_poll that is not being honored.

I understand the situation at OpenAI’s servers, but that’s not really my concern. I want to stop waiting for the run at my end, because I’m paying per-instance charges at my own server, and because of UX. The Python openai library claims to support that, but apparently it doesn’t work. So I’ll have to implement my own polling loop.

Your expectations obviously don’t align with what the timeout parameter or the library method is meant to do.

The timeout parameter passes a timeout setting to the httpx library, a drop-in replacement for the requests library; it is the Python code used to make the network requests.

Timeout is not going to have the effect you seem to want on create_and_poll, because the polling part makes a request every poll_interval_ms in a loop just to get the status of the run. There is nothing to time out when the API quickly responds to a “retrieve run” call that the job status is “in progress”, “in progress”, “in progress”…
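
Roughly (a simplified sketch, not the library’s actual code), the polling half behaves like this, which is why each individual httpx request finishes well inside any timeout you set:

import time

run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)
# Each retrieve() returns in well under a second, so the per-request timeout
# never fires, no matter how long the run stays "in_progress" overall.
while run.status in ("queued", "in_progress"):
    time.sleep(poll_interval_ms / 1000)
    run = client.beta.threads.runs.retrieve(run.id, thread_id=thread_id)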

You can look at the library’s poll() method here and start hacking: for example, change the while True loop linked directly below, or have it sleep before it starts polling:

https://github.com/openai/openai-python/blob/main/src/openai/resources/beta/threads/runs/runs.py#L957

The polling terminating is the signal that the run finished or reached a terminal state. If the polling terminates while the run is still in_progress, the whole point of using that specific method is lost.

Thanks for explaining that. You’re right that my expectation of a timeout parameter in a create_and_poll function was that it would terminate the function if the specified amount of time passed with no resolution. Unfortunately, there’s no documentation of that parameter that I can find.

I do see that the underlying OpenAI API takes an expires_at parameter, which the Python library might have been able to use for real elegance, going a step beyond, but that parameter isn’t really documented either.

Anyway, here is a simplified version of what I ended up doing:

import time

time_start = time.perf_counter()
run = client.beta.threads.runs.create(thread_id=th_id, assistant_id=as_id)
while run.status not in Q_COMPLETION_STATES:
    if (time.perf_counter() - time_start) > Q_TIMEOUT_SECS:
        raise QuestionTimeout
    else:
        time.sleep(Q_POLL_SECS)
        run = client.beta.threads.runs.retrieve(run.id, thread_id=th_id)