Background
Hey everyone,
We’ve been chasing 408 Request Timeout errors in our Python application, deployed on Azure Functions, when it calls the OpenAI Assistants API. Responses that should be near-immediate end up taking around 10 minutes.
Below is simplified pseudocode that captures the operation causing the timeout errors:
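    import logging
    import time

    # `client`, `thread`, and `assistant_run` are created earlier (not shown).
    while True:
        try:
            # Fetch the status of the assistant run
            logging.info("Fetching run status of assistant...")
            assistant_run = client.beta.threads.runs.retrieve(
                thread_id=thread.id,
                run_id=assistant_run.id,
            )
            status = assistant_run.status
            logging.info(f"Assistant run status: {status}")
            # Stop polling once the run reaches a terminal state
            if status in ("completed", "failed", "cancelled", "expired"):
                break
        except Exception:
            # Log the full traceback instead of discarding the exception
            logging.exception("An error occurred.")
        time.sleep(1)  # Pause between polls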
Note: The actual implementation involves more complex error handling and logging, but this snippet represents the core logic where delays and timeouts are observed.
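For anyone who wants to reproduce the polling pattern with an upper bound on total wait time, here is a minimal sketch. The helper name `poll_until_terminal`, its parameters, and the set of statuses treated as final are assumptions made for illustration; they are not part of our actual implementation or of the OpenAI library.

```python
import time
from typing import Callable

# Assumption: these are the statuses we treat as final for a run.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    deadline_seconds: float = 120.0,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns a terminal status or the deadline passes."""
    start = clock()
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next sleep would take us past the overall deadline.
        if clock() - start + delay > deadline_seconds:
            raise TimeoutError(f"Run still '{status}' after {deadline_seconds} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

With the real client, the callable could be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=assistant_run.id).status`. Note that the deadline only bounds the time between calls; a single call that hangs still has to be bounded by the HTTP client's own timeout.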
Additional Context
Our investigation suggests that the issue centers on the client.beta.threads.runs.retrieve() method, which fetches the status of a specific assistant run.
Its performance is inconsistent: sometimes it returns promptly, and at other times it takes an unusually long time to complete. Understanding the conditions under which the delays occur could be key to resolving the 408 Request Timeout errors.
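One way to narrow down when the slow calls happen is to time every call and log the slow ones at a higher level. The sketch below is only an illustration: `timed_call` and its `slow_after` threshold are hypothetical names, not something from the OpenAI library.

```python
import logging
import time
from typing import Any, Callable, Tuple


def timed_call(
    fn: Callable[..., Any], *args: Any, slow_after: float = 5.0, **kwargs: Any
) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Log the duration even when the call raises.
        elapsed = time.monotonic() - start
        level = logging.WARNING if elapsed > slow_after else logging.INFO
        logging.log(level, "%s took %.3f s", getattr(fn, "__name__", repr(fn)), elapsed)
    return result, elapsed
```

In the polling loop this would wrap the retrieve call, for example `assistant_run, elapsed = timed_call(client.beta.threads.runs.retrieve, thread_id=thread.id, run_id=assistant_run.id)`, so that slow calls can be correlated with time of day, cold starts, or other conditions.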
Here is an example of the resulting logs showing the delay. Note the timestamp difference between lines #1 and #2:
[2024-02-29T22:05:15.259Z] Fetching run status of assistant...
[2024-02-29T22:15:15.261Z] [channel] received <anonymized_id>: RpcLog
[2024-02-29T22:15:15.263Z] Retrying request to /threads/<anonymized_thread_id>/runs/<anonymized_run_id> in 0.758969 seconds
[2024-02-29T22:15:16.294Z] [channel] received <anonymized_id>: RpcLog
[2024-02-29T22:15:16.297Z] HTTP Request: GET https://api.openai.com/v1/threads/<anonymized_thread_id>/runs/<anonymized_run_id> "HTTP/1.1 200 OK"
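To put numbers on the gaps in a log like the one above, a small script along these lines could be used. It assumes every line starts with a bracketed UTC timestamp in the format shown; the function names are made up for this example.

```python
from datetime import datetime

# The first four log lines from above, copied verbatim.
LOG_LINES = [
    "[2024-02-29T22:05:15.259Z] Fetching run status of assistant...",
    "[2024-02-29T22:15:15.261Z] [channel] received <anonymized_id>: RpcLog",
    "[2024-02-29T22:15:15.263Z] Retrying request to /threads/<anonymized_thread_id>/runs/<anonymized_run_id> in 0.758969 seconds",
    "[2024-02-29T22:15:16.294Z] [channel] received <anonymized_id>: RpcLog",
]


def parse_timestamp(line: str) -> datetime:
    """Extract the leading [timestamp] from a log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps_in_seconds(lines):
    """Return the time difference between each pair of consecutive lines."""
    times = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


gaps = gaps_in_seconds(LOG_LINES)
for gap in gaps:
    print(f"{gap:.3f} s")
# The first gap is 600.002 s, i.e. almost exactly 10 minutes.
```

A gap this close to a round 600 seconds looks more like a client-side timeout firing than like a slow server response, which is worth checking against the HTTP client's configured timeout.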
Has anyone else seen this issue? Any insight into how I could find out what’s causing the delay, and maybe account for it in my Python app, would be appreciated.
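One thing we are considering in the meantime: as far as we can tell from the openai Python library's README, the client defaults to a 10-minute request timeout with 2 automatic retries, which would match the 10-minute gap followed by "Retrying request" in the log. The sketch below sets a much shorter timeout so that a stalled request fails fast and gets retried. The specific values and the `make_client` helper are assumptions for illustration, not recommendations.

```python
REQUEST_TIMEOUT_SECONDS = 20.0  # hypothetical value; tune to your workload
MAX_RETRIES = 3                 # hypothetical value

# Rough upper bound on time spent in one call, ignoring backoff sleeps between attempts.
WORST_CASE_SECONDS = REQUEST_TIMEOUT_SECONDS * (MAX_RETRIES + 1)


def make_client():
    """Build an OpenAI client that gives up on a stalled request quickly."""
    # Imported lazily so this module loads even where `openai` is not installed.
    from openai import OpenAI

    # `timeout` and `max_retries` are constructor options of the OpenAI client;
    # the API key is read from the OPENAI_API_KEY environment variable.
    return OpenAI(timeout=REQUEST_TIMEOUT_SECONDS, max_retries=MAX_RETRIES)
```

The same options can also be overridden for a single call, for example `client.with_options(timeout=10.0).beta.threads.runs.retrieve(...)`. This would not explain why the request stalls in the first place, but it should cap the worst case at roughly 80 seconds with these values instead of 10 minutes.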