Long response times for Python method: client.beta.threads.runs.retrieve()

Background

Hey everyone

We’ve been chasing 408 Request Timeout errors when our Python application, deployed on Azure Functions, uses the OpenAI Assistants API. These errors coincide with response delays of roughly 10 minutes for calls that should return almost immediately.

Below is a simplified pseudocode that captures the essence of the operation causing the timeout errors:

    import logging
    import time

    while True:
        try:
            # Fetch the status of the assistant run
            logging.info("Fetching run status of assistant...")
            assistant_run = client.beta.threads.runs.retrieve(
                thread_id=thread.id,
                run_id=assistant_run.id
            )
            status = assistant_run.status
            logging.info(f"Assistant run status: {status}")
            if status not in ("queued", "in_progress"):
                break  # run reached a terminal state
        except Exception as e:
            logging.error(f"An error occurred: {e}")
        time.sleep(1)  # pause between polls instead of hammering the API

Note: The actual implementation involves more complex error handling and logging, but this snippet represents the core logic where delays and timeouts are observed.
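One way to keep a single stalled HTTP request from blocking the whole loop for ten minutes is to run it under a hard deadline. This is only a stdlib sketch: `call_with_deadline` and `fetch_run_status` are hypothetical names, and `fetch_run_status` stands in for the real `client.beta.threads.runs.retrieve(...)` call.

```python
import concurrent.futures
import time

def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run a blocking callable in a worker thread; give up after timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread keeps running until fn returns (threads cannot be
        # killed), but the caller is no longer blocked on it.
        raise TimeoutError(f"call did not finish within {timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)

def fetch_run_status():
    # Hypothetical stand-in for client.beta.threads.runs.retrieve(...)
    time.sleep(0.05)
    return "in_progress"

print(call_with_deadline(fetch_run_status, timeout_s=5.0))  # in_progress
```

On a timeout the caller gets control back immediately and can log, retry, or skip that poll cycle, at the cost of leaving the stuck worker thread to finish on its own.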

Additional Context

Our investigation suggests that the issue primarily revolves around the client.beta.threads.runs.retrieve() method. This method is intended to fetch the status of a specific assistant run.

Interestingly, its performance is inconsistent: at times, it executes as expected, promptly returning results. However, on other occasions, it exhibits significant delays, taking an unusually long time to complete. This erratic behavior is at the core of our troubleshooting efforts, as understanding the conditions under which the delays occur could be key to resolving the 408 Request Timeout errors.
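Since the slowness is intermittent, it can help to time each call directly rather than infer latency from interleaved framework logs. A minimal stdlib sketch; `log_slow_calls` is a hypothetical helper, and `fetch_run_status` again stands in for the real retrieve call:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)

def log_slow_calls(threshold_s):
    """Decorator: warn whenever the wrapped call takes longer than threshold_s."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed > threshold_s:
                    logging.warning("%s took %.2fs (threshold %.2fs)",
                                    fn.__name__, elapsed, threshold_s)
        return wrapper
    return decorator

@log_slow_calls(threshold_s=2.0)
def fetch_run_status():
    # Hypothetical stand-in for client.beta.threads.runs.retrieve(...)
    time.sleep(0.01)
    return "in_progress"

print(fetch_run_status())  # in_progress
```

Wrapping the retrieve call this way produces a clean record of exactly which polls were slow and by how much, which makes it easier to correlate delays with time of day or service incidents.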

Example of the resulting logs showing the delay. Note the timestamp difference between lines #1 and #2:

[2024-02-29T22:05:15.259Z] Fetching run status of assistant...
[2024-02-29T22:15:15.261Z] [channel] received <anonymized_id>: RpcLog
[2024-02-29T22:15:15.263Z] Retrying request to /threads/<anonymized_thread_id>/runs/<anonymized_run_id> in 0.758969 seconds
[2024-02-29T22:15:16.294Z] [channel] received <anonymized_id>: RpcLog
[2024-02-29T22:15:16.297Z] HTTP Request: GET https://api.openai.com/v1/threads/<anonymized_thread_id>/runs/<anonymized_run_id> "HTTP/1.1 200 OK"
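For what it’s worth, the ten-minute gap in the log above matches what I believe is the default request timeout in openai-python v1 (600 seconds), and the “Retrying request” line is the SDK’s own automatic retry with backoff. If that’s the cause, you can tighten both when constructing the client. This is a configuration sketch, not a recommendation; the specific values are placeholders:

```python
import os
import httpx
from openai import OpenAI

# Assumed: openai-python v1 accepts `timeout` (an httpx.Timeout or float)
# and `max_retries` at construction time.
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    timeout=httpx.Timeout(20.0, connect=5.0),  # fail fast instead of waiting ~10 min
    max_retries=3,  # the SDK retries transient failures with backoff
)

# A per-call override is also possible:
# client.with_options(timeout=10.0).beta.threads.runs.retrieve(...)
```

With a shorter timeout the SDK’s retry kicks in after seconds rather than minutes, so a stalled connection costs one poll cycle instead of ten minutes.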

I’m looking to find out whether anyone else has seen this issue, or whether you have any insight into what’s causing the delay and how I might account for it in my Python app.

Here’s plausible AI-written code that is persistent (and a bit rude to the API) about its need for polling. I haven’t expended more effort on it after loading my Python specialist up with documentation and the methods I want it to use, and giving it more specifications that it again ignored.


To continuously send status check requests without waiting for prior requests to complete, and to handle responses as they arrive, we can use asyncio to create a non-blocking loop that fires off a status check at a regular interval (e.g., every second). Each request runs independently, and an asyncio.Queue collects the responses.

The code below defines two main coroutines: status_check, which sends a status request every second, and process_responses, which reads incoming statuses from the queue. Polling continues until a status other than queued or in_progress is received; at that point process_responses cancels the outstanding tasks, including further status checks, and the application can proceed to the next step.

import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

async def status_check(thread_id, run_id, queue):
    while True:
        try:
            # Fetch the status of the assistant run
            assistant_run = await client.beta.threads.runs.retrieve(
                thread_id=thread_id,
                run_id=run_id
            )
            status = assistant_run.status
            await queue.put(status)  # Put the status into the queue for processing
        except Exception as e:
            print(f"An error occurred while fetching the status: {e}")
        
        await asyncio.sleep(1)  # Wait for 1 second before the next status check

async def process_responses(queue, tasks):
    non_final_statuses = ['queued', 'in_progress']
    while True:
        status = await queue.get()
        print(f"Received status: {status}")
        if status not in non_final_statuses:
            # Cancel all outstanding tasks except this one (cancelling the
            # currently running task would raise CancelledError inside it)
            current = asyncio.current_task()
            for task in tasks:
                if task is not current:
                    task.cancel()
            print(f"Final status received: {status}. Proceeding to the next step.")
            break

async def main():
    thread_id = 'your_thread_id_here'
    run_id = 'your_run_id_here'
    queue = asyncio.Queue()

    # Create a list to keep track of tasks
    tasks = []

    # Start the status check loop
    status_task = asyncio.create_task(status_check(thread_id, run_id, queue))
    tasks.append(status_task)

    # Start the response processing task
    process_task = asyncio.create_task(process_responses(queue, tasks))
    tasks.append(process_task)

    # Wait for all tasks to complete
    await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())

This code will continuously send status check requests every second. The process_responses coroutine monitors the queue for statuses. If a status other than queued or in_progress is received, it cancels all tasks, effectively stopping the polling. This setup ensures that the function doesn’t wait for a previous status check to complete before sending the next one, and it can handle responses as they arrive asynchronously.

Thanks for the code @_j. Async functions have been on my list of things to tackle. However, in this case I don’t believe an async approach would help much. I believe the root cause was the recent service outages that OpenAI has since announced.