Long response times for Python method: client.beta.threads.runs.retrieve()

Background

Hey everyone

We’ve been chasing 408 Request Timeout errors when our Python application, deployed on Azure Functions, uses the OpenAI Assistants API. These errors coincide with response delays of roughly 10 minutes for calls that should return almost immediately.

Below is a simplified pseudocode that captures the essence of the operation causing the timeout errors:

    import logging
    import time

    while True:
        try:
            # Fetch the status of the assistant run
            logging.info("Fetching run status of assistant...")
            assistant_run = client.beta.threads.runs.retrieve(
                thread_id=thread.id,
                run_id=assistant_run.id
            )
            status = assistant_run.status
            logging.info(f"Assistant run status: {status}")
            if status not in ("queued", "in_progress"):
                break  # run reached a terminal state
        except Exception as e:
            logging.error(f"An error occurred: {e}")
        time.sleep(1)  # pause between polls instead of hammering the API

Note: The actual implementation involves more complex error handling and logging, but this snippet represents the core logic where delays and timeouts are observed.
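One way to keep a single stalled HTTP request from blocking the whole loop for ten minutes is to run it under a hard deadline. This is only a stdlib sketch: `call_with_deadline` and `fetch_run_status` are hypothetical names, and `fetch_run_status` stands in for the real `client.beta.threads.runs.retrieve(...)` call.

```python
import concurrent.futures
import time

def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run a blocking callable in a worker thread; give up after timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread keeps running until fn returns (threads cannot be
        # killed), but the caller is no longer blocked on it.
        raise TimeoutError(f"call did not finish within {timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)

def fetch_run_status():
    # Hypothetical stand-in for client.beta.threads.runs.retrieve(...)
    time.sleep(0.05)
    return "in_progress"

print(call_with_deadline(fetch_run_status, timeout_s=5.0))  # in_progress
```

On a timeout the caller gets control back immediately and can log, retry, or skip that poll cycle, at the cost of leaving the stuck worker thread to finish on its own.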

Additional Context

Our investigation suggests that the issue primarily revolves around the client.beta.threads.runs.retrieve() method. This method is intended to fetch the status of a specific assistant run.

Interestingly, its performance is inconsistent: at times, it executes as expected, promptly returning results. However, on other occasions, it exhibits significant delays, taking an unusually long time to complete. This erratic behavior is at the core of our troubleshooting efforts, as understanding the conditions under which the delays occur could be key to resolving the 408 Request Timeout errors.
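Since the slowness is intermittent, it can help to time each call directly rather than infer latency from interleaved framework logs. A minimal stdlib sketch; `log_slow_calls` is a hypothetical helper, and `fetch_run_status` again stands in for the real retrieve call:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)

def log_slow_calls(threshold_s):
    """Decorator: warn whenever the wrapped call takes longer than threshold_s."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed > threshold_s:
                    logging.warning("%s took %.2fs (threshold %.2fs)",
                                    fn.__name__, elapsed, threshold_s)
        return wrapper
    return decorator

@log_slow_calls(threshold_s=2.0)
def fetch_run_status():
    # Hypothetical stand-in for client.beta.threads.runs.retrieve(...)
    time.sleep(0.01)
    return "in_progress"

print(fetch_run_status())  # in_progress
```

Wrapping the retrieve call this way produces a clean record of exactly which polls were slow and by how much, which makes it easier to correlate delays with time of day or service incidents.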

Example of the resulting logs showing the delay. Note the timestamp difference between lines #1 and #2:

[2024-02-29T22:05:15.259Z] Fetching run status of assistant...
[2024-02-29T22:15:15.261Z] [channel] received <anonymized_id>: RpcLog
[2024-02-29T22:15:15.263Z] Retrying request to /threads/<anonymized_thread_id>/runs/<anonymized_run_id> in 0.758969 seconds
[2024-02-29T22:15:16.294Z] [channel] received <anonymized_id>: RpcLog
[2024-02-29T22:15:16.297Z] HTTP Request: GET https://api.openai.com/v1/threads/<anonymized_thread_id>/runs/<anonymized_run_id> "HTTP/1.1 200 OK"
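For what it’s worth, the ten-minute gap in the log above matches what I believe is the default request timeout in openai-python v1 (600 seconds), and the “Retrying request” line is the SDK’s own automatic retry with backoff. If that’s the cause, you can tighten both when constructing the client. This is a configuration sketch, not a recommendation; the specific values are placeholders:

```python
import os
import httpx
from openai import OpenAI

# Assumed: openai-python v1 accepts `timeout` (an httpx.Timeout or float)
# and `max_retries` at construction time.
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    timeout=httpx.Timeout(20.0, connect=5.0),  # fail fast instead of waiting ~10 min
    max_retries=3,  # the SDK retries transient failures with backoff
)

# A per-call override is also possible:
# client.with_options(timeout=10.0).beta.threads.runs.retrieve(...)
```

With a shorter timeout the SDK’s retry kicks in after seconds rather than minutes, so a stalled connection costs one poll cycle instead of ten minutes.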

I’m looking to find out whether anyone else has seen this issue, or whether you have any insight into what’s causing the delay and how I might account for it in my Python app.

Here’s plausible AI-written code that is persistent (and a bit rude to the API) about its need for polling. I haven’t expended more effort on it after loading my Python specialist up with documentation and the methods I want it to use, and giving it more specifications that it again ignored.


To continuously send status check requests without waiting for prior requests to complete, and to handle responses as they arrive, we can use asyncio to create a non-blocking loop that fires off a status check at a regular interval (e.g., every second). Each request runs independently, and an asyncio.Queue collects the responses.

The code below defines two main coroutines: status_check, which sends a status request every second, and process_responses, which reads incoming statuses from the queue. Polling continues until a status other than queued or in_progress is received; at that point process_responses cancels the outstanding tasks, including further status checks, and the application can proceed to the next step.

import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

async def status_check(thread_id, run_id, queue):
    while True:
        try:
            # Fetch the status of the assistant run
            assistant_run = await client.beta.threads.runs.retrieve(
                thread_id=thread_id,
                run_id=run_id
            )
            status = assistant_run.status
            await queue.put(status)  # Put the status into the queue for processing
        except Exception as e:
            print(f"An error occurred while fetching the status: {e}")
        
        await asyncio.sleep(1)  # Wait for 1 second before the next status check

async def process_responses(queue, tasks):
    non_final_statuses = ['queued', 'in_progress']
    while True:
        status = await queue.get()
        print(f"Received status: {status}")
        if status not in non_final_statuses:
            # Cancel all outstanding tasks except this one (cancelling the
            # currently running task would raise CancelledError inside it)
            current = asyncio.current_task()
            for task in tasks:
                if task is not current:
                    task.cancel()
            print(f"Final status received: {status}. Proceeding to the next step.")
            break

async def main():
    thread_id = 'your_thread_id_here'
    run_id = 'your_run_id_here'
    queue = asyncio.Queue()

    # Create a list to keep track of tasks
    tasks = []

    # Start the status check loop
    status_task = asyncio.create_task(status_check(thread_id, run_id, queue))
    tasks.append(status_task)

    # Start the response processing task
    process_task = asyncio.create_task(process_responses(queue, tasks))
    tasks.append(process_task)

    # Wait for all tasks to complete
    await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())

This code will continuously send status check requests every second. The process_responses coroutine monitors the queue for statuses. If a status other than queued or in_progress is received, it cancels all tasks, effectively stopping the polling. This setup ensures that the function doesn’t wait for a previous status check to complete before sending the next one, and it can handle responses as they arrive asynchronously.

Thanks for the code @_j. Async functions have been on my list of things to tackle. However, in this case I don’t believe an async approach would help much. I believe the root cause was the recent service outages that OpenAI has since announced.