Hi everyone,
I’m working on a FastAPI server that calls an OpenAI assistant from asynchronous endpoints. My current approach is a function along these lines (let’s call it get_response):
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def get_response(sample_prompt: str, output_format: str, max_attempts: int = 3):
    attempts = 1
    while attempts <= max_attempts:
        thread = await client.beta.threads.create()
        prompt = sample_prompt + output_format
        message = await client.beta.threads.messages.create(
            thread_id=thread.id,
            role="user",
            content=prompt,
        )
        run = await client.beta.threads.runs.create(
            thread_id=thread.id,
            assistant_id=os.getenv("OPENAI_ASSISTANT_ID"),
        )
        # Poll until the run reaches a terminal state
        while run.status not in ("completed", "failed", "cancelled", "expired"):
            await asyncio.sleep(2)
            run = await client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
        if run.status != "completed":
            # Retry with a fresh thread instead of raising immediately
            attempts += 1
            continue
        # Retrieve messages (token usage is available on the completed run object)
        message_response = await client.beta.threads.messages.list(thread_id=thread.id)
        messages = message_response.data
        # Do something with `messages`, then return
        return messages
    raise Exception("Assistant run failed after max_attempts attempts")
I then call get_response() from several threads simultaneously, potentially with up to 5 threads running at once.
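For comparison, here is a rough pure-async sketch of what I think the no-threads version would look like. get_response and the limit of 5 come from above; get_response_limited, handle_batch, and the semaphore are just illustrative names, not something I have in production:

import asyncio

# Hypothetical helper: cap concurrent assistant calls at 5 without any threads.
semaphore = asyncio.Semaphore(5)

async def get_response_limited(sample_prompt: str, output_format: str):
    async with semaphore:
        return await get_response(sample_prompt, output_format)

async def handle_batch(prompts: list[str], output_format: str):
    # gather() runs all the coroutines on the single event loop;
    # the semaphore keeps at most 5 assistant runs in flight at a time.
    return await asyncio.gather(
        *(get_response_limited(p, output_format) for p in prompts)
    )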
My questions are:
1. Is this a good approach for scaling to handle hundreds of users? I’m mixing asynchronous calls (await client.beta.threads…) with multiple threads. Is it best practice to use threads here, or should I rely solely on the async event loop and avoid explicit threading?
2. Would increasing the number of event loop tasks (e.g., multiple asyncio tasks) or relying on the server’s concurrency model (such as multiple Uvicorn/Gunicorn workers) be a better approach? (A sketch of what I mean is right after this list.)
3. What patterns or architectures do people recommend for calling OpenAI assistants (or similar APIs) at scale from an async web framework like FastAPI?
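On question 2, this is my rough understanding of what relying on the server’s concurrency model would look like, reusing get_response_limited from the sketch above. The endpoint awaits the call directly on the event loop, and process-level scaling comes from Uvicorn workers. AskRequest and the /ask route are made-up names, and I’m assuming the newest message in the list is the assistant’s text reply:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    prompt: str
    output_format: str = ""

@app.post("/ask")
async def ask(req: AskRequest):
    # No explicit threads: the coroutine awaits the OpenAI call, so the
    # event loop is free to serve other requests while this one waits on I/O.
    messages = await get_response_limited(req.prompt, req.output_format)
    # messages.list returns newest first; the first text block is the reply
    return {"reply": messages[0].content[0].text.value}

# Process-level scaling then comes from the server, e.g.:
#   uvicorn app:app --workers 4
# (each worker is a separate process with its own event loop)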
I’m a bit new to this and just want to ensure I’m setting things up in a way that will scale well and not cause hidden issues (like blocking I/O or unnecessary overhead).
Any advice, best practices, or insights would be greatly appreciated! Thank you.