Handling Concurrent Streaming Responses with OpenAI Assistant API and FastAPI

Hello everyone,

I’m developing an application using the OpenAI Assistant API with streaming functionality, integrated within a FastAPI server. The goal is to handle multiple users interacting with the assistant simultaneously, each receiving their own streaming responses.

The Problem:

When multiple users interact with the assistant at the same time, I’m encountering an issue where:

  • If a second user sends a query before the first user’s response has finished streaming, the first user’s response gets interrupted.
  • Both users then have to wait: earlier responses only resume once the most recent response has started streaming.
  • This leads to responses being delayed and interfering with each other, which negatively impacts the user experience.

Has anyone experienced this before, or can you advise me on what to be mindful of to avoid it?
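For what it's worth, symptoms like this often come from the streaming call blocking the event loop, so that one request's stream can't make progress while another is being consumed. A minimal sketch (pure asyncio, with a simulated token stream standing in for the OpenAI streaming call, so the names here are placeholders, not the real SDK) showing how two independent async generators stream concurrently without blocking each other:

```python
import asyncio

async def fake_assistant_stream(user: str, n_chunks: int, delay: float):
    # Placeholder for the assistant's streaming response: in a real app this
    # would be an async iteration over the OpenAI streaming events.
    for i in range(n_chunks):
        await asyncio.sleep(delay)  # awaiting here yields control to the event loop
        yield f"{user}-chunk{i}"

async def handle_user(user: str, log: list):
    # In FastAPI, a generator body like this would feed a StreamingResponse,
    # one per request.
    async for chunk in fake_assistant_stream(user, 3, 0.01):
        log.append(chunk)

async def main():
    log = []
    # Two "users" stream at the same time; because every step awaits,
    # neither stream blocks the other.
    await asyncio.gather(handle_user("u1", log), handle_user("u2", log))
    return log

log = asyncio.run(main())
print(log)
```

If instead the handler iterates a synchronous stream (or shares one client/stream object across requests), the loop is blocked and responses serialize exactly as described above; per-request state plus fully async iteration is the usual remedy.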