Stream Assistant Response

Hi. I’m wondering if it’s possible to stream messages created by Assistants? After perusing the developer docs, it seems like it’s not available yet, and that I’ll have to extract completions from Threads, but wanted to check with the community before doing so.


Listed as one of the limitations on the How Assistants work page

  • Support for streaming output (including Messages and Run Steps).

However, it looks like a short-term focus:

we are looking to address in the coming weeks and months

So hopefully not long to wait!

See more: Assistants - Limitations


I just noticed that streaming is not available on assistants. For now, I’ll stick to the conversation models for better UI experience. Hopefully streaming will be prioritized soon.


You could fake the streaming if you wanted the features of assistants… though you’d need to wait for the complete response, you could gradually display characters on the UI.
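For anyone wanting to try that, here's a rough sketch in Python of what "fake streaming" could look like once the complete response has arrived (all names here are made up for illustration; nothing comes from an SDK):

```python
import random
import time
from typing import Iterator

def fake_stream(full_text: str, min_chunk: int = 2, max_chunk: int = 8,
                delay: float = 0.0) -> Iterator[str]:
    """Yield an already-complete response in small, randomly sized
    chunks, mimicking token-by-token streaming in a UI."""
    i = 0
    while i < len(full_text):
        size = random.randint(min_chunk, max_chunk)
        yield full_text[i:i + size]
        i += size
        if delay:
            time.sleep(delay)  # pace the "stream" for the UI

# Example: render chunks as they "arrive"
for chunk in fake_stream("Hello from a fully generated response."):
    print(chunk, end="", flush=True)
```

Of course this doesn't reduce the time-to-first-character at all; it only makes the wait feel less abrupt once the response lands.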


I thought about that approach as well. As you mentioned, artificially streaming the response from the Thread would introduce latency that’s undesirable for my use case. I’d also have to work out the logic to stream chunks with varying character counts :sweat_smile:


There’s a lot that the Assistants API brings to the table that I’d love to get more in-production early adopter mileage on, but the latency is a real issue, and we don’t have enough idea of what the end result streaming API is going to look like to keep prioritizing it.

If the SDKs faked streaming…or (more likely) if we in the developer community had some hints at roughly what the interface was likely to look like, then we COULD build a ‘fake’ layer over the existing API that would let us write and test a lot more of the code than we have so far.

Currently our production version is not using the Assistants API, due to the latency of the non-streaming implementation. We DO have a non-streaming Assistants variety in production, implemented behind a feature flag, but it’s highly incomplete, and our team is now prioritizing other work. Due to the nature of the Assistants API’s Threads/Messages/Runs polling model, the team anticipates that when the streaming API is released, it’s likely to be significantly more involved than adding ‘stream=true’ and getting back a stream – and so we’ll have to rework much of what we’ve already done.
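For context, the polling model in question looks roughly like this. This is only a sketch: the status callable below is a stand-in for whatever actually retrieves the run (e.g. a `runs.retrieve` call in the real SDK), so it can be swapped in without network access:

```python
import time

# Terminal run states, per the general shape of the Runs lifecycle
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}

def wait_for_run(get_status, interval: float = 0.5,
                 timeout: float = 60.0) -> str:
    """Poll a run's status until it reaches a terminal state.
    `get_status` is any callable returning the current status string."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)  # nothing new is learned between polls
    raise TimeoutError("run did not finish in time")

# Stand-in for a real retrieve call: a run that sits "queued",
# then "in_progress", then completes.
statuses = iter(["queued", "in_progress", "in_progress", "completed"])
print(wait_for_run(lambda: next(statuses), interval=0.01))  # completed
```

The pain point is visible in the loop itself: no partial output exists between polls, so the user sees nothing until the run reaches a terminal state.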

Our early implementation is just kind of sitting there right now, and unlike a fine wine, when the streaming API DOES come out, I am worried our early implementation is going to turn to vinegar.

On the other hand, with a roughly compliant simulated layer, when the “official” Assistants beta streaming implementation comes out, theoretically we delete a few lines of code, fix a few bugs, the latency magically goes away, we flip the feature flag, and profit.
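A minimal sketch of that simulated layer, assuming callers only ever consume an iterator (every name here is hypothetical): today the wrapper blocks and then fake-streams; when real streaming ships, only the wrapper’s body changes, and the UI code is untouched.

```python
from typing import Callable, Iterator

def stream_reply(prompt: str,
                 get_full_reply: Callable[[str], str],
                 chunk_size: int = 6) -> Iterator[str]:
    """Stream-shaped wrapper over a blocking call. For now it waits
    for the whole reply, then fake-streams it; when a real streaming
    API exists, only this function's body needs to change."""
    full = get_full_reply(prompt)  # blocking call (the latency lives here)
    for i in range(0, len(full), chunk_size):
        yield full[i:i + chunk_size]

# UI code is written against the iterator and never changes:
def render(chunks: Iterator[str]) -> str:
    return "".join(chunks)

print(render(stream_reply("hi", lambda p: f"echo: {p}")))  # echo: hi
```

The design bet is simply that the eventual official interface will be iterator/event-shaped, which seems plausible but is not guaranteed.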

I know that’s wishful thinking, and while I’m at it, it’d be really awesome if that was all just built into the OpenAI Node library FOR us. That said, if we had some confidence in what the streaming API might look like when it comes out, we’d probably reprioritize some work and build that ourselves.

TL;DR: OpenAI doesn’t have to actually publish Assistants API streaming to help us keep moving – they could fake it, or give us some guidance that would let us fake it ourselves for a while.


Same thoughts as @TimJohns.

But since we’re starting our implementation now, we can see a huge advantage in going with the Assistants API over chat completions.

We’ve decided to start with the Assistants API over a streaming channel, so that when their streaming is ready we can just “adapt” our logic and we’ll already have the rails in place between our app and our backend.

BUT, that being said, latency is a real issue for us! Especially because we need to convert text to speech afterwards. So, after some tests, if we see that we’re still too far behind because of the latency, we might divert back to chat completions until the Assistants API supports streaming…
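For the text-to-speech case, one pattern worth sketching, assuming a streaming source eventually exists: buffer incoming chunks and flush at sentence boundaries, so synthesis can start on the first sentence while the rest is still generating (the sentence splitter below is deliberately naive):

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary: end punctuation followed by whitespace
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed text chunks and yield complete sentences,
    so each one can be handed to text-to-speech immediately."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Everything except the last part is a complete sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush whatever remains at end of stream

chunks = ["Hello the", "re! How a", "re you? Fin", "e."]
print(list(sentences_from_stream(chunks)))
```

This way the perceived latency becomes the time to the first sentence rather than to the full reply, which matters a lot when audio is the end product.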

Hopefully OpenAI will roll this out soon.


Hopefully OpenAI will roll this out soon.

@contactmewk do you know if there are any updates on implementing streaming for the Assistant API?


I had to actually move off the Assistants API and onto llama-index to finally get streaming for my GPT. It was totally worth it, and I’ll continue to do that until OpenAI gets around to building it into their SDKs.

Hi, with the llama-index solution, can new data be added dynamically if ChatGPT learns something new about the user, for example?

I would be interested to understand how you have replaced the Assistants API. Have you gone to chat completions? I’m not sure how llama-index has helped you; perhaps your use case is very different from mine.


It’s pretty bad at the moment, to be honest. Polling their API and seeing the run sit in the queue for the first 20 seconds is just madness.

Can’t understand how this went to PROD.

I’m now considering going back to chat completions. The problem is having to manage state, plus the quality is VERY inferior (at least in my case and research).


A post was merged into an existing topic: Streaming is now available in the Assistants API!