Streaming is now available in the Assistants API!

I am experiencing pretty much the same confusion. I think requires_action should be the event where you trigger the functions, since that is what it is used for in non-streaming mode. Currently I just can’t figure out how to capture requires_action… how were you able to? Like you said, the names for Node.js are not aligned.
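
For what it’s worth, here is a minimal sketch of the "trigger the functions" step, assuming the run carries its pending calls under required_action.submit_tool_outputs.tool_calls the way it does in the non-streaming flow. The get_weather function and the TOOLS registry are hypothetical, and fake_run is a stand-in built with SimpleNamespace, not a real API response.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical tool, used only for illustration."""
    return f"Sunny in {city}"


# Hypothetical registry mapping function names to Python callables.
TOOLS = {"get_weather": get_weather}


def build_tool_outputs(run, tools=TOOLS):
    """Call every requested function and collect the outputs to submit."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        # Arguments arrive as a JSON string, so decode them first.
        args = json.loads(call.function.arguments or "{}")
        result = tools[call.function.name](**args)
        outputs.append({"tool_call_id": call.id, "output": str(result)})
    return outputs


# Stand-in for a run whose status is "requires_action".
fake_run = SimpleNamespace(
    status="requires_action",
    required_action=SimpleNamespace(
        submit_tool_outputs=SimpleNamespace(
            tool_calls=[
                SimpleNamespace(
                    id="call_1",
                    function=SimpleNamespace(
                        name="get_weather",
                        arguments='{"city": "Seoul"}',
                    ),
                )
            ]
        )
    ),
)

tool_outputs = build_tool_outputs(fake_run)
```

The resulting list is what would be passed as tool_outputs when submitting the tool outputs for the run.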

Oh man, this is awesome! I can’t wait to add support for it!

I cannot for the life of me figure out how to resolve my import issues in the Python library. I just keep getting cannot import name ‘AssistantEventHandler’ from ‘openai’, or no openai.beta, and stuff like that. I am using Python 3.9, if that is relevant. Hopefully this is an OK thread to ask in. All of the examples from the documentation produce errors, and I have made sure I have updated the library. Any help would be so, so awesome. Thanks!

Did you upgrade the openai library to version 1.14.0 or later?
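
A quick way to check is to ask the interpreter that actually runs your script, since an upgrade can land in a different environment. This is a minimal sketch, assuming 1.14.0 is the minimum as stated above; the helper names are made up for illustration.

```python
import re
from importlib import metadata


def version_tuple(version: str):
    """Turn '1.14.0' into (1, 14, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.14' so comparisons stay consistent.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.14.0") -> bool:
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_openai() -> str:
    """Report on the openai package visible to this interpreter."""
    try:
        installed = metadata.version("openai")
    except metadata.PackageNotFoundError:
        return "openai is not installed in this interpreter"
    if meets_minimum(installed):
        return f"openai {installed} is new enough"
    return f"openai {installed} is too old; upgrade to 1.14.0 or later"


print(check_openai())
```

If this reports an old version even after upgrading, the upgrade most likely went into a different Python environment than the one running the script.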

Works great! Already running this live, and it’s just as fast as the ChatCompletions route. Now we just need “temperature” and I’m happy. Without temperature settings, I notice that the replies I get are just too creative for my use cases. I can’t prompt it out, so temperature is necessary.

@kachari.bikram42 Have you noticed any improvement in the total response time using streaming? I cannot see myself using it in any apps (even internally for evaluation) given the current performance.
I see the same performance issue with the Azure AI version of the assistant.

This is how I am handling it.

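import json

# Assumptions filling gaps in the original post: the assistant_id value, the two
# event names, and the attribute paths read from event.data. This fragment sits
# inside a generator function; client, thread_id, query and requires_action come
# from the poster's own code (not shown).
with client.beta.threads.runs.create_and_stream(
        thread_id=thread_id,
        assistant_id=assistant_id,
    ) as stream:
        for event in stream:
            if event.event == 'thread.run.step.created':
                if event.data.step_details.type == 'tool_calls':
                    print('\nTool calls detected..')
                    final_run = stream.get_final_run()
                    yield from requires_action(final_run, query)
                    print('\nMessage creation detected...')
                    for text in stream.text_deltas:
                        yield f"data: {json.dumps({'text': text})}\n\n"
            elif event.event == 'thread.message.completed':
                yield f"data: {json.dumps({'text': event.data.content[0].text.value})}\n\n"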
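
The same idea can be tried without calling the API at all. Below is a rough sketch, not the code above, that dispatches on event names and frames each piece as a Server-Sent Events message. It assumes the event names thread.message.delta, thread.run.requires_action and thread.run.completed, and the fake events are SimpleNamespace stand-ins that only mimic the attributes being read.

```python
import json
from types import SimpleNamespace


def sse(payload: dict) -> str:
    """Frame one payload as a Server-Sent Events message."""
    return f"data: {json.dumps(payload)}\n\n"


def to_sse(events):
    """Translate a stream of run events into SSE frames."""
    for event in events:
        if event.event == "thread.message.delta":
            # A delta can carry several content blocks; forward the text ones.
            for block in event.data.delta.content or []:
                if block.type == "text":
                    yield sse({"text": block.text.value})
        elif event.event == "thread.run.requires_action":
            # This is the point where tool calls would be executed.
            yield sse({"status": "requires_action"})
        elif event.event == "thread.run.completed":
            yield sse({"status": "done"})


def fake_delta(text: str):
    """Build a stand-in message delta event (not a real SDK object)."""
    block = SimpleNamespace(type="text", text=SimpleNamespace(value=text))
    data = SimpleNamespace(delta=SimpleNamespace(content=[block]))
    return SimpleNamespace(event="thread.message.delta", data=data)


fake_events = [
    fake_delta("Hel"),
    fake_delta("lo"),
    SimpleNamespace(event="thread.run.completed", data=None),
]

frames = list(to_sse(fake_events))
```

Swapping fake_events for the real stream object should be the only change needed, provided the assumed event names and attribute paths hold for your library version.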

Any plans to add support for this to Azure OpenAI? Thanks!

I haven’t observed any significant improvement in total response time. With streaming, it’s just that the user no longer has to wait for the entire response before anything is displayed in the UI. It helped me improve the user experience.
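
To put numbers on that difference, one option is to time the first chunk separately from the whole reply. This is only a sketch: time_stream and slow_chunks are hypothetical names, and slow_chunks stands in for a real stream of text deltas.

```python
import time


def time_stream(chunks, clock=time.perf_counter):
    """Return (seconds until first chunk, seconds until stream end)."""
    start = clock()
    first = None
    for _ in chunks:
        if first is None:
            # Perceived latency: when the user first sees any text.
            first = clock() - start
    total = clock() - start
    return first, total


def slow_chunks():
    """Stand-in for a stream of text deltas arriving over time."""
    for word in ["streamed", "reply"]:
        time.sleep(0.01)
        yield word


first, total = time_stream(slow_chunks())
print(f"first chunk after {first:.3f}s, full reply after {total:.3f}s")
```

Streaming improves the first number while leaving the second roughly unchanged, which matches what I described.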

I just wrote an example on Medium here

Hi! I’m an undergraduate student in South Korea. Thanks for sharing your code.
I have a few questions about it.

  1. What do the requires_action function and the query parameter do?
    I’ve tried your code, but the tool_calls output is not yielded.

  2. How can I get code interpreter outputs?

@kachari.bikram42 Thanks for the feedback. I will give the streaming a try when I get a chance.

Tutorial on how to implement the response streaming functionality in Python and Node.js

I built a terminal user interface to be able to chat with a customer support chatbot in the past (see the YouTube tutorial). Today, I created a new YouTube tutorial and added the response streaming functionality.

There is an example for both Python and Node.js. See my GitHub repository with full code for the tutorial.


Hey @waseem_gul - any ideas how one can point the streaming to a client side chat interface?

We just added support for temperature! Hope it works well for your use-case.


Woah nice. I was like, I didn’t see this on the changelog:

But then I looked at your comment timestamp haha

Hi @rokbenko, thanks for sharing this code. Have you managed to get this working with a Next.js (or other) frontend UI? Currently my chat interface (Next.js) only receives the response via WebSockets once the answer has been completed in the backend (Python/FastAPI). I am testing both Next.js and Dash as frontends, and I am possibly missing something really simple to get the real-time stream to my frontend… Thanks

Hey hq1, I ran into your comment when I was trying to find an answer myself a few days ago. The things I had to do before it worked properly for me were:

  1. Make everything async.
  2. Correctly import AsyncOpenAI and AsyncAssistantEventHandler rather than their synchronous alternatives.
  3. Switch from Flask to FastAPI to use WebSockets instead of endpoints, and send each on_message_delta through the socket to be displayed.

Hopefully you just missed one of these steps and can fix it real quick. I’m using Python for my backend and just pure JS in my frontend.
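
The forwarding part of step 3 can be sketched with nothing but asyncio. This is not my actual code: FakeSocket and fake_deltas are stand-ins, and the only assumption about the real socket is that it has an async send_text method, as FastAPI’s WebSocket does.

```python
import asyncio


class FakeSocket:
    """Stand-in for a WebSocket; only send_text is modelled."""

    def __init__(self):
        self.sent = []

    async def send_text(self, data: str) -> None:
        self.sent.append(data)


async def fake_deltas():
    """Stand-in for the text deltas an async run stream would produce."""
    for piece in ["Hi", ", ", "there"]:
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield piece


async def forward(deltas, socket) -> str:
    """Send each delta to the client as it arrives; return the full text."""
    parts = []
    async for piece in deltas:
        await socket.send_text(piece)
        parts.append(piece)
    return "".join(parts)


socket = FakeSocket()
full_text = asyncio.run(forward(fake_deltas(), socket))
```

The point is that every piece is sent as soon as it is produced, rather than after the whole answer has been assembled in the backend.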

Flask with eventlet.sleep(0) works to stream.

In my Python Flask application, would I have to refactor it to incorporate asynchronous functionality in order to utilize the stream, or can I integrate it directly into my existing setup?