Streaming is now available in the Assistants API!

In case you missed it, OpenAI staff dropped by today with a pretty cool announcement…

Check out the Assistants API streaming docs

13 Likes

This came out two hours ago; evaluating now…

3 Likes

This looks promising; evaluating now…

2 Likes

Any thoughts on how to use textCreated and textDelta from the official example in my client-side bubble.tsx?

const run = openai.beta.threads.runs.createAndStream(thread.id, {
    assistant_id: assistant.id
  })
    .on('textCreated', (text) => process.stdout.write('\nassistant > '))
    .on('textDelta', (textDelta, snapshot) => process.stdout.write(textDelta.value))

I’m unable to handle it in my Next.js web app. The error is:

⨯ unhandledRejection: OpenAIError: Cannot read properties of undefined (reading 'write')
    at eval (webpack-internal:///(rsc)/./node_modules/openai/lib/AbstractAssistantStreamRunner.mjs:46:37)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: TypeError: Cannot read properties of undefined (reading 'write')
      at eval (webpack-internal:///(rsc)/./app/api/chat/route.ts:36:53)
      at eval (webpack-internal:///(rsc)/./node_modules/openai/lib/AbstractAssistantStreamRunner.mjs:169:47)
      at Array.forEach (<anonymous>)
      at AssistantStream._emit (webpack-internal:///(rsc)/./node_modules/openai/lib/AbstractAssistantStreamRunner.mjs:169:23)
      at AssistantStream._AssistantStream_handleMessage (webpack-internal:///(rsc)/./node_modules/openai/lib/AssistantStream.mjs:333:18)
      at AssistantStream._AssistantStream_addEvent (webpack-internal:///(rsc)/./node_modules/openai/lib/AssistantStream.mjs:314:107)
      at AssistantStream._createAssistantStream (webpack-internal:///(rsc)/./node_modules/openai/lib/AssistantStream.mjs:239:102)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async AssistantStream._runAssistantStream (webpack-internal:///(rsc)/./node_modules/openai/lib/AbstractAssistantStreamRunner.mjs:202:16)
}

The existing code I am trying to refactor is:

    import OpenAI from "openai";
    import { OpenAIStream, StreamingTextResponse } from "ai";
    
    ....

 
        let run = await openai.beta.threads.runs.create(thread.id, {
          assistant_id: assistant.id,
        });
    
        while (run.status === "in_progress" || run.status === "queued") {
          await new Promise((resolve) => setTimeout(resolve, 500));
          run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
        }
    
        const messages1 = await openai.beta.threads.messages.list(thread.id);
    
        console.log(messages1.data);
    
        let responseContent = "";
        if (messages1.data[0].content[0]?.type === "text") {
          responseContent = messages1.data[0].content[0]?.text.value;
        }
    
        console.log(responseContent);
    
        return new NextResponse(
          JSON.stringify({
            content: responseContent,
            data: {
              threadId: thread.id,
              newThread: newThread,
            },
          })
        );
      } catch (e) {
        throw e;
      }
    }

Here is the useEffect in bubble.tsx where I think I should handle it:

useEffect(() => {
    if (content.role === "assistant") {
      if (content.processing) {
        // Reset displayed content when processing starts
        setDisplayedContent("");
        return;
      }

      if (content?.content && !isLoading) {
        // Split the message into characters and display one by one
        let index = 0;
        const interval = setInterval(() => {
          setDisplayedContent((prev) => prev + content.content.charAt(index));
          index++;
          if (index === content.content.length) {
            clearInterval(interval);
          }
        }, 5);

        return () => clearInterval(interval);
      }
    } else {
      setDisplayedContent(content.content);
    }
  }, [content, isLoading]);
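For reference, the stack trace points at process.stdout being undefined where the handlers run (the bundled RSC context has no usable Node stdout). A minimal sketch of routing the text events to a caller-supplied sink instead, such as a ReadableStream controller returned from the route handler — `wireTextEvents` and `sink` are illustrative names, not SDK API:

```javascript
// Redirect the official example's stdout writes to a caller-supplied sink.
// `stream` is anything with an EventEmitter-style .on(), e.g. the object
// returned by openai.beta.threads.runs.createAndStream(...).
function wireTextEvents(stream, sink) {
  stream
    .on("textCreated", () => sink("\nassistant > "))
    .on("textDelta", (textDelta) => sink(textDelta.value));
  return stream;
}

// Sketch of use inside a Next.js route handler (untested, names illustrative):
// const body = new ReadableStream({
//   start(controller) {
//     const enc = new TextEncoder();
//     wireTextEvents(run, (s) => controller.enqueue(enc.encode(s)));
//     run.on("end", () => controller.close());
//   },
// });
// return new Response(body);
```

The idea is that the route handler owns the transport (stdout, ReadableStream, socket, whatever) and the event wiring stays transport-agnostic.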
1 Like

Great. I’d like to handle thread.run.requires_action in streaming mode. .on('toolCallDone', ...) could work, but it seems too ambiguous.

1 Like

My app is in Python using Flask. To use the stream, will I need to refactor it to be asynchronous, or can I use it in my application as is?

1 Like

Both the Python openai library and curl worked for creating a thread run with the new streaming feature…
But it’s quite hard to adapt previous chat-completion streaming code to the new interface.

For safety, we don’t stream the API to the front end directly: an API gateway streams from the OpenAI API and bridges the stream to the front-end framework, to prevent leaking the API key. If the Python lib could still use a “for chunk in stream_resp”-style implementation, it would be a little easier.
Alternatively, OpenAI could provide another streaming mechanism, such as a one-time token for the streaming session, so the front end could stream directly from OpenAI without the risk of passing the API key, saving some network bandwidth, with a callback or a way for the backend endpoint to retrieve the completed run’s status.

But the chat completion endpoint still seems more flexible and cost-efficient for now; I’ll wait for this to be more production-ready.

1 Like

It’s out, finally. I posted a video on my YouTube about it. There are still some challenges.
@CustomGPT.AIAcademy

2 Likes

:+1: LGTM. All tests pass. Currently rolling out in Prod. All canaries currently in the green.

It’s Miller Time™.*

  • More accurately, it’s time for me to head over to the Side Hustle Taproom in Kirkland, for a celebratory Bellevue Brewing Company Tangerine Pale Ale before I dive back into the stuff I was working on yesterday when this update rolled out. If by some serendipitous chance of fate anyone local actually sees this and runs into me out there tonight, say hello and I’ll buy up to the first five of you a beverage of your choice. Look for the guy in the black Carhartt hoodie with the MacBook Pro with a few road-trip themed stickers on the lid.
3 Likes

Using the Python SDK, streaming text works OK, and tool-call (function) events fire and the functions can be called. However, I do not see how to get an answer from the assistant after the tool outputs are submitted. I would expect streaming to continue after the tool outputs are submitted, with the text being streamed.

3 Likes

This API looks like JS from the 2000s

OpenAI Assistants API will not pass the test of time

1 Like

I have written a blog on this - OpenAI Assistant’s Streaming Support | by Bikram kachari | Mar, 2024 | Medium

2 Likes

The OpenAI examples basically did not cover how to submit tool outputs and stream the answer after the function call. Eventually I used the same AssistantEventHandler for both “create_and_stream” and “submit_tool_outputs_stream”, and that works.

4 Likes

Agreed. I struggle to understand how to handle certain events in Node.js. In this part of the docs, they say we need to handle textCreated/toolCallCreated etc. At the same time, the complete list of events here doesn’t correspond to the (Node.js) code by naming convention. I can’t work out how to translate thread.message.created into textCreated myself, or how to find the appropriate Node.js events that I need to handle.
It is also impossible to tell from the docs how to handle function calls from the assistant when using streaming. I logged all the events to understand what is going on under the hood, but I still don’t get which event I should handle to get the function name with arguments. Is it thread.run.requires_action? This is the only event that contains the full info for running my function, but the event name sounds weird for this purpose.
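For what it’s worth, a minimal sketch of pulling the function name and arguments out of a raw thread.run.requires_action event — the event shape follows the Assistants API reference, but `extractToolCalls` is an illustrative helper, not an SDK function:

```javascript
// Given a raw streaming event, return the pending function calls, or null if
// the event is not a requires_action / submit_tool_outputs event.
function extractToolCalls(event) {
  if (event.event !== "thread.run.requires_action") return null;
  const action = event.data.required_action;
  if (action.type !== "submit_tool_outputs") return null;
  return action.submit_tool_outputs.tool_calls.map((call) => ({
    id: call.id,                              // needed for tool_call_id on submit
    name: call.function.name,
    args: JSON.parse(call.function.arguments), // arguments arrive as a JSON string
  }));
}
```

If this is the right event, event.data is the run object, so event.data.id and event.data.thread_id should be what a subsequent submitToolOutputs call needs — worth verifying against the SDK version in use.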

4 Likes

Just my opinion, but I think about half of that (the event naming convention and missing details) is documentation that OpenAI could do a better job with, and the other half (the part @louis030195 said looks like 2000’s JavaScript) is Node’s streaming and event listener architecture.

I anticipate that OpenAI will fill in the much-needed examples and additional documentation, shortly. Alternately, as @kachari.bikram42 did (thank you), we’ll see folks in the community “fill in the blanks”. On the other part, I suspect the streaming and event architecture will be a harder problem for a while.

In theory, OpenAI could come up with a much easier to use architecture than Node/JS’s, but my own recent experience is that’s easier said than done. On Epic Road Trip Planner, we use a back-end layer and our own API abstractions in an attempt to ‘simplify’ the interface for our front-end code (as well as to more importantly secure our API key), but it turns out that even with our own custom API abstraction, the resulting front-end JavaScript code is definitely still not as clean as we’d like and, indeed, looks quite 2000s-ish as well. (Don’t worry, I’m not throwing anyone else on the team here under the bus - I personally wrote it, so I get to complain about it as much as I want - hah!).

1 Like

I’m struggling with how to submit the tool outputs, and when (on which listener). Could you elaborate a little? I don’t quite understand the particulars of your comment.

Submitting the tool outputs is done from the “on_tool_call_done” event handler, when required_action.type is “submit_tool_outputs”.

Python implementation for that I have is here: azureai-assistant-tool/sdk/azure-ai-assistant/azure/ai/assistant/management/assistant_client.py at main · Azure-Samples/azureai-assistant-tool · GitHub

Now I am struggling with how to implement function calling in a streaming assistant run in my NestJS project.
As you know, the example code looks like this:

const run = openai.beta.threads.runs.createAndStream(thread.id, {
    assistant_id: assistant.id
  })
    .on('textCreated', (text) => process.stdout.write('\nassistant > '))
    .on('textDelta', (textDelta, snapshot) => process.stdout.write(textDelta.value))
    .on('toolCallCreated', (toolCall) => process.stdout.write(`\nassistant > ${toolCall.type}\n\n`))
    .on('toolCallDelta', (toolCallDelta, snapshot) => {
      if (toolCallDelta.type === 'code_interpreter') {
        if (toolCallDelta.code_interpreter.input) {
          process.stdout.write(toolCallDelta.code_interpreter.input);
        }
        if (toolCallDelta.code_interpreter.outputs) {
          process.stdout.write("\noutput >\n");
          toolCallDelta.code_interpreter.outputs.forEach(output => {
            if (output.type === "logs") {
              process.stdout.write(`\n${output.logs}\n`);
            }
          });
        }
      }
    });

I would like to know more about the event handlers in the Node.js SDK.
Does anyone know about this, especially function calling with a streaming assistant?

1 Like

I’m in the same boat (Node.js). I had non-streaming function calls working and was able to get streaming working for regular text responses, but now I’m stuck on function calling. I’m accumulating the argument chunks and adding them to the object returned on the toolCallDone listener, which looks like the below. I assume it’s producing that object so we can use it.

{
  index: 0,
  id: 'call_I5q5dyCFtmkakKXkJACD8cBX',
  type: 'function',
  function: {
    name: 'my_function',
    arguments: '{ my_accumulated_args }',
    output: null
  }
}

Then I’m trying to run the function calls, which should work, but I am a bit confused about how to run .submitToolOutputs, since before, openai.beta.threads.runs.retrieve(threadId, runId) returned the threadId and runId used, but openai.beta.threads.runs.createAndStream does not seem to. I’m also unsure how to capture requires_action like before. This change in logic is mainly what’s tripping me up. I know I’m missing something here, but it’s hard to really tell from the sparse examples. I think one complete example laying out the logic would be helpful.
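A minimal sketch of that accumulation step as a pure function, assuming argument fragments arrive on toolCallDelta keyed by the delta’s index — `accumulateToolCallDeltas` is an illustrative name, not an SDK API:

```javascript
// Merge a sequence of streamed tool-call deltas into complete call objects,
// keyed by index; argument fragments are concatenated in arrival order.
function accumulateToolCallDeltas(deltas) {
  const byIndex = new Map();
  for (const delta of deltas) {
    let call = byIndex.get(delta.index);
    if (!call) {
      call = { index: delta.index, id: undefined, type: "function", function: { name: "", arguments: "" } };
      byIndex.set(delta.index, call);
    }
    if (delta.id) call.id = delta.id;                 // id usually arrives on the first delta only
    if (delta.function?.name) call.function.name = delta.function.name;
    if (delta.function?.arguments) call.function.arguments += delta.function.arguments;
  }
  return [...byIndex.values()];
}
```

As for the run id: in recent SDK versions the stream should also expose the completed run (e.g. via finalRun()), and the raw thread.run.requires_action event carries the run object with its id and thread_id — worth verifying against the SDK version you’re on.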