I want to give my users the feeling that the assistant works with “zero latency”: the user should receive a friendly response from the assistant immediately after asking a question, before any tools are used. Some of the tools are functions that can take a very long time to return, and I don’t want to keep the user waiting that long for the first response.
BTW, I’m “streaming” by fetching the thread’s run steps in a loop and extracting all of the messages whose content is of type “text”.
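Roughly what my loop looks like, with the immediate acknowledgement I’m after sent up front. This is a minimal TypeScript sketch assuming the OpenAI Node SDK’s beta Assistants endpoints (names as of the v4 SDK); `sendToUser` is a hypothetical stand-in for however my UI receives text:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical: however the app pushes text to the end user.
declare function sendToUser(text: string): void;

async function askWithInstantAck(threadId: string, assistantId: string, question: string) {
  // 1. Acknowledge immediately, before the run (and any slow tools) even starts.
  sendToUser("Working on it…");

  await client.beta.threads.messages.create(threadId, { role: "user", content: question });
  const run = await client.beta.threads.runs.create(threadId, { assistant_id: assistantId });

  const seen = new Set<string>(); // message IDs already forwarded to the user
  let status = run.status;

  // 2. Poll the run steps; forward the text of each completed message step.
  while (status === "queued" || status === "in_progress" || status === "requires_action") {
    await new Promise((r) => setTimeout(r, 1000));

    const steps = await client.beta.threads.runs.steps.list(threadId, run.id);
    for (const step of steps.data) {
      if (step.status === "completed" && step.step_details.type === "message_creation") {
        const msgId = step.step_details.message_creation.message_id;
        if (seen.has(msgId)) continue;
        seen.add(msgId);

        const msg = await client.beta.threads.messages.retrieve(threadId, msgId);
        for (const part of msg.content) {
          if (part.type === "text") sendToUser(part.text.value);
        }
      }
    }
    status = (await client.beta.threads.runs.retrieve(threadId, run.id)).status;
  }
}
```

The acknowledgement goes out before the run is created, so the user sees something instantly no matter how long the tool calls take.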
Just use JavaScript to throw in a “working on this” type of response, then replace it with the real response when it arrives. At this point, though, the Assistants API is so slow and inconsistent that you probably shouldn’t be using it in any production scenario.
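Something like this, sketched in TypeScript against the browser DOM; `fetchAssistantReply` is a hypothetical stand-in for whatever call hits your backend:

```ts
// Hypothetical: wraps the round trip to your server / the Assistants API.
declare function fetchAssistantReply(question: string): Promise<string>;

async function ask(question: string) {
  // Show a placeholder bubble instantly.
  const bubble = document.createElement("div");
  bubble.className = "assistant-message";
  bubble.textContent = "Working on this…";
  document.getElementById("chat")!.appendChild(bubble);

  try {
    // Swap the placeholder for the real answer whenever it arrives.
    bubble.textContent = await fetchAssistantReply(question);
  } catch {
    bubble.textContent = "Sorry, something went wrong. Please try again.";
  }
}
```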
Do you know of a better wrapper that could be used like an assistant and is more production-ready?
If you are not dependent on OpenAI for the whole system, you can run LangChain as an interlocutor using an LLM like LLaMA. You can train LLaMA to route challenging tasks to OpenAI.
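Loosely, the routing could look like this. A TypeScript sketch where both helpers are hypothetical stand-ins, one for a self-hosted LLaMA endpoint and one for the OpenAI API:

```ts
// Hypothetical helpers: wire these to your local LLaMA server and to OpenAI.
declare function askLocalLlama(prompt: string): Promise<string>;
declare function askOpenAI(prompt: string): Promise<string>;

async function route(question: string): Promise<string> {
  // The local model acts as the interlocutor: it answers directly, or emits
  // a sentinel (which it has been tuned to produce) when the task is too hard.
  const verdict = await askLocalLlama(
    `Answer the user, or reply exactly "ESCALATE" if the task is too hard:\n${question}`
  );
  return verdict.trim() === "ESCALATE" ? askOpenAI(question) : verdict;
}
```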
Agree with the other comment about maybe using another platform. For me, I use chat completions since I don’t need Code Interpreter, and for anything file-related I can just vectorize the files and match against them. I think the Assistants API has lots of potential in theory, but it just isn’t up to snuff right now. We’ll see how things look in the new year, as this tech is moving quite fast.
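For what it’s worth, the “vectorize and match” part is just embeddings plus cosine similarity. A minimal TypeScript sketch assuming the OpenAI Node SDK, with chunk storage simplified to an in-memory array:

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function embed(text: string): Promise<number[]> {
  const res = await client.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  return res.data[0].embedding;
}

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Pick the most relevant pre-embedded chunk and answer via chat completions.
async function answer(question: string, chunks: { text: string; vec: number[] }[]) {
  const qVec = await embed(question);
  const best = chunks.reduce((a, b) => (cosine(qVec, a.vec) >= cosine(qVec, b.vec) ? a : b));

  const res = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `Answer using this context:\n${best.text}` },
      { role: "user", content: question },
    ],
  });
  return res.choices[0].message.content ?? "";
}
```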