Assistants latency too slow for production

The assistants API has been really helpful for getting our development started quickly. We would not have been able to get an MVP this fast without it. Unfortunately the time-to-first-token is just too slow. We are seeing around 1500 - 2500 ms of added latency compared to equivalent requests to the chat API. I’m really hoping this improves by the time assistants comes out of beta. As it stands we’ll have to switch to using the chat API and dealing with persistent threads and code interpreter ourselves.

Just wanted to give my honest feedback. It’s a great API. It just needs to be faster.

1 Like

This has been a problem for a long time. I have posted about this same thing. As you said assistants is in “beta” and I doubt it will ever leave it. It is not ready for production use at all.

2 Likes

it’ll never be as fast as chat completions, it has a context / history of messages to process for every response.