In the API, we want each message to be shorter, but we can only limit the total run in terms of tokens. In real-world use, API requests can take 15-20 seconds, and users will be unhappy with that.
If a per-message token limit isn't going to be added, then I think the speed necessarily has to increase. Will there be a significant speed improvement after the beta? If so, when is that development planned?
Assistants takes multiple API calls to set into motion, and you are inherently waiting on an AI that can perform multiple internal steps before responding.
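For context, a single Assistants exchange looks roughly like the sketch below (using the openai Python SDK; the assistant ID and message text are placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

# 1. Create a thread and add the user's message (two separate calls).
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarize my document."
)

# 2. Start a run against an existing assistant (placeholder ID).
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id="asst_abc123"
)

# 3. Poll until the run finishes -- this is where the long wait happens,
#    since the assistant may take several internal steps (tools, retrieval).
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 4. Only now can the reply be fetched (newest message first).
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```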
You can use streaming output to receive the text as it is being produced once the AI finally responds.
If you want quick responses, use the chat completions endpoint and do quick, automatic RAG to inject knowledge, instead of relying on an AI that can search documents itself. Streaming there can start writing in under a second.
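As a minimal sketch of that approach (openai Python SDK; the model name and injected excerpt are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# A single call; tokens arrive as a stream as soon as generation
# starts, so the user sees text almost immediately.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[
        # Knowledge injected directly, i.e. "quick and automatic RAG":
        {"role": "system", "content": "Answer using this excerpt: ..."},
        {"role": "user", "content": "What does the excerpt say about pricing?"},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry role/finish info rather than text
        print(delta, end="", flush=True)
```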
I have to use the Assistants API because I rely on the function calling feature; I know that streaming is available with chat completions. However, I'm afraid streaming is not supported in the Assistants API. Has such a feature been introduced in a recent update that I might have missed?