Super slow Assistants API responses

Hello, every time I call the API it takes super long to respond, up to like 30 seconds.

Assistants should not be the primary way that you interact with AI models.

Chat Completions + streaming = get the output starting within seconds.

Assistants: multiple API calls to set a run in motion, and then more polling to find out when it's done.
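To make the contrast concrete, here's a minimal streaming sketch with the `openai` Python client. It assumes `OPENAI_API_KEY` is set in the environment and the `openai` package is installed; the model name is just an example. The `join_deltas` helper simply reassembles the streamed chunks into the full reply.

```python
def join_deltas(deltas):
    """Accumulate streamed text deltas into the full reply text."""
    return "".join(d for d in deltas if d)

def stream_reply(prompt: str, model: str = "gpt-3.5-turbo-0125") -> str:
    """Stream a Chat Completions response, printing tokens as they arrive."""
    from openai import OpenAI  # assumes the openai package is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # tokens arrive as they are generated
    )
    deltas = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)  # first text within seconds
            deltas.append(delta)
    return join_deltas(deltas)
```

With `stream=True` you see the first words almost immediately, instead of waiting for an Assistants run to finish before fetching anything.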

What do I do instead? I need an AI trained on my data that can respond like an Assistant. Is there another solution?

“Assistants” is just a poor, confusing name for an agent framework that can make multiple calls internally. You aren’t “training an AI on data”; the data is either injected into context or made searchable with internal functions.

so adapt to:

Chat Completions + retrieval-augmented generation + streaming = get the output starting within seconds.

where RAG uses a semantic-search vector database to inject relevant results for the user input into the AI’s context before the main AI even generates a token.
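The retrieval step above can be sketched in plain Python. This is a toy example: `embed()` here is a bag-of-words stand-in so the code runs on its own, where a real pipeline would call an embeddings endpoint and store vectors in a vector database. The documents and the `build_prompt` wording are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words count vector.
    # In production you'd call an embeddings API instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Inject retrieved context ahead of the user question."""
    context = "\n".join(retrieve(query, docs))
    return f"Use this context to answer:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday to Friday.",
    "Shipping is free on orders over $50.",
]
print(build_prompt("how long do refunds take", docs))
```

The prompt built this way is then sent through Chat Completions with streaming, so retrieval happens before the model generates its first token.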

You can also evaluate the token production rate of individual models. gpt-3.5-turbo-0125, for example, being new, was producing around 100 tokens per second.

Yes, but I like the thread management function.
I don’t want to write many functions to fetch chat history, summarize it, and manage database storage for the history file.
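For what it's worth, basic thread management is only a few lines. Here's a hedged sketch of a DIY replacement: keep the message list in a JSON file and trim old turns to stay under a budget. The file name and the crude length-based trimming are assumptions; a real app might summarize old turns with a cheap model instead of dropping them.

```python
import json
from pathlib import Path

class Thread:
    """Persist a chat history to a JSON file, trimming old turns."""

    def __init__(self, path: str = "thread.json", max_messages: int = 20):
        self.path = Path(path)
        self.max_messages = max_messages
        # Reload prior turns if the thread file already exists.
        self.messages = (
            json.loads(self.path.read_text()) if self.path.exists() else []
        )

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent turns (crude stand-in for summarizing).
        self.messages = self.messages[-self.max_messages:]
        self.path.write_text(json.dumps(self.messages, indent=2))

    def for_api(self, system: str) -> list[dict]:
        # The list you'd pass as `messages` to Chat Completions.
        return [{"role": "system", "content": system}] + self.messages
```

Each user turn becomes `thread.add("user", text)`, the reply becomes `thread.add("assistant", reply)`, and `thread.for_api(...)` produces the messages parameter for the next call.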

Is there a video on YouTube about this? I would love to explore.

Also, I heard that if I move to a higher tier, the assistants reply faster.

Yes, OpenAI had throttled the token generation rate of some models for accounts in “tier 1”. I haven’t heard much forum complaint about this recently (after they hit a whole bunch of API users with slower output without announcement), so I can’t say what improvement you would see by prepaying more to reach a higher payment trust tier.

Here’s a link for courses: “semantic search” and “vector database” are the terms you’re after, including some courses sponsored by OpenAI.

You could explore LangChain: https://python.langchain.com/docs/get_started/introduction. It also gives you greater control over which chat model you use, and the process of setting up a knowledge base (from which the AI agent can retrieve context) is quite simple. Plus, you won’t have to pay anything extra to index your documents, unlike the retrieval tool in OpenAI’s Assistants API. You can also use free cloud-based vector stores to host your data index online.