I am currently experiencing significant delays, often exceeding 40 seconds, when using assistants with retrieval augmentation in the Playground. The setup searches a 1.5 MB text file (a user manual) stored alongside the assistant.
Even with queries that exactly match sentences in the document, the response time is more than 45 seconds. (Since the retrieval step performs a vector search, I presume most of the time is spent there.)
Are there any recommendations or methods to optimize this process? The current response times are impractical for our intended production use.
Thank you for your assistance.
The simplest and easiest solution would be to do the RAG manually, using a vector database and cosine similarity, and then pass the retrieved results to the model for whatever the next step is.
The documentation for RAG with the newer model says it uses a vector search when it decides the data is large enough, which I do think is the case at 1.5 MB. Most likely, it is instead feeding the whole text into the prompt and generating from that.
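A minimal sketch of that manual approach, assuming a toy deterministic bag-of-words embedder in place of a real embeddings API call (the chunks, the `tokenize`/`embed` helpers, and the vocabulary are all illustrative, not from any library):

```python
import re
import numpy as np

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def embed(text, vocab):
    # Toy deterministic bag-of-words vector; a real pipeline would call
    # an embeddings API here instead and cache the results.
    vec = np.zeros(len(vocab))
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Pre-chunked manual text; in practice you would split the 1.5 MB file
# into overlapping chunks and embed each chunk once, offline.
chunks = [
    "To reset the device, hold the power button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Press the power button briefly to wake the device from sleep.",
]
vocab = {w: i for i, w in enumerate(sorted({w for c in chunks for w in tokenize(c)}))}
chunk_vecs = [embed(c, vocab) for c in chunks]

def top_k(query, k=2):
    # Cosine similarity reduces to a dot product on unit-normalized vectors.
    q = embed(query, vocab)
    order = sorted(range(len(chunks)), key=lambda i: float(chunk_vecs[i] @ q), reverse=True)
    return [chunks[i] for i in order[:k]]

best = top_k("How do I reset the device?", k=1)[0]
```

Because the chunk embeddings are computed once up front, the per-query cost is a single embedding call plus a dot product per chunk, which is milliseconds at this document size.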
@udm17 Thank you for your suggestion.
Before OpenAI released the assistant feature, I had explored the manual vector search track. I was expecting better results with the integrated OpenAI solution. However, I am now questioning its practicality. How can assistant retrieval be considered useful if it struggles with efficiently searching a simple 23-page document? This level of inefficiency is concerning for our intended use.
tbh, my suggestion would be to build your own retrieval function and let the OpenAI assistant call it. I think what OpenAI built is for users who want some quick retrieval, have small amounts of data, and simply don’t want to build their own retrieval system.
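One way to wire that up is with function calling: you declare a tool, and when a run enters the `requires_action` state you execute the call locally and submit the output back. A hedged sketch of the local side only (the `search_manual` name, its schema, and the dispatcher are hypothetical; the actual API round-trips are omitted):

```python
import json

# Hypothetical tool definition you would pass in the assistant's `tools`
# list so the model calls your retrieval instead of the built-in one.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_manual",
        "description": "Search the product manual; returns the most relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def search_manual(query: str) -> str:
    # Placeholder: in production this would query your own fast vector
    # store and return the top passages as a single string.
    return f"Top passages for: {query}"

def handle_tool_call(tool_call: dict) -> str:
    # Dispatch one tool call from a `requires_action` run to the
    # matching local function, decoding the JSON-encoded arguments.
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "search_manual":
        return search_manual(**args)
    raise ValueError(f"unknown tool: {name}")

result = handle_tool_call(
    {"function": {"name": "search_manual", "arguments": '{"query": "reset device"}'}}
)
```

This keeps the assistant's conversation management while putting the slow part, retrieval, entirely under your control.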
@xifan.wang, your suggestion to build our own retrieval function while still utilizing the OpenAI assistant architecture is indeed a clever approach. It would enable us to use OpenAI’s retrieval for development and then implement our own retrieval system for production. The challenge lies in maintaining a high-performance endpoint for more efficient retrieval. Would you recommend any tools or platforms?
LangChain and LlamaIndex are quite good for building a customizable vector store for retrieval. There are also managed cloud solutions like Pinecone, Weaviate, etc.
@rfroli, have you tried this yet? If so, what kind of latency improvements are you seeing in the assistant’s responses with a custom retrieval function?
No, I haven’t tried this solution yet, as the cost of using the assistant APIs is quite high, and we can’t afford this expense in a production environment. I’ll consider trying it again once it’s out of beta. However, if the costs remain prohibitive, then I will manually perform the RAG tasks by using the search results from my vector search as the context for simple prompts.