A few days ago I tried LlamaIndex (for vector indexing and semantic search) plus the GPT-3.5 API for question answering over PDF text.
The problem was latency: more than 30 s to get an answer.
Since I need to put this app into a production environment, latencies like that are unacceptable for the customer.
I tried using SentenceTransformer for the embeddings plus the OpenAI API and was able to cut latency to 20 s, but that is still too much for production.
Of those 20 s, 15 are due to the OpenAI API call.
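For reference, this is roughly how I measured where the time goes: wrap each stage of the pipeline in a timer. The sketch below uses stub functions in place of the real calls (in the actual app, `embed` is `SentenceTransformer.encode`, `retrieve` is the vector-index lookup, and `complete` is the OpenAI chat completion), so the function names here are placeholders, not real library APIs:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Record the wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    yield
    timings[label] = time.perf_counter() - start

# Placeholder stages: in the real app these would be
# SentenceTransformer.encode, the vector-index query, and the
# OpenAI chat-completion request, respectively.
def embed(question):
    return [0.0] * 384

def retrieve(query_vec):
    return ["relevant PDF chunk"]

def complete(question, chunks):
    return "stub answer"

def answer(question):
    timings = {}
    with timed("embed", timings):
        vec = embed(question)
    with timed("retrieve", timings):
        chunks = retrieve(vec)
    with timed("llm", timings):
        reply = complete(question, chunks)
    return reply, timings

reply, timings = answer("What does the PDF say about X?")
print(sorted(timings))  # → ['embed', 'llm', 'retrieve']
```

In my case the `llm` bucket dominates (about 15 of the 20 seconds), which is why cutting the embedding time only got me so far.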
Is there a way to reduce this time? I would like to get to 3-7 s latency…