Speeding up LangChain/LlamaIndex + API calls

A few days ago I tried to use LlamaIndex (vector index with semantic search) + the GPT-3.5 API for question answering over PDF text data.
The problem was latency: more than 30 s before getting an answer.

Since I need to put this app into a production environment, these latencies are unacceptable for the customer.

I tried using SentenceTransformer for the embeddings + the OpenAI API and was able to cut latency to 20 s, but that's still too much for production.
Of these 20 s, 15 are due to the OpenAI API.
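To attribute latency like this per stage, a simple wall-clock timer around each call is enough. A minimal sketch; the `embed_query` and `call_llm` stubs below merely stand in for the real SentenceTransformer and OpenAI calls, they are not actual library APIs:

```python
import time

def timed(label, fn, timings, *args, **kwargs):
    """Run fn, recording its wall-clock duration in timings[label]."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[label] = time.perf_counter() - start
    return result

# Stubs standing in for the real pipeline stages (illustrative only)
def embed_query(question):
    time.sleep(0.01)  # pretend local embedding work
    return [0.0] * 384

def call_llm(prompt):
    time.sleep(0.02)  # pretend API round trip
    return "answer"

timings = {}
vec = timed("embedding", embed_query, timings, "what does the PDF say?")
ans = timed("llm", call_llm, timings, "context + question")
print({k: round(v, 3) for k, v in timings.items()})
```

Knowing which stage dominates tells you where to optimize: the retrieval side (local embeddings, a persisted index) or the LLM side (shorter prompts, a smaller model, streaming the response so the user sees tokens sooner).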

Is there a way to reduce this time? I would like to get to 3-7 s latency…

Do you use Memory, or do you persist your data to storage using storage_context.persist()?

Memory is fast (but the data doesn't persist).
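For reference, persisting and reloading an index looks roughly like this in LlamaIndex. This is a sketch under assumptions: it uses the pre-0.10 import path (newer versions import the same names from `llama_index.core`), and `./storage` is just an example directory. Note that reloading a persisted index only saves re-embedding the documents at startup; it does not reduce the per-query LLM API latency:

```python
def build_or_load_index(documents, persist_dir="./storage"):
    """Build a vector index once, then reload it from disk on later runs.

    Sketch only: assumes an older llama_index install; recent versions
    expose these names under llama_index.core instead.
    """
    import os
    from llama_index import (
        VectorStoreIndex,
        StorageContext,
        load_index_from_storage,
    )

    if os.path.isdir(persist_dir):
        # Reload from disk: skips re-embedding every document at startup
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # First run: build in memory, then write to disk for next time
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```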


Thanks. In the end I solved it by using LangChain instead of LlamaIndex.
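The post doesn't say what the replacement pipeline looked like, but a typical LangChain equivalent of the setup above is a retrieval-QA chain with local embeddings. A rough sketch under assumptions: the model name, `k`, and the 2023-era import paths are illustrative (newer LangChain releases moved these classes into `langchain_community` / `langchain_openai`):

```python
def build_qa_chain(texts):
    """Sketch of a retrieval-QA pipeline over raw text chunks.

    Assumes an older LangChain install; import paths differ in
    current releases.
    """
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS

    # Local SentenceTransformer embeddings: no network call per query
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    store = FAISS.from_texts(texts, embeddings)

    return RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
        retriever=store.as_retriever(search_kwargs={"k": 3}),
    )
```

Keeping the embedding step local and the retrieved context small (`k=3` here) trims both the retrieval time and the prompt size sent to the API.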


Hello, by how much did you reduce the time with LangChain?