Using GPT-3.5 Turbo with LangChain is excessively slow

I’ve built a Node.js app using LangChain and gpt-3.5-turbo, and I’ve created an in-memory vector store from a set of crawled pages using LangChain’s recursive URL loader. After some time spent processing the page text, the model fires up and I’m able to submit queries.
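Roughly, the setup looks like this (a trimmed-down sketch: the URL is a placeholder and the exact import paths depend on your LangChain version):

```ts
import { RecursiveUrlLoader } from "langchain/document_loaders/web/recursive_url";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

// Crawl the site and split the pages into chunks.
const loader = new RecursiveUrlLoader("https://example.com/docs", { maxDepth: 3 });
const docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.splitDocuments(docs);

// Embed everything into an in-memory vector store (~33,000 chunks).
const vectorStore = await MemoryVectorStore.fromDocuments(chunks, new OpenAIEmbeddings());
```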

The problem I’m having is that responses take a long time to generate: some take 3 seconds, but some take as long as 11 seconds, and I don’t know how to fix it. I have a prompt that defines how the model should respond, and there is admittedly a lot of data in the vector store, on the order of ~33,000 paragraphs of text. But the model behind the public ChatGPT is presumably far bigger, and it still responds within a couple of seconds most of the time. I’d love to get down to 2-3 seconds on average, rather than the ~8 seconds I currently see.

Is there anything I could be missing? Could I improve the prompt? Is gpt-3.5-turbo the wrong model for this use case?

Is it ChatGPT that’s taking a long time, or is it your server having to run a similarity computation against 33,000 paragraphs of text for every query?
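You can time the two stages separately to find out. Something like this sketch (hedged: `timedQuery` is a hypothetical helper, and exact imports and method names depend on your LangChain version):

```ts
import { ChatOpenAI } from "langchain/chat_models/openai";
import type { MemoryVectorStore } from "langchain/vectorstores/memory";

// Hypothetical helper: measures retrieval time and generation time separately.
async function timedQuery(vectorStore: MemoryVectorStore, query: string) {
  const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });

  const t0 = performance.now();
  const docs = await vectorStore.similaritySearch(query, 4); // top-4 chunks
  const t1 = performance.now();

  const context = docs.map((d) => d.pageContent).join("\n\n");
  const res = await model.invoke(
    `Answer using only this context:\n\n${context}\n\nQuestion: ${query}`
  );
  const t2 = performance.now();

  console.log(`retrieval: ${(t1 - t0).toFixed(0)} ms, model: ${(t2 - t1).toFixed(0)} ms`);
  return res;
}
```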

What exactly am I supposed to get from this response? I can’t change the amount of data I have, and I already stated that the public ChatGPT is necessarily much larger and yet is very fast.

ChatGPT is a probability model. It doesn’t do any lookups against a vector database. The latency you’re seeing is

ChatGPT time + vector search time

It is not possible for your app to respond as quickly as ChatGPT unless you reduce your search time to zero. It isn’t a ChatGPT issue.
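With ~33,000 paragraphs in an in-memory store, every query has to be scored against every stored embedding. Conceptually it’s a brute-force scan like this (a simplified sketch for intuition, not LangChain’s actual implementation):

```ts
// Brute-force top-k search: score the query vector against every stored vector.
// O(n * d) per query, where n is ~33,000 chunks and d is the embedding size.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(queryVec: number[], stored: number[][], k: number): number[] {
  return stored
    .map((vec, i) => ({ i, score: cosineSimilarity(queryVec, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.i);
}
```

Shrinking that term, for example with a dedicated vector database that uses an approximate-nearest-neighbor index, or with fewer, larger chunks, is the part you can actually control.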