Fastest and most precise vector DB and LLM

Hi all,
I am creating a simple RAG based voicebot that is to be deployed on a car dealership. For this I am simply using the Azure AI search service as the vector index and GPT 4 turbo model as LLM.
The vector search is taking 2.5 to 3 seconds.
And the GPT-4 Turbo response time is anywhere between 3 and 5 seconds.
I am thinking of switching the vector DB. Would using others like Pinecone, Weaviate, etc. improve the speed? If so, which one would be best?
I am also thinking of switching to Llama 2 70B; what inference time can I expect from it?
My goal is to reduce the end-to-end latency to 3 to 4 seconds.
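Before swapping components, it may help to confirm where the time actually goes by timing each stage separately. A minimal sketch below; `retrieve` and `generate` are hypothetical placeholders for your actual Azure AI Search and GPT-4 Turbo calls:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Record wall-clock time for one pipeline stage."""
    start = time.perf_counter()
    yield
    timings[label] = time.perf_counter() - start

def answer(query, retrieve, generate):
    """Run retrieval then generation, timing each stage separately."""
    timings = {}
    with timed("retrieval", timings):
        docs = retrieve(query)
    with timed("generation", timings):
        reply = generate(query, docs)
    return reply, timings

# Stand-in stages for illustration; replace with real search/LLM calls.
reply, timings = answer(
    "What SUVs are in stock?",
    retrieve=lambda q: ["doc1", "doc2"],
    generate=lambda q, docs: f"Answer based on {len(docs)} docs",
)
print(timings)  # e.g. {'retrieval': 0.00001, 'generation': 0.00001}
```

Knowing whether retrieval or generation dominates tells you which swap (vector DB vs. model) can actually hit the 3–4 second target.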

That’s slow - how many records are you dealing with?

What is the database platform?

There are around 10K records.

I have used an Azure AI Search index to store them and am doing a hybrid search (vector + semantic reranking) to get the relevant records.

And yes, even I'm surprised at how slow this is.


Ah, but you are doing a hybrid search; the semantic reranking step may be what makes it slower.

Using pgvector on PostgreSQL with 150k records, I'm getting split-second ordered matches on a tiny VPS with only 4 GB of RAM.

But that is using pure vector (semantic) search only, with no reranking. Perhaps you should consider it for performance reasons.
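For reference, a pure vector query in pgvector can look like the sketch below. The table name, column names, and embedding dimension are illustrative assumptions, and the query vector literal is truncated:

```sql
-- Illustrative schema: a documents table with an ANN index (names are assumptions)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE docs (id bigserial PRIMARY KEY, body text, embedding vector(1536));
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

-- Top-10 nearest neighbours by cosine distance (<=> is pgvector's cosine-distance operator)
SELECT id, body
FROM docs
ORDER BY embedding <=> '[0.01, -0.02, ...]'::vector
LIMIT 10;
```

With an HNSW (or IVFFlat) index in place, the `ORDER BY ... LIMIT` pattern uses an approximate nearest-neighbour scan rather than a full table scan, which is what makes split-second results feasible on modest hardware.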


Yes, a split-second response would be highly beneficial.

Thanks in advance 🙂
