Hi all,
I am building a simple RAG-based voicebot to be deployed for a car dealership. For this I am using Azure AI Search as the vector index and the GPT-4 Turbo model as the LLM.
The vector search is taking 2.5 to 3 seconds.
And the GPT-4 Turbo response time is anywhere between 3 and 5 seconds.
I am thinking of switching the vector DB; would alternatives like Pinecone or Weaviate improve the speed? If so, which one would be best?
I am also thinking of switching to Llama 2 70B; what inference time can I expect from it?
My goal is to reduce the latency to 3 to 4 seconds.
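Before swapping out components, it may be worth measuring exactly where the time goes, since retrieval and generation have very different remedies. A minimal timing sketch (the `retrieve` and `generate` functions are hypothetical stand-ins; plug in the real Azure AI Search and GPT-4 Turbo calls):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print its wall-clock time, and return (result, elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Hypothetical pipeline stages -- replace with the real search and LLM calls.
def retrieve(query):
    time.sleep(0.01)           # stand-in for the hybrid search round trip
    return ["doc1", "doc2"]

def generate(query, docs):
    time.sleep(0.01)           # stand-in for the LLM completion round trip
    return "answer"

query = "What SUVs do you have in stock?"
docs, t_search = timed("vector search", retrieve, query)
answer, t_llm = timed("llm generation", generate, query, docs)
print(f"total: {t_search + t_llm:.3f}s")
```

Logging the two stages separately makes it clear whether the 3–4 second budget should be spent on the index, the model, or both.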
That’s slow - how many records are you dealing with?
What is the database platform?
There are around 10K records.
I have stored them in an Azure AI Search index and am running a hybrid query (vector search + semantic reranking) to retrieve the relevant records.
And yes, even I'm surprised at how slow it is.
Ah, but you are doing a hybrid search; that may make it slower.
Using pgvector on PSQL with 150k records, I’m getting split-second ordered matches on a tiny VPS with only 4GB.
But that is using pure vector (semantic) search only, with no reranking step. Perhaps you should consider it for performance reasons.
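For a sense of scale: at 10K–150K records, even a brute-force in-memory cosine scan is well under a second, so the similarity search itself should not be the bottleneck. A rough NumPy sketch, using random vectors as a stand-in for real embeddings (the 1536 dimension assumes an ada-002-style embedding model):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n, dim = 10_000, 1536                    # corpus size like the one above

# Random unit vectors standing in for stored document embeddings.
corpus = rng.standard_normal((n, dim)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

query = rng.standard_normal(dim).astype(np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = corpus @ query                  # cosine similarity via dot product
top10 = np.argsort(scores)[-10:][::-1]   # indices of the 10 best matches
elapsed = time.perf_counter() - start
print(f"{elapsed:.3f}s for brute-force top-10 over {n} vectors")
```

pgvector with an appropriate index avoids even this linear scan, which is consistent with the split-second results reported above; the slow part of the Azure pipeline is more likely the reranking and network round trips than the vector math.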
Yes, a split-second response would be highly beneficial.
Thanks in advance