The cosine similarity search is done in memory, not in the database: search in memory first, then go back to the database to fetch the text for the top hits.
The database I use is DynamoDB.
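Here is a minimal sketch of that pattern, assuming the embeddings are already loaded into memory as L2-normalized float32 vectors, and that the table name (`text-chunks`) and key (`chunk_id`) are placeholders for whatever your schema actually looks like:

```python
import numpy as np
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("text-chunks")  # hypothetical table name

def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, ids: list[str], k: int = 10):
    """With pre-normalized vectors, cosine similarity is just a dot product."""
    scores = embeddings @ query  # shape: (num_rows,)
    top = np.argpartition(scores, -k)[-k:]      # k largest, unordered
    top = top[np.argsort(scores[top])[::-1]]    # sort the k winners by score
    return [(ids[i], float(scores[i])) for i in top]

def fetch_texts(hits):
    """Go back to DynamoDB only for the winners, never the whole corpus."""
    return [
        table.get_item(Key={"chunk_id": chunk_id})["Item"]["text"]
        for chunk_id, _score in hits
    ]
```

The point of the split is that the database only ever serves k lookups per query; all the heavy numeric work stays in memory.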
You can have unlimited rows (caveats below).
To keep latency around 1 second, chunk your data into in-memory shards of 400,000 embeddings each. Realistically, one account can run about 500 of these shards at the same time, so the realistic high end for a single search is 500 × 400,000 = 200 million rows. But that cap is per search: you can rotate your data instantly (**) to search another 200 million rows, while all of your data (~trillions of rows) stays in a single database.
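A quick back-of-envelope check on those numbers, assuming 1536-dimensional float32 embeddings (an assumption; your model's dimension may differ):

```python
# Sizing sketch: 1536 dims and float32 are assumptions, not givens.
DIM, BYTES_PER_FLOAT = 1536, 4
ROWS_PER_SHARD, CONCURRENT_SHARDS = 400_000, 500

shard_bytes = ROWS_PER_SHARD * DIM * BYTES_PER_FLOAT
print(f"one shard: {shard_bytes / 2**30:.1f} GiB")                 # ~2.3 GiB
print(f"per search: {ROWS_PER_SHARD * CONCURRENT_SHARDS:,} rows")  # 200,000,000
```

If the shard workers are Lambdas (the aggregation layer below suggests so), ~2.3 GiB per shard fits comfortably under the 10 GB Lambda memory cap, which is what makes the 400,000 figure plausible.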
So you create in-memory shards for search, and when you find what you are looking for, you retrieve the correlated text from the database. To handle the 200-million-row case in 1 second (**), you need a layer that can run the shard searches asynchronously, so use another DynamoDB table, backed by a Lambda, to collect the partial results and procure the final answer.
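A hedged sketch of that async layer, assuming the shard workers are Lambda functions; every name here (`shard-search`, `search-results`, the key schema) is a placeholder, not a prescribed API:

```python
import json
import boto3
from boto3.dynamodb.conditions import Key

lambda_client = boto3.client("lambda")
results_table = boto3.resource("dynamodb").Table("search-results")  # hypothetical

def fan_out(search_id: str, query: list[float], num_shards: int) -> None:
    """Kick off one async worker per 400k-embedding shard."""
    for shard in range(num_shards):
        lambda_client.invoke(
            FunctionName="shard-search",  # hypothetical worker function
            InvocationType="Event",       # async invoke: don't wait for the reply
            Payload=json.dumps({"search_id": search_id, "shard": shard, "query": query}),
        )

def gather(search_id: str, num_shards: int, k: int = 10):
    """Workers write their partial top-k under (search_id, shard);
    merge them once every shard has reported in."""
    resp = results_table.query(KeyConditionExpression=Key("search_id").eq(search_id))
    items = resp["Items"]
    if len(items) < num_shards:
        return None  # not done yet: poll again, or trigger via DynamoDB Streams
    hits = [hit for item in items for hit in item["hits"]]
    return sorted(hits, key=lambda h: float(h["score"]), reverse=True)[:k]
```

Whether `gather` polls or is triggered by the results table (e.g. via DynamoDB Streams) is a deployment choice; either way, the partial top-k lists are tiny, so merging 500 of them is cheap.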
(**) These are all theoretical high-end estimates, so don’t be surprised if “reality” adds a few more seconds.
If you only have a few hundred to a few thousand embeddings and don’t want to use the cloud, do the whole thing in memory and skip the database entirely.
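For that small case, the whole thing collapses to something like this (toy data, no cloud anywhere):

```python
import numpy as np

corpus = {  # id -> (normalized embedding, original text); toy example
    "a": (np.array([1.0, 0.0], dtype=np.float32), "first document"),
    "b": (np.array([0.6, 0.8], dtype=np.float32), "second document"),
}

def search(query: np.ndarray, k: int = 1):
    """Score every vector with a dot product and return the k best texts."""
    scored = sorted(
        ((float(vec @ query), text) for vec, text in corpus.values()),
        reverse=True,
    )
    return scored[:k]

print(search(np.array([0.7, 0.7], dtype=np.float32)))  # -> second document wins
```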