The cosine similarity search is done in memory, not in the database: search in memory first, then go back to the database to fetch the text for the top hits.
The database I use is DynamoDB.
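Here is a minimal sketch of that pattern, assuming the embeddings are already loaded into memory as L2-normalized float32 vectors, and that the table name (`text-chunks`) and key (`chunk_id`) are placeholders for whatever your schema actually looks like:

```python
import numpy as np
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("text-chunks")  # hypothetical table name

def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, ids: list[str], k: int = 10):
    """With pre-normalized vectors, cosine similarity is just a dot product."""
    scores = embeddings @ query  # shape: (num_rows,)
    top = np.argpartition(scores, -k)[-k:]      # k largest, unordered
    top = top[np.argsort(scores[top])[::-1]]    # sort the k winners by score
    return [(ids[i], float(scores[i])) for i in top]

def fetch_texts(hits):
    """Go back to DynamoDB only for the winners, never the whole corpus."""
    return [
        table.get_item(Key={"chunk_id": chunk_id})["Item"]["text"]
        for chunk_id, _score in hits
    ]
```

The point of the split is that the database only ever serves k lookups per query; all the heavy numeric work stays in memory.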
You can have unlimited rows (caveats below).
To keep latency around 1 second, chunk your data into in-memory shards of 400,000 embeddings each. Realistically, one account can run about 500 of these shards at the same time, so the realistic high end for a single search is 500 × 400,000 = 200 million rows. But that cap is per search: you can rotate your data instantly (**) to search another 200 million rows, while all of your data (~trillions of rows) stays in a single database.
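A quick back-of-envelope check on those numbers, assuming 1536-dimensional float32 embeddings (an assumption; your model's dimension may differ):

```python
# Sizing sketch: 1536 dims and float32 are assumptions, not givens.
DIM, BYTES_PER_FLOAT = 1536, 4
ROWS_PER_SHARD, CONCURRENT_SHARDS = 400_000, 500

shard_bytes = ROWS_PER_SHARD * DIM * BYTES_PER_FLOAT
print(f"one shard: {shard_bytes / 2**30:.1f} GiB")                 # ~2.3 GiB
print(f"per search: {ROWS_PER_SHARD * CONCURRENT_SHARDS:,} rows")  # 200,000,000
```

If the shard workers are Lambdas (the aggregation layer below suggests so), ~2.3 GiB per shard fits comfortably under the 10 GB Lambda memory cap, which is what makes the 400,000 figure plausible.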
So you create in-memory shards for search, and when you find what you are looking for, you retrieve the correlated text from the database. To handle the 200-million-row case in 1 second (**), you need a layer that can run the shard searches asynchronously, so use another DynamoDB table, backed by a Lambda, to collect the partial results and procure the final answer.
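A hedged sketch of that async layer, assuming the shard workers are Lambda functions; every name here (`shard-search`, `search-results`, the key schema) is a placeholder, not a prescribed API:

```python
import json
import boto3
from boto3.dynamodb.conditions import Key

lambda_client = boto3.client("lambda")
results_table = boto3.resource("dynamodb").Table("search-results")  # hypothetical

def fan_out(search_id: str, query: list[float], num_shards: int) -> None:
    """Kick off one async worker per 400k-embedding shard."""
    for shard in range(num_shards):
        lambda_client.invoke(
            FunctionName="shard-search",  # hypothetical worker function
            InvocationType="Event",       # async invoke: don't wait for the reply
            Payload=json.dumps({"search_id": search_id, "shard": shard, "query": query}),
        )

def gather(search_id: str, num_shards: int, k: int = 10):
    """Workers write their partial top-k under (search_id, shard);
    merge them once every shard has reported in."""
    resp = results_table.query(KeyConditionExpression=Key("search_id").eq(search_id))
    items = resp["Items"]
    if len(items) < num_shards:
        return None  # not done yet: poll again, or trigger via DynamoDB Streams
    hits = [hit for item in items for hit in item["hits"]]
    return sorted(hits, key=lambda h: float(h["score"]), reverse=True)[:k]
```

Whether `gather` polls or is triggered by the results table (e.g. via DynamoDB Streams) is a deployment choice; either way, the partial top-k lists are tiny, so merging 500 of them is cheap.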
(**) These are all theoretical high-end estimates, so don’t be surprised if “reality” adds a few more seconds.
If you only have a few hundred to a few thousand embeddings and don’t want to use the cloud, do the whole thing in memory and skip the database entirely.
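For that small case, the whole thing collapses to something like this (toy data, no cloud anywhere):

```python
import numpy as np

corpus = {  # id -> (normalized embedding, original text); toy example
    "a": (np.array([1.0, 0.0], dtype=np.float32), "first document"),
    "b": (np.array([0.6, 0.8], dtype=np.float32), "second document"),
}

def search(query: np.ndarray, k: int = 1):
    """Score every vector with a dot product and return the k best texts."""
    scored = sorted(
        ((float(vec @ query), text) for vec, text in corpus.values()),
        reverse=True,
    )
    return scored[:k]

print(search(np.array([0.7, 0.7], dtype=np.float32)))  # -> second document wins
```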