Sharding in vector databases

I wanted to see if anyone has feedback on how to improve the performance of a vector database. For example, is there a way to shard the embedding database so that, given a question embedding, I can determine which shard to search for a match? That way I would not have to compare the embedding against every record in the database.
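One common way to do this is IVF-style sharding: cluster the corpus with k-means, store each cluster as a shard, and at query time probe only the few shards whose centroids are closest to the question embedding. Below is a minimal NumPy sketch of that idea; the corpus is random toy data, and the shard count, probe count, and `search` helper are all illustrative choices, not a specific library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_shards = 64, 10_000, 8

# Toy corpus of embeddings (in practice these come from your embedding model).
corpus = rng.standard_normal((n_vectors, dim)).astype(np.float32)

def kmeans(data, k, iters=20):
    """Plain k-means: returns centroids and a per-vector shard assignment."""
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned vectors.
        for j in range(k):
            if (assign == j).any():
                centroids[j] = data[assign == j].mean(axis=0)
    return centroids, assign

centroids, assign = kmeans(corpus, n_shards)
shards = [corpus[assign == j] for j in range(n_shards)]

def search(query, n_probe=2, top_k=5):
    """Score only the n_probe shards whose centroids are closest to the query."""
    order = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    candidates = np.vstack([shards[j] for j in order])
    scores = candidates @ query  # dot-product similarity
    return candidates[scores.argsort()[::-1][:top_k]]

results = search(rng.standard_normal(dim).astype(np.float32))
print(results.shape)  # (5, 64)
```

The trade-off is recall: a true nearest neighbor can live in a shard you did not probe, which is why systems that work this way expose the probe count as a tuning knob.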

You can run a preliminary exhaustive search against a second copy of the same chunks stored at a smaller dimensionality and lower bit depth, which cuts memory and compute (this does not require a second set of embedding API calls, just math on the vectors you already have). That cheap pass gives you a score threshold, or a fixed-size subset of candidates, to run full-quality ranking on.
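A sketch of that coarse-to-fine idea, assuming normalized embeddings: the coarse index keeps only the first 32 dimensions quantized to int8 (truncation works best with embeddings trained for it; PCA is a safer general-purpose reduction), and the full-precision vectors are scored only for the shortlist the cheap pass returns. The sizes and the `shortlist` parameter are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, coarse_dim, n_vectors = 256, 32, 20_000

corpus = rng.standard_normal((n_vectors, dim)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# Coarse index: truncate to the first 32 dimensions, then quantize to int8.
scale = 127.0 / np.abs(corpus[:, :coarse_dim]).max()
coarse = (corpus[:, :coarse_dim] * scale).astype(np.int8)

def search(query, shortlist=200, top_k=5):
    q_coarse = (query[:coarse_dim] * scale).astype(np.int8)
    # Cheap pass: int8 dot products over every record.
    rough = coarse.astype(np.int32) @ q_coarse.astype(np.int32)
    cand = rough.argsort()[::-1][:shortlist]
    # Expensive pass: full-precision similarity over the shortlist only.
    exact = corpus[cand] @ query
    return cand[exact.argsort()[::-1][:top_k]]

hits = search(corpus[0])
print(hits[0])  # the query record itself should rank first
```

The coarse pass still touches every record, but at 32 int8 values per record instead of 256 floats, so memory traffic drops by roughly 32x; the full-precision vectors are only read for the shortlist.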

Anything else I can think of will behave quite differently from embedding similarity, excluding a large percentage of what might otherwise be returned. That can be a good thing if you have separate knowledge domains: put each one behind a specific tool function to be searched on demand.
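That domain-routing idea can be as simple as one index per domain exposed through a tool function that takes the domain as an explicit argument, so only that domain's records are ever scored. Everything here is hypothetical scaffolding: the domain names, the toy string lists standing in for per-domain vector stores, and the substring "scoring" standing in for embedding similarity.

```python
# Hypothetical per-domain corpora; each would be its own vector index in practice.
indexes = {
    "billing": ["invoice FAQ", "refund policy"],
    "engineering": ["API reference", "deploy guide"],
}

def search_domain(domain: str, query: str, top_k: int = 1) -> list[str]:
    """Tool function the model calls with an explicit domain, so only
    that domain's records are ever compared against the query."""
    docs = indexes[domain]
    # Placeholder scoring: substring match stands in for embedding similarity.
    scored = sorted(docs, key=lambda d: query.lower() in d.lower(), reverse=True)
    return scored[:top_k]

print(search_domain("billing", "refund"))  # ['refund policy']
```

The routing decision then happens at the tool-call level rather than inside the index, which is exactly the "exclude most of the database up front" behavior described above.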