I’m experimenting with embeddings and storing them in a vector DB. My thought is that when I ask a model for the embedding for a given input, that embedding is associated with that model and cannot be reliably used with a different model. Is that true? If so, then if I store embeddings generated with model A, do I need to update or regenerate them if I switch to model A’ or B? BTW, I asked ChatGPT this question and it said I would need to regenerate my embeddings, but thought I would double-check with actual humans! Thanks.
With the current state of the art, embeddings created by one model cannot be mixed with embeddings from another. It is an active area of research, so this may change, but right now… no.
Embeddings are by nature tied to the encoder that created them. Different models use different architectures, training data, and tokenization, so each one maps text into its own vector space. A vector from model A has no meaningful interpretation in model B's space — comparing them just produces garbage.
Consider that at the most basic level, embeddings from different models don't even have the same dimensionality:
Ada (1024 dimensions),
Babbage (2048 dimensions),
Curie (4096 dimensions),
Davinci (12288 dimensions),
text-embedding-ada-002 (1536 dimensions).
So if the question is whether you can mix and match embeddings from different models and still get a meaningful similarity calculation, the answer is no. If you switch to model A' or B, regenerate everything in your vector DB with the new model.
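To make the point concrete, here is a small sketch using random vectors as stand-ins for real embeddings (the dimensions match the list above, but nothing here calls an actual embedding API). Mixing dimensions breaks the similarity math outright; and even when dimensions happen to match, the coordinate systems differ between models, so the score would be meaningless anyway.

```python
import math
import random

# Toy stand-ins for embeddings from two different models (random vectors,
# NOT real API output); dimensions taken from the list above.
emb_model_a = [random.random() for _ in range(1536)]   # e.g. an ada-002-sized vector
emb_model_b = [random.random() for _ in range(4096)]   # e.g. a curie-sized vector

def cosine_similarity(u, v):
    """Standard cosine similarity; only defined for equal-length vectors."""
    if len(u) != len(v):
        raise ValueError(f"dimension mismatch: {len(u)} vs {len(v)}")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Same model (same dimension): a valid score you can rank by.
print(cosine_similarity(emb_model_a, [random.random() for _ in range(1536)]))

# Different models: the calculation fails on the dimension mismatch.
try:
    cosine_similarity(emb_model_a, emb_model_b)
except ValueError as e:
    print(e)  # dimension mismatch: 1536 vs 4096
```

Even truncating or padding the vectors to force the shapes to agree wouldn't help — the axes of one model's space don't correspond to the axes of another's.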