I’m curious whether we know anything about how the embeddings returned by the /embeddings endpoint are generated.
A lot of past work on embeddings operates at the word level. So, if you wanted to calculate the similarity of two texts (each longer than one word), you would use a mean-embedding approach and average together the embeddings of all the words in each text. It tended to be a pretty blunt instrument.
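For reference, here's a rough sketch of what I mean by the mean-embedding approach (just an illustration with made-up per-word vectors standing in for a real word2vec/GloVe lookup, nothing specific to the API):

```python
import numpy as np

def mean_embedding(word_vectors):
    """Average a list of per-word vectors into a single text vector."""
    return np.mean(word_vectors, axis=0)

def cosine_similarity(a, b):
    """Standard cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 4-dimensional word vectors, one list per text
text_a = [np.array([0.1, 0.3, 0.0, 0.5]), np.array([0.2, 0.1, 0.4, 0.0])]
text_b = [np.array([0.0, 0.4, 0.1, 0.5]), np.array([0.3, 0.0, 0.3, 0.1])]

sim = cosine_similarity(mean_embedding(text_a), mean_embedding(text_b))
print(sim)
```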
Does anyone know how these embeddings are calculated, at least at a conceptual level? I was thinking maybe they were being pulled from one of the self-attention layers, but even those representations seem to be at the token level. Are they just taken from a dense layer further downstream? Curious if we have any idea.
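To make the token-level issue concrete, here's how I imagine it could work conceptually, using an open model (bert-base-uncased) as a stand-in. I have no idea whether this resembles what the /embeddings endpoint actually does; it's just one way to get from token-level hidden states to a single text vector:

```python
# Conceptual stand-in, NOT necessarily OpenAI's method: take token-level
# hidden states from the final transformer layer and mean-pool them.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def text_embedding(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state: (1, num_tokens, hidden_dim) -- still token-level,
    # so pool over tokens to get one vector for the whole text
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

print(text_embedding("hello world").shape)  # torch.Size([768])
```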