Why do embeddings always have the same length of 1536 numbers?

No matter how long the input string is, creating embeddings with “text-embedding-ada-002” always returns a vector of length 1536.

Can someone please explain to me why this is so? I would expect a text consisting of two words to produce a smaller embedding vector than a whole paragraph, but that’s not the case: they always have the same length.
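For reference, here’s roughly what I’m running (a minimal sketch assuming the current openai Python client; the example strings are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

short_text = "Hello world"
long_text = "A much longer paragraph about embeddings and why they are useful. " * 10

for text in (short_text, long_text):
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    vector = response.data[0].embedding
    print(len(vector))  # prints 1536 for both the short and the long input
```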

(Disclaimer: I am still learning; please forgive me if it’s a stupid question)

Embeddings are generally used to find the similarity between different pieces of text. The notion of ‘similar’ has to be quantified with some sort of distance measure.

Normally Euclidean distance or cosine distance is used, and both of these require the vectors to have the same dimension.
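As an illustration (a minimal sketch using numpy, with random vectors standing in for real embeddings), cosine similarity is only defined when both vectors have the same number of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    # The dot product, and hence cosine similarity, requires equal-length vectors.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two same-length vectors standing in for 1536-dimensional embeddings.
v1 = np.random.rand(1536)
v2 = np.random.rand(1536)
print(cosine_similarity(v1, v2))
```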

Suppose the embeddings did scale with the length of the text. The issue is that it’s then not clear how to compare embeddings of different lengths in a meaningful way.
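To make that concrete (again just a sketch with made-up vectors), the usual distance measures simply break down when the lengths differ:

```python
import numpy as np

short_vec = np.random.rand(4)   # imagine a two-word text produced a short vector
long_vec = np.random.rand(6)    # and a paragraph produced a longer one

try:
    np.dot(short_vec, long_vec)  # dot product (and thus cosine similarity) is undefined
except ValueError as e:
    print(f"Cannot compare: {e}")
```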

Hopefully the model has learnt to represent text of any length in a meaningful fixed-size vector, so that those vectors can be used for comparison.

Hope this makes sense.


An embedding is a high-dimensional vector, so it points to a single “location” in that high-dimensional space. The number of dimensions is what controls the size of the vector, not the length of the input text.
