It looks like 'text-embedding-3' embeddings are truncated/scaled versions of a higher-dimensional model

I think this is possible now, since this appears to be what the models are doing internally.

But it would come at a cost increase to get those big DaVinci vectors back.

I wish the models weren’t deprecated as fast as they are too … a model seems to last, what, maybe a year or so?

This makes the O&M (operations and maintenance) burden much higher: I would have to run multiple embedding engines, continuously upgrade each one, and monitor its performance.

But speaking of multiple dimensions, have you thought of literally sticking a bunch of vectors together, from different models, to form one massive vector that might be useful to you?

So if each vector is a unit vector, and say you fuse 10 models together to form a massive 15k-dimensional vector (like DaVinci), then you take the dot product and divide by 10 to get your new “mega vector” similarity.

This vector would contain information from 10 different models.
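The fusion idea above could be sketched roughly like this (toy numpy sketch with random stand-in embeddings; the function and variable names are mine, not any real API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for embeddings of one text from 10 different models.
# 10 models x 1536 dims each gives a ~15k-dim fused vector.
def embed_all(n_models=10, dim=1536):
    vecs = rng.standard_normal((n_models, dim))
    # Normalize each model's vector to unit length before fusing.
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

a = embed_all()  # embeddings of text A from 10 models
b = embed_all()  # embeddings of text B from 10 models

# "Mega vector": concatenate the 10 unit vectors into one long vector.
mega_a = np.concatenate(a)
mega_b = np.concatenate(b)

# Dot product of the concatenations, divided by the number of models,
# gives a similarity back in [-1, 1] (it is the mean of the per-model cosines).
sim = mega_a @ mega_b / len(a)
```

The division by 10 is just renormalization: each concatenated vector has norm √10, so dividing the raw dot product by 10 recovers a cosine-style score.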

Would this help you?

PS. Creating this large 15k vector isn’t strictly necessary, since you could just correlate the 10 models separately, sum the answers, and get the same result.
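That equivalence is easy to verify numerically: the dot product of the concatenated vectors, divided by the model count, is exactly the average of the per-model cosine similarities (toy data, small dims just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, dim = 10, 64  # small dims, just to demonstrate the identity

a = rng.standard_normal((n_models, dim))
b = rng.standard_normal((n_models, dim))
a /= np.linalg.norm(a, axis=1, keepdims=True)  # unit vectors per model
b /= np.linalg.norm(b, axis=1, keepdims=True)

# Route 1: fuse into mega vectors, take the dot product, divide by model count.
fused = np.concatenate(a) @ np.concatenate(b) / n_models

# Route 2: score each model separately, then average the cosines.
separate = np.mean(np.sum(a * b, axis=1))

# The two routes agree up to floating-point error.
```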

The main difference here is that stacking models like this is coherent processing, whereas performing hybrid rankings from different models is non-coherent processing.
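To make the distinction concrete: coherent processing combines the raw scores before ranking, while non-coherent hybrid ranking ranks within each model first and then fuses the ranks. A sketch, using reciprocal rank fusion as one common rank-fusion heuristic (all data here is random toy data, and the names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n_models, n_docs, dim = 3, 5, 32

# Unit query and document embeddings per model (toy data).
q = rng.standard_normal((n_models, dim))
q /= np.linalg.norm(q, axis=1, keepdims=True)
docs = rng.standard_normal((n_models, n_docs, dim))
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

# Per-model cosine scores, shape (models, docs).
scores = np.einsum('md,mnd->mn', q, docs)

# Coherent: average the raw cosine scores across models, then rank once.
coherent_rank = np.argsort(-scores.mean(axis=0))

# Non-coherent: rank within each model first, then fuse the ranks
# via reciprocal rank fusion, score = sum over models of 1 / (60 + rank).
ranks = np.argsort(np.argsort(-scores, axis=1), axis=1)  # 0 = best per model
rrf = (1.0 / (60 + ranks + 1)).sum(axis=0)
noncoherent_rank = np.argsort(-rrf)
```

The non-coherent route throws away the score magnitudes and keeps only orderings, which is exactly the information loss being pointed at here.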

Large vectors, like DaVinci can be coherently synthesized from multiple models.

The question is … is coherent better than non-coherent? My gut feeling is that coherent is better, because more dimensions are better, as you’ve experienced.
