Embeddings performance difference between small vs large at 1536 dimensions?

Is there a performance difference between text-embedding-3-small @ 1536 length and text-embedding-3-large @ 1536 length?


Using kapa.ai on OpenAI Discord

Prompt

For the following text, summarize the idea, extract meaningful questions and give possible answers.

Is there a performance difference between text-embedding-3-small @ 1536 length and text-embedding-3-large @ 1536 length?

Reply

Idea Summary: The user is inquiring about the performance difference between two of OpenAI’s third-generation embedding models: text-embedding-3-small and text-embedding-3-large. Both models have an output dimension of 1536.

Extracted Question: Is there a performance difference between text-embedding-3-small and text-embedding-3-large, both having an output dimension of 1536?

Possible Answer: Yes, there is a performance difference between the two models. According to the documentation (https://platform.openai.com/docs/guides/embeddings), the text-embedding-3-large model is the most capable embedding model for both English and non-English tasks, while the text-embedding-3-small model offers increased performance over the second-generation ada embedding model. In terms of specific performance metrics, text-embedding-3-large scores 64.6% on the MTEB benchmark (https://github.com/embeddings-benchmark/mteb), while text-embedding-3-small scores 62.3%. Therefore, the text-embedding-3-large model performs slightly better than the text-embedding-3-small model.


So far, that’s not my experience. I’ve found it inferior to ADA 2 in real-world tests. YMMV.


I think the performance difference would be that if you requested 1536 dimensions from text-embedding-3-large, you would get an API error, as it is not a supported parameter.

However, 1024, which is supported, is essentially the first 1024 dimensions returned out of the full 3072. It is hard to imagine that if you truncated to 1536 manually yourself, you would get worse semantic performance than at 1024.

You can infer the quality, parameter count, and computation dedicated to the model from the price. Then compare cost plus dimensionality to figure out where it fits into the scheme of the original ada-babbage-curie GPT-3 embeddings.
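
For reference, here is a minimal sketch of that manual truncation (my own illustration, not from this thread, assuming the current `openai` Python client with `OPENAI_API_KEY` set): take the full 3072-dimension text-embedding-3-large vector, keep the first 1536 values, and re-normalize to unit length so cosine or dot-product comparisons still behave.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="the quick brown fox",
)
full = np.array(resp.data[0].embedding)            # 3072 dimensions by default

truncated = full[:1536]                            # keep the leading 1536 dimensions
truncated = truncated / np.linalg.norm(truncated)  # restore unit length

print(len(truncated), float(np.linalg.norm(truncated)))  # 1536, ~1.0
```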

I am able to generate 1536d vectors using large


Interesting. It appears that the dimensions API parameter is just a truncator and normalizer.

2048:

{'data': [{'embedding': [0.018731665, -0.03743733, -0.0029751628, -0.007448469, -0.0030954042], 'index': 0, 'object': 'embedding'}], 'model': 'text-embedding-3-large', 'object': 'list', 'usage': {'prompt_tokens': 6, 'total_tokens': 6}}
len: 2048

4:

{'data': [{'embedding': [0.4394683, -0.87832665, -0.06980105, -0.1747504], 'index': 0, 'object': 'embedding'}], 'model': 'text-embedding-3-large', 'object': 'list', 'usage': {'prompt_tokens': 6, 'total_tokens': 6}}
len: 4

4096:

API Error: Error code: 400 - {'error': {'message': "Invalid value for 'dimensions' = 4096. Must be less than or equal to 3072.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

The blog makes a specific point of particular dimensions, but I hadn’t fuzzed the inputs to see what it could do.

By default, the length of the embedding vector will be 1536 for text-embedding-3-small or 3072 for text-embedding-3-large. You can reduce the dimensions of the embedding by passing in the dimensions parameter without the embedding losing its concept-representing properties.
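
A quick way to sanity-check the "truncate and normalize" observation (my own sketch, same client assumptions as above) is to request a reduced vector via the `dimensions` parameter and compare it against truncating and re-normalizing the full vector yourself:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
text = "hello world"

full = np.array(
    client.embeddings.create(model="text-embedding-3-large", input=text)
    .data[0].embedding
)                                              # 3072-d default output
reduced = np.array(
    client.embeddings.create(model="text-embedding-3-large", input=text, dimensions=256)
    .data[0].embedding
)                                              # 256-d, as returned by the API

manual = full[:256]
manual = manual / np.linalg.norm(manual)       # truncate, then re-normalize

# If the parameter really is truncation + normalization, these should be near-identical.
print(np.allclose(reduced, manual, atol=1e-3))
```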


Yes, see:

Yes, and to expand on my contribution to that topic…

Here’s another thought:

If the reduced dimensions were remapped so that the dimensions most relevant for semantic discrimination were placed first, to tolerate truncation, trained by extensive trials and then sorting the output order, how would that be done?

By targeting a benchmark.

One might postulate, then, that in making a truncatable embeddings model, benchmarks such as MTEB and others might have been used to discover the dimensions with the highest applicability to known tasks.

Thus, the reduced embedding dimensions you get by parameter specification may perform better against benchmarks than in general or novel use.

The challenge, then, is coming up with “unseen” cases to qualify the different 1536-dimension outputs available from both API models, to find out whether half of 3-large takes a larger hit than a single metric shows. Find out how poor the second half is…
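
As a rough illustration of what such a probe could look like (toy texts, my own sketch, nothing authoritative), one could compare cosine similarities computed from the first 1536 dimensions of text-embedding-3-large against those from the last 1536:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
texts = [
    "a cat sat on the mat",
    "a kitten rested on the rug",      # semantically close to the first
    "quarterly revenue grew by 8%",    # unrelated
]

resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
vecs = np.array([d.embedding for d in resp.data])   # shape (3, 3072)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for label, sl in [("first 1536", slice(0, 1536)), ("last 1536", slice(1536, 3072))]:
    part = vecs[:, sl]
    print(label,
          "| cat vs kitten:", round(cos(part[0], part[1]), 3),
          "| cat vs revenue:", round(cos(part[0], part[2]), 3))
```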


Given the MTEB scores in the blog post, which show that text-embedding-3-large @ 256 length reaches a higher score than text-embedding-3-small @ 512 length, I would say it is likely that text-embedding-3-large @ 1536 length also outperforms text-embedding-3-small @ 1536 length.


Nice observation. Thank you. I’ll use large for my 1536


But again, that’s a case where the truncation designer can specifically target the benchmark and the dimensions that perform best on it, and where, on a larger-parameter model, there are more dimensions to choose from. There’s no promise from OpenAI that MTEB score alone isn’t exactly what the lower dimensionality targets.


Thank you for this. I was thinking the same because of OpenAI’s documentation. I was bracing myself for some database changes, having created a bunch of objects with 1536 dimensions in mind. Hoping the transition will be seamless.
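
For what it’s worth, the column width itself should not need to change if it is already 1536-wide; what does have to change is the data, since vectors from text-embedding-3-small and text-embedding-3-large are not comparable across models. A hedged sketch of the re-embedding step (hypothetical helper name, same client assumptions as the earlier examples):

```python
from openai import OpenAI

client = OpenAI()

def embed_1536_large(texts):
    """Hypothetical helper: 1536-d vectors from text-embedding-3-large,
    sized to fit an existing 1536-dimension column."""
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=1536,
    )
    return [d.embedding for d in resp.data]

# Every stored object has to be re-embedded with the new model; old
# text-embedding-3-small (or ada-002) vectors cannot be mixed with these.
# new_vectors = embed_1536_large(all_document_texts)
```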