I've gone back to ada-002; text-embedding-3-small is not working for me

And that makes me sad.

I’ve had a specific issue with search results … looking for text containing patch numbers.

I’ve found that text-embedding-3-small, in combination with cosine similarity, was returning many results for snippets with the wrong version numbers in them.

So, for example, if I’m looking for text on “update 5.23”, te3s was preferentially returning lots of snippets on update 5.21 instead of 5.23. When combined with a threshold cut-off or a “Top 5” slice, this was removing all the results that should have been found for 5.23.
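The failure mode can be sketched in a few lines. This is a toy illustration with made-up vectors (not real embeddings): if the model scores the wrong-version snippets closer to the query, a top-k slice silently drops the snippet you actually wanted.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, snippets, k=5, threshold=0.5):
    """Rank snippets by cosine similarity, keep the top k above the threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in snippets]
    scored.sort(reverse=True)
    return [(score, text) for score, text in scored[:k] if score >= threshold]

# Toy vectors standing in for real embeddings: the 5.21 snippets
# score closest to the query, so the 5.23 snippet never makes the cut.
query = [1.0, 0.2, 0.1]
snippets = [
    ("notes on update 5.21", [1.0, 0.25, 0.1]),
    ("changelog, update 5.21", [0.95, 0.2, 0.1]),
    ("notes on update 5.23", [0.5, 0.9, 0.3]),   # the one we wanted
]
for score, text in top_k(query, snippets, k=2):
    print(f"{score:.3f}  {text}")
```

With `k=2`, both returned hits are 5.21 snippets and the 5.23 one is sliced away, which is exactly the symptom described above.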

Roll back to ADA 2 and, boom, problem gone.

I’m using an HNSW index … I’m wondering if that might be having an influence … but the ada-002 setup, without any change to the index parameters, is working for me again.

I’m tempted to try a higher resolution index at some point with the new model, but that will eat memory!

Anyone had a similar disappointment?


It has been mentioned before that you need to update your thresholds when migrating to the v3 embedding models. Have you already tried that?
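One way to handle this is to keep a per-model threshold rather than a single hard-coded cut-off. A minimal sketch, with placeholder values: the actual numbers have to be calibrated against your own data, and these are illustrative only, not recommendations.

```python
# Illustrative only: calibrate these on your own corpus. The general
# observation is that ada-002 similarity scores cluster much higher
# than the v3 models', so a threshold tuned for ada-002 filters out
# nearly everything when reused with text-embedding-3-small.
SIMILARITY_THRESHOLDS = {
    "text-embedding-ada-002": 0.80,  # placeholder value
    "text-embedding-3-small": 0.35,  # placeholder value
}

def passes_threshold(model: str, score: float) -> bool:
    """Keep a hit only if it clears the threshold tuned for its model."""
    return score >= SIMILARITY_THRESHOLDS[model]

print(passes_threshold("text-embedding-ada-002", 0.5))  # too low for ada-002
print(passes_threshold("text-embedding-3-small", 0.5))  # clears the v3 bar
```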

Like this guy?

:wink:

No, this was not the problem. Regardless of the threshold, the cosine-distance ordering was preferring data that was clearly more different. And that’s definitely a problem!


I’ve seen this as well; with text-embedding-3-small, I get a totally different set of similar results compared to ada-002. The relevance seems broken, and it’s unusable.


Thanks, I thought I was going crazy!

It’s pretty concerning, and I’m not seeing this mentioned anywhere else. I’m going to try and reach out to some OpenAI folks about this.

I just ran a test where I ingested 10 text files with text-embedding-3-small and searched for some text, then did the same thing with ada-002, and got a completely different list of results. (And ada-002 was 100% correct with the top hits.)


Did you hear back from the OpenAI folks?

Unfortunately not. I tried to reach out but got no response.

I need to retest this, since I just changed how we preprocess the text by not removing newline characters. (Found this was only a thing in v1 of ada-002, and not needed in v2, so probably not in text-embedding-3 either.)

I’ll follow up again after I have another chance to compare.

Came here after having similar issues. I recently started using text-embedding-3-small and my search results are coming back empty. Switching back to text-embedding-ada-002 appears to fix it.

What I don’t really know, to be honest: if I have embeddings previously produced with text-embedding-ada-002, and I search them using a query vector produced by text-embedding-3-small, is there a huge impact on the results?

I have quite a large DB of vectors, and if possible I’d like to avoid re-indexing them with te3s. Especially while I’m having issues with its search results.

Yeah, OK, after reading a bit more around the forum, it became clear that ada and te3s are not compatible; this is not like swapping GPT models for a given input. You cannot simply start querying with te3s against an existing vector DB produced by ada.
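Since vectors from ada-002 and text-embedding-3-* live in incompatible spaces, it can be worth tagging a vector store with the model that produced it and refusing mismatched queries. A hypothetical sketch (this `VectorStore` class is illustrative, not any real library's API):

```python
class VectorStore:
    """Toy store that records which embedding model produced its vectors
    and rejects queries embedded with a different model, since the two
    spaces cannot be meaningfully compared."""

    def __init__(self, embedding_model: str):
        self.embedding_model = embedding_model
        self.rows = []  # (text, vector) pairs

    def add(self, text, vector):
        self.rows.append((text, vector))

    def query(self, query_vector, query_model: str):
        if query_model != self.embedding_model:
            raise ValueError(
                f"store was built with {self.embedding_model!r}; "
                f"re-embed the corpus before querying with {query_model!r}"
            )
        return self.rows  # real code would rank by similarity here

store = VectorStore("text-embedding-ada-002")
store.add("hello", [0.1, 0.2])
try:
    store.query([0.1, 0.2], "text-embedding-3-small")
except ValueError as e:
    print("rejected:", e)
```

The check costs nothing and turns a silent relevance collapse into a loud error at query time.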

Not sure if this was clear to everybody except me :upside_down_face:

… and also your distance threshold needs to be different. The threshold for ada-002 is generally much higher.
