Semantic search with Embeddings

I have a list of animals e.g. bird, dog, fish etc. uploaded as embeddings.
I then query this list with the query, “which animal has scales?”
Using ADA, the results often show “bird” as having a greater cosine similarity than “fish” for this query.
I don’t understand why that would be or whether my understanding of embeddings is incorrect. Would appreciate any guidance.
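For concreteness, the ranking step I'm doing looks roughly like this (toy 4-dimensional vectors for illustration only; real text-embedding-ada-002 vectors are 1536-dimensional and come from the embeddings endpoint):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors standing in for real API embeddings
embeddings = {
    "bird": np.array([0.9, 0.2, 0.1, 0.3]),
    "dog":  np.array([0.8, 0.1, 0.4, 0.2]),
    "fish": np.array([0.1, 0.9, 0.2, 0.4]),
}
query = np.array([0.2, 0.8, 0.3, 0.5])  # pretend embedding of the question

# Rank animals by cosine similarity to the query, highest first
ranking = sorted(embeddings,
                 key=lambda k: cosine_similarity(embeddings[k], query),
                 reverse=True)
print(ranking)  # → ['fish', 'bird', 'dog']
```

With real ada-002 vectors, the surprise is that "bird" often lands above "fish" in this ranking.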


For the list of animals, did you only embed the word “fish” or did you embed a description “A fish has scales, lives underwater …”?

Also, are you using ‘text-embedding-ada-002’ or some other engine?

Just fish, using text-embedding-ada-002.

Try embedding a description of a fish (or each animal). Be explicit and detailed. The embedding engine you are using can handle 8k tokens, which is a lot of text, so don’t be shy! I know this sounds like a lot of work, so you can also have GPT-3 describe each animal and embed those responses instead.

This will enrich the embedding vector and should give you a closer match.
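Something like this, where the descriptions are just examples (you could generate them with GPT-3 instead), and `get_embedding` stands in for whatever wrapper you use around the embeddings endpoint:

```python
# Build richer texts to embed instead of bare animal names.
animals = {
    "fish": "An aquatic animal covered in scales; it lives underwater and has fins and gills.",
    "bird": "A feathered, winged animal; most birds can fly and lay eggs.",
    "dog":  "A furry, four-legged mammal commonly kept as a pet.",
}

def texts_to_embed(descriptions: dict) -> list:
    """Return the enriched texts to send to the embeddings endpoint."""
    return [f"{name}: {desc}" for name, desc in descriptions.items()]

for text in texts_to_embed(animals):
    print(text)
    # vector = get_embedding(text, model="text-embedding-ada-002")  # hypothetical helper
```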


Could you please provide a sample of the input text? It’s difficult to gauge why it’s giving that response without seeing what the source of the embedding looks like

I mean “birds” did come from dinosaurs, they say. Haha. I wonder if that’s why it occasionally is higher?

I agree with @curt.kennedy that more information would likely be helpful…

Good luck. Let us know if you get it sorted out.

I don’t doubt that it would produce more predictable results if I gave it more data about each animal.
If that is the issue, I wouldn’t describe this as semantic search nor would it be very useful. There are better methods than GPT for simple word-match searches.

The list shows the data and the cosine similarity for the query “which animal is commonly perceived as most likely to have scales?” - Best answer: Sheep (!?)

The other image shows a list of other queries and the best answer provided. The only query returning fish as the best answer was “What animal has scales and lives in water?”

I definitely understand your frustration. There are other embedding engines out there. I personally have used GloVe before, but that was at the word level, not the sentence or paragraph level.

There might be embedding engines out there that are better at embedding smaller chunks of text. But generally that means a smaller vector space (a shorter embedding vector), so more features get packed into fewer dimensions, making unrelated items closer by default. Such an engine would likely not distinguish as well.

It’s all a trade-off. But at the word level, I had good performance with only 50 to 100 dimensions. At the phrase level, I would go much higher; the 1k–2k that OpenAI is using now seems reasonable.
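The “closer by default” effect is easy to demonstrate with random vectors, independent of any particular embedding engine: the typical cosine similarity between unrelated vectors shrinks roughly like 1/√d as the dimension d grows.

```python
import numpy as np

def mean_abs_cosine(dim: int, n_pairs: int = 2000, seed: int = 0) -> float:
    """Average |cosine similarity| over random pairs of Gaussian vectors."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.mean(np.abs(cos)))

# Unrelated vectors look much "closer" in 50 dimensions than in 1536
print(mean_abs_cosine(50))    # roughly 0.11
print(mean_abs_cosine(1536))  # roughly 0.02
```

So a 50-dimensional space that works fine for single words would leave much less room to separate longer passages.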


The new text-embedding-ada-002 model is not outperforming text-similarity-davinci-001 on the SentEval linear probing classification benchmark. For tasks that require training a lightweight linear layer on top of embedding vectors for classification prediction, we suggest comparing the new model to text-similarity-davinci-001 and choosing whichever model gives optimal performance.

Check the Limitations & Risks section in the embeddings documentation for general limitations of our embedding models.

Is text-similarity-davinci-001 still available?

Also, maybe it’s misunderstanding the word “scales”…

As in, “scales” can be read a couple of different ways, and maybe sheep (somehow) are classified as likely having “scales” (but under the other definition, i.e., hierarchies…).

“What animal has scales and lives in water?” helps the model understand which definition of scales you’re after…

Just trying to help you brainstorm…


Just to add to the response from @PaulBellow

Many of these embedding models are trained on data scraped from the internet. And guess what the internet says:

“Birds evolved from a group of meat-eating dinosaurs called theropods. That’s the same group that Tyrannosaurus rex belonged to.”

The model doesn’t always have the common sense to realize that when you embed ‘bird’ you mean only modern, bird-like things. There are soooo many bird/dinosaur articles out there that it isn’t out of the question for the model to link the two, given the data it was trained on.

I understand completely why it’s returning these results. However, it makes it much less useful.