Extracting each word's embeddings from embedded sentence

Welcome to the community!

What you have is an embedding vector: it encodes the semantic meaning of your entire text, as a whole.

You can, in theory (in the sense that there’s nothing stopping you), send individual words to the endpoint to achieve your goal. Each word will also come back as a gigantic vector of the same dimensionality.

You then just compute the cosine similarity of each pair of vectors you want to compare.
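A minimal sketch of that comparison; the two arrays here are tiny stand-ins for the real embedding vectors an endpoint would return:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for embeddings returned by an embedding endpoint
# (real ones have hundreds or thousands of dimensions).
sentence_vec = np.array([0.2, 0.7, 0.1])
word_vec = np.array([0.3, 0.6, 0.2])

print(cosine_similarity(sentence_vec, word_vec))
```

The same function works pairwise across however many word vectors you want to rank against the sentence vector.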

Does that make sense?


I said “in theory” because, in practice, these text embedding models aren’t really built for comparing individual words: a single word can be highly context sensitive and mean different things depending on how and where it’s used. That said, using the embedding model this way may get you 99% of the way to your goal, and it may well be good enough. So I definitely encourage you to try it.

To improve it, if the budget and use case allow, you could consider using a generative LLM to translate your text into a structured list of contextual definitions, which you can then embed. Embeddings of definitions will generally capture more meaning than embeddings of isolated words.
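Roughly, that pipeline looks like the sketch below. `define` and `embed` are hypothetical stand-ins for the real LLM and embedding API calls; their toy bodies exist only to keep the example runnable:

```python
def define(word: str, context: str) -> str:
    # In practice: ask a generative LLM something like
    # "Define {word} as it is used in: {context}".
    return f"{word}, as used in '{context}'"

def embed(text: str) -> list[float]:
    # In practice: call an embedding endpoint; this toy version
    # just maps the first few characters to numbers.
    return [ord(c) % 7 / 7 for c in text[:3]]

sentence = "the bank of the river"

# word -> contextual definition -> embedding of the definition
definitions = {w: define(w, sentence) for w in sentence.split()}
vectors = {w: embed(d) for w, d in definitions.items()}
```

The point is that “bank” embedded via its definition-in-context lands near river-related meanings rather than financial ones.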
