I think it’s not semantic meaning at all, but closer to semantics than keywords.
Here’s what I see going on, and hopefully this describes the situation …
So these LLMs are a series of hidden layers, matrices, etc. Think of each hidden layer’s output as a vector.
The embedding is generated from the last hidden layer, as floats, before the final output projection and softmax are applied. So they take that layer (a vector), normalize it back out to unit length, and give you that vector as your embedding.
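To make that concrete, here’s a minimal sketch of that extraction, assuming PyTorch and the Hugging Face transformers library, with a small sentence-embedding model picked purely as an example:

```python
# Minimal sketch: pull an embedding straight from a model's last hidden layer.
# The model name is just an example; any encoder-style embedding model works similarly.
import torch
from transformers import AutoTokenizer, AutoModel

name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last hidden layer: one vector per token, shape (1, seq_len, hidden_dim).
hidden = outputs.last_hidden_state

# Pool the per-token vectors into one vector for the whole input
# (mean pooling here; some models use the last token or a [CLS] token instead).
pooled = hidden.mean(dim=1)

# Normalize back out to unit length: this is the vector you get handed.
embedding = torch.nn.functional.normalize(pooled, dim=-1)
print(embedding.shape)  # (1, hidden_dim)
```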
What does this layer mean?
It’s the vector state the model would use to predict the next token, given all the previous tokens fed into it.
So the vector is “responding” to your text, understanding it, but isn’t saying anything since it’s an embedding model. It’s not allowed to say anything back.
So the embedding is the final hidden layer, the last state that would go on to produce a result (the next token), stopped just short of that step. It’s a frozen internal “understanding” of what you fed the model …
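Here’s a rough illustration of that “stopped just short” idea, using a small causal LM (GPT-2 here, purely as a stand-in, not anything the original discussion specifies). The last token’s hidden state is the frozen “understanding”; pushing it through the LM head is the one extra step an embedding model skips:

```python
# Sketch: the last hidden state is the frozen internal state; the LM head is
# the step that would turn it into an actual next token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# The internal state after reading all the tokens: shape (hidden_dim,).
last_state = out.hidden_states[-1][0, -1]

# What the model *would* do next: project that state onto the vocabulary.
logits = model.lm_head(last_state)
print(tokenizer.decode(logits.argmax().item()))  # likely " Paris"
```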
So is this the same as semantics and meaning?
No, not really. So what is it?
I think it’s like this: if someone said something to you, how would you feel about it? What would you say back, based on your “training”, a.k.a. your life experiences? What if you could freeze your neural state at that moment and send it out …
It’s this internal thought, or state, that gets extracted. This isn’t semantics; it’s more of a snapshot of the internal thoughts about what was said, what’s in the buffer. The LLM’s “thoughts” are the integrated excitation of the states of all the tokens being run through it.
These states are then forced through chokepoints, and the most significant point of “understanding” is the final layer, which is essentially extracted, normalized, and spit out.
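And because the extracted vector is normalized to unit length, comparing two of these frozen snapshots downstream reduces to a dot product, i.e. cosine similarity. Random vectors stand in for real embeddings below, just to show the arithmetic:

```python
# Unit-length vectors: dot product == cosine similarity.
import torch
import torch.nn.functional as F

a = F.normalize(torch.randn(384), dim=-1)  # stand-in for the embedding of text A
b = F.normalize(torch.randn(384), dim=-1)  # stand-in for the embedding of text B

print(torch.dot(a, b).item())  # cosine similarity between the two snapshots
```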
Anyway, this is what I see going on.