Hello,
I am a computational linguist working on grammar, and I have a question about positional encoding.
Is it true that, without positional encoding, one can reorder all the words in the prompt and still obtain the same probabilities for the next word?
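To make the question concrete, here is a toy sketch I put together (my own example with NumPy, a single attention head, random matrices standing in for the learned Query/Key/Value matrices, no positional encoding and no causal mask): permuting the input token vectors just permutes the output rows, so each token receives exactly the same output vector no matter where it sits in the sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                                 # 5 tokens, embedding dimension 8
X = rng.normal(size=(n, d))                 # token embeddings, no positional encoding added
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # every token is projected by the same matrices
    scores = Q @ K.T / np.sqrt(d)           # dot products between all queries and keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
    return w @ V                            # weighted sum of value vectors

perm = rng.permutation(n)
out, out_perm = attention(X), attention(X[perm])

# A token's output does not depend on its position in the sequence:
print(np.allclose(out[perm], out_perm))     # True
```

I am aware that in a real GPT-style decoder the causal mask restricts which tokens each position may attend to, so with several layers the picture may be more subtle, but the mechanism above contains no notion of position at all once the positional encoding is dropped.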
The tokens of a string are translated into semantic vectors. These vectors are multiplied by certain matrices (Query, Key, Value); I want to leave the positional encoding aside for now. Multiplying a semantic vector by such a matrix again yields a vector x. How are these vectors x processed further? Is it done in a way that keeps track of which token position in the string each x came from?
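To show what I mean by "processed further", here is my understanding of the next step as a sketch (standard scaled dot-product attention with toy numbers, not the actual GPT code): the output for one token is a weighted sum of the value vectors, where the weights come only from dot products between vector contents. The position j is just the row index of K and V, and since a sum does not care about the order of its terms, nothing in the result seems to record which x came from which position.

```python
import numpy as np

def output_for_token(q_i, K, V):
    """Scaled dot-product attention for a single query vector q_i.
    K and V hold one key / value vector per token (one row per position).
    The row index is the only 'position' there is, and it never enters
    the arithmetic below -- only the vector contents do."""
    d = q_i.shape[0]
    scores = K @ q_i / np.sqrt(d)              # one similarity score per token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over the scores
    return weights @ V                         # order-independent weighted sum

rng = np.random.default_rng(1)
K, V = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
q = rng.normal(size=8)

shuffle = rng.permutation(4)
print(np.allclose(output_for_token(q, K, V),
                  output_for_token(q, K[shuffle], V[shuffle])))   # True
```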
Yes. But I do not think this answers the question. One multiplies word vectors by fixed matrices, and I do not know whether the information about which word vector was multiplied first is still available afterwards. Is the position information still there, or is it encoded solely in the word vector?
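A small numerical check of what I mean (toy numbers, one fixed matrix W standing in for e.g. the query matrix): the same word vector yields exactly the same projected vector whether it is the first or the last one to be multiplied, so the multiplication itself seems to leave no positional trace.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(8, 8))       # one fixed projection matrix, e.g. the query matrix
cat = rng.normal(size=8)          # embedding of the word "cat"

sentence_a = np.stack([cat, rng.normal(size=8), rng.normal(size=8)])  # "cat" comes first
sentence_b = np.stack([rng.normal(size=8), rng.normal(size=8), cat])  # "cat" comes last

# The projection is applied row by row; "cat" gets the same vector in both cases.
print(np.allclose(sentence_a[0] @ W, sentence_b[2] @ W))   # True
```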
By the cosine, do you mean the cosine between two word vectors? If I understand correctly, a word vector is a compressed version of a row of the co-occurrence matrix. So where, if not in the positions, is grammar encoded?
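Just to illustrate the picture I have in mind (a classical LSA-style toy example with made-up counts, not how GPT's embeddings are actually trained): the word vectors are the rows of a truncated SVD of a co-occurrence matrix, and the cosine is taken between those compressed rows.

```python
import numpy as np

# Made-up word/context co-occurrence counts (rows: dog, cat, car).
contexts = ["pet", "fur", "engine", "road"]
C = np.array([[8., 6., 0., 1.],    # dog
              [7., 7., 0., 0.],    # cat
              [0., 0., 9., 7.]])   # car

# Truncated SVD: keep the top-2 dimensions as the "compressed" word vectors.
U, S, Vt = np.linalg.svd(C)
vectors = U[:, :2] * S[:2]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors[0], vectors[1]))   # dog vs cat: high
print(cosine(vectors[0], vectors[2]))   # dog vs car: much lower
```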
OK. I know that whether a word is a noun or a verb is encoded in the word vector. But in order to analyze grammar issues in GPT, I want to find out whether the position information of a word is encoded solely in the positional encoding or is also available in some other way. The question remains: does the position information vanish after multiplying by the query and key matrices, or does it not?
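To test exactly this point, here is a small numerical sketch (toy dimensions, a random matrix standing in for the learned query matrix, and the classical sinusoidal positional encoding; GPT models actually learn their position embeddings, but the additive principle is the same). Because the encoding p_i is added to the word vector x before the projection, linearity gives (x + p_i)·W_Q = x·W_Q + p_i·W_Q, so the position term is carried through the multiplication rather than vanishing; without the added encoding, the same word produces the identical query at every position.

```python
import numpy as np

def sinusoidal_pe(pos, d):
    """Classical sinusoidal positional encoding for a single position."""
    i = np.arange(d // 2)
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros(d)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(3)
d = 16
Wq = rng.normal(size=(d, d))        # stand-in for the learned query matrix
x = rng.normal(size=d)              # embedding of one word, e.g. "dog"

q_at_0 = (x + sinusoidal_pe(0, d)) @ Wq    # "dog" at position 0
q_at_5 = (x + sinusoidal_pe(5, d)) @ Wq    # "dog" at position 5
print(np.allclose(q_at_0, q_at_5))         # False: the position survives the projection

q_bare_0 = x @ Wq                          # no positional encoding at all
q_bare_5 = x @ Wq
print(np.allclose(q_bare_0, q_bare_5))     # True: nothing distinguishes the two positions
```

So my tentative reading is that the position information does not vanish once it has been added to the embedding, but that without any positional encoding there is simply nothing position-dependent for the query and key matrices to act on. Is that correct?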