Positional encoding and implicit grammar

I am a scientist in computational linguistics and I have a question about how GPT models process grammar. If I understand correctly, word order reaches the model only through the positional encoding: without it, self-attention is permutation-equivariant, so one could shuffle the words in the input and get the same output probabilities.
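To make my claim concrete, here is a minimal sketch of the permutation-equivariance of a single unmasked attention layer with random toy matrices (all names such as `Wq`, `Wk`, `Wv` are placeholders, not weights from any real model; a causal GPT mask would complicate the picture):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension
n = 5  # sequence length

# toy "word embeddings" and random projection matrices
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention(X):
    # plain single-head attention, no positional encoding, no mask
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

perm = rng.permutation(n)
out = attention(X)
out_perm = attention(X[perm])

# permuting the input merely permutes the output rows: the layer
# itself cannot tell in which order the words arrived
assert np.allclose(out[perm], out_perm)
```

So without some positional signal, reordering the words only reorders the per-token outputs; no information about the original order survives.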

But, to my knowledge, the positional encoding is realized by adding a comparatively small vector to the word embeddings, so that the semantics of the words change only slightly. As long as the operations on the input are linear, such as the multiplication with the key and query matrices, the added vector is preserved as a separate additive term. But how does the model behave in the non-linear processing steps? Is the small vector then "absorbed" by the larger embedding vector?
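The distinction I mean can be sketched as follows, using the sinusoidal encoding from "Attention Is All You Need" and a random matrix `W` standing in for any linear projection (the specific dimensions and the ReLU are illustrative assumptions, not taken from a particular model):

```python
import numpy as np

def sinusoidal_pe(n_pos, d):
    # classic sinusoidal encoding:
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d))
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(1)
d = 16
x = rng.normal(size=d)        # a toy word embedding
p = sinusoidal_pe(10, d)[3]   # positional encoding for position 3
W = rng.normal(size=(d, d))   # any linear map, e.g. a query projection

# a linear map distributes over the sum: the positional term
# stays a cleanly separable additive component
assert np.allclose(W @ (x + p), W @ x + W @ p)

# a nonlinearity does not distribute: after it, the word and
# position contributions are mixed and no longer separable
relu = lambda v: np.maximum(v, 0)
assert not np.allclose(relu(x + p), relu(x) + relu(p))
```

This is exactly the point of my question: through linear layers the positional vector rides along as its own term, but after the first nonlinearity the decomposition into "word part" plus "position part" no longer exists as such.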