How much do your vector dot products vary? Say you take one of those vectors, fix it as a reference, and dot the other vectors with it. How much do those dot products vary?
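A minimal sketch of that check, assuming the vectors in question are stacked as rows of a NumPy array. The function name `dot_spread` and its `ref_index` parameter are made up for illustration and are not part of any library.

```python
import numpy as np

def dot_spread(vectors, ref_index=0):
    """Dot every other vector with one fixed reference and summarize the spread.

    `vectors` is assumed to be a 2-D array-like with one vector per row.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    ref = vectors[ref_index]
    # Drop the reference row so it is not compared with itself.
    others = np.delete(vectors, ref_index, axis=0)
    dots = others @ ref
    return {
        "min": float(dots.min()),
        "max": float(dots.max()),
        "range": float(dots.max() - dots.min()),
        "std": float(dots.std()),
    }
```

If the vectors are unit length, the dot products are cosine similarities, so a `range` that is tiny compared with 1 would point to numerical noise rather than a real difference.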
Some variation is expected. The random timing on the GPUs changes the order in which partial results are accumulated, and floating-point arithmetic is not associative, so the same computation can give slightly different results from run to run. They are also likely taking the last hidden layer and scaling it out to the unit hypersphere, which would magnify the error for hidden states close to the origin.
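Both effects can be sketched in a few lines. This is only an illustration under assumptions: the numbers are invented, `normalized_error` is a hypothetical helper, and it says nothing certain about how the embeddings are actually computed.

```python
import numpy as np

# Floating-point addition is not associative: regrouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
sums_differ = left != right  # the two groupings round differently

def normalize(v):
    """Scale a vector out to the unit hypersphere."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)

def normalized_error(hidden, noise):
    """Distance between the normalized vector with and without the noise."""
    hidden = np.asarray(hidden, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return float(np.linalg.norm(normalize(hidden + noise) - normalize(hidden)))

# The same small perturbation applied to a state far from the origin
# and to one close to it (values chosen purely for illustration).
noise = [0.0, 1e-6]
err_far = normalized_error([10.0, 0.0], noise)
err_near = normalized_error([1e-3, 0.0], noise)
```

After normalization the error is roughly the size of the noise divided by the norm of the hidden state, so `err_near` comes out orders of magnitude larger than `err_far` even though the noise is identical.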