Hi,

Thanks very much for your work and for publishing your code. I am currently working on integrating SpinQuant into torch/ao, and I would like to clarify something about the code that would help me with my implementation.
In the paper, the following is mentioned in footnote 3:
In a pre-norm LLM like LLaMA, we can convert a transformer network into a rotation-invariant network by incorporating the RMSNorm scale parameters α into the weight matrix right after the RMSNorm layer.
In the code, this appears to be done in the fuse_layer_norms function.
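For my own understanding, here is a minimal sketch of how I picture that fusion working for one LLaMA-style decoder layer; the function and variable names are mine, not the actual fuse_layer_norms implementation:

```python
import torch

def fuse_rmsnorm_scale(norm, linears):
    """Fold the RMSNorm scale (alpha) into each linear layer that consumes the
    normalized activations, then reset the scale to ones.

    Sketch only: Linear(x_hat * alpha) equals Linear'(x_hat) where
    Linear'.weight[:, j] = Linear.weight[:, j] * alpha[j].
    """
    for linear in linears:
        W = linear.weight.data.double()  # (out_features, in_features)
        # Scale each input column j by alpha[j] (broadcast over rows).
        linear.weight.data = (W * norm.weight.data.double()).to(linear.weight.dtype)
    norm.weight.data = torch.ones_like(norm.weight.data)

# Example usage for one decoder layer (attribute names as in the HF LLaMA implementation):
# fuse_rmsnorm_scale(layer.input_layernorm,
#                    [layer.self_attn.q_proj, layer.self_attn.k_proj, layer.self_attn.v_proj])
# fuse_rmsnorm_scale(layer.post_attention_layernorm,
#                    [layer.mlp.gate_proj, layer.mlp.up_proj])
```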
However, I also noticed that in that same function the embedding weights are modified as well, in the following lines:

SpinQuant/utils/fuse_norm_utils.py, lines 42 to 45 (commit 7f5bf66)
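Since the snippet isn't embedded in this text, here is my rough paraphrase of what I understand those lines to do (not the exact code from the repo):

```python
import torch

def subtract_embedding_mean(embedding: torch.nn.Embedding) -> None:
    # My paraphrase of the cited lines: make each token embedding zero-mean
    # along the hidden dimension.
    W = embedding.weight.data.double()  # (vocab_size, hidden_size)
    embedding.weight.data = (W - W.mean(dim=-1, keepdim=True)).to(embedding.weight.dtype)
```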
Could you help me understand why this is done, i.e., why the mean is subtracted from the input embeddings? I don't see the connection to the RMSNorm layer fusion, so I must be missing something.
Thanks in advance.