Benchmark effect of merging query and keys matrices in transformers
For certain architectures (like GPT-J and LLaMA), it may be possible to replace the Query $Q$ and Key $K$ matrices with a single matrix, saving 1 of the seven or eight matrix multiplications in a transformer layer. I don't see an obvious way of doing this for GPT-NeoX and OPT.
Take a standard benchmark and run the model before and after merging the Query and Key matrices.
---------- Following are the details ----------

$(\cdot)^\top$ denotes transpose. Consider the input representation $X = \{x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n\}$.
$$q_i = Q\,x_i, \qquad k_j = K\,x_j$$

$$\mathrm{score}_{i,j} = q_i^\top k_j = (Q\,x_i)^\top (K\,x_j) = x_i^\top Q^\top K\,x_j$$

Let $\mathrm{QKMerge} = Q^\top K$. Then

$$\mathrm{score}_{i,j} = x_i^\top\,\mathrm{QKMerge}\,x_j$$
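A quick way to sanity-check the algebra (and get a very rough feel for the cost) is a small NumPy sketch. This is a toy, single-head setting with square $Q$ and $K$ and made-up sizes, not code from any of the models mentioned above; `QKMerge`, `scores_separate`, and `scores_merged` are names invented here for illustration.

```python
# Toy sketch: check that x_i^T Q^T K x_j equals x_i^T QKMerge x_j with QKMerge = Q^T K,
# and compare two projections + one score matmul against the merged formulation.
# Single-head, square Q/K; all sizes are illustrative only.
import time
import numpy as np

d, n = 1024, 512                      # hidden size and sequence length (made up)
rng = np.random.default_rng(0)
Q = rng.standard_normal((d, d))       # query projection
K = rng.standard_normal((d, d))       # key projection
X = rng.standard_normal((n, d))       # input representations; row i is x_i^T

def scores_separate(X):
    q = X @ Q.T                       # q[i] = Q x_i
    k = X @ K.T                       # k[j] = K x_j
    return q @ k.T                    # entry (i, j) = q_i^T k_j

QKMerge = Q.T @ K                     # precomputed once, offline

def scores_merged(X):
    return (X @ QKMerge) @ X.T        # entry (i, j) = x_i^T QKMerge x_j

# Both formulations should agree up to floating-point error.
assert np.allclose(scores_separate(X), scores_merged(X), atol=1e-6)

# Very rough wall-clock comparison; a real benchmark would run the full model.
for name, fn in [("separate Q and K", scores_separate), ("merged QKMerge", scores_merged)]:
    start = time.perf_counter()
    for _ in range(20):
        fn(X)
    print(f"{name}: {time.perf_counter() - start:.3f}s")
```

Note that this only covers the single-head algebra; with multi-head attention each head has its own $Q_h$ and $K_h$, so the merged form would need one $d \times d$ matrix per head, and the full-model benchmark suggested above is what would settle whether the merge is a net win.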