[Feature request] Rotary positional embedding in cross attention #303
Comments
Quick hack implementation here: https://github.com/Aceticia/x-transformers/tree/cross_attn_rot_pos. This only makes sense when the context is in a shared positional space with the input.
@Aceticia hey Chris, yes indeed, I've actually come across this need a couple of times in the past, but wasn't sure if the general public would need it. Do you want to take a look at the latest commit and see if that works?
This looks good! Maybe the test could also pass a custom position for the self attention part, just for completeness? That is the scenario I mostly use it in.
@Aceticia good idea, how about now?
Looks great! Speedy as always.
Tested and it works perfectly. Thanks! Closing |
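To illustrate the kind of test being discussed, here is a minimal sketch that passes custom positions for both the self attention part and the context, assuming a `Decoder` with `cross_attend=True` and the `pos` / `context_pos` kwargs from this thread (exact names and behavior follow the commit referenced above, so they may differ):

```python
import torch
from x_transformers import Decoder

# decoder layers with cross attention and rotary positional embeddings enabled
model = Decoder(
    dim = 64,
    depth = 2,
    heads = 4,
    cross_attend = True,
    rotary_pos_emb = True
)

x = torch.randn(1, 8, 64)          # queries (self attention side)
context = torch.randn(1, 6, 64)    # tokens to cross attend to

pos = torch.arange(8)[None]            # custom positions for the self attention part
context_pos = torch.arange(6)[None]    # custom positions for the context tokens

out = model(x, context = context, pos = pos, context_pos = context_pos)
assert out.shape == (1, 8, 64)
```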
@lucidrains I was just playing around with things and realized there is a new bug: if you turn on rotary pos emb and give a context pos as input, the cross attender will fail since there is no …
@Aceticia oh do you have the stack trace for that?
Should be an easy fix?
@Aceticia ohh, I think you have a network without any self attention layers? Added a quick fix, as it should be valid anyways.
Yes, now it's good. Thanks!
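For reference, a minimal sketch of the configuration described above: a `CrossAttender` has no self attention layers, and per the comments this combination of `rotary_pos_emb=True` plus a `context_pos` input is what failed before the quick fix (kwarg names taken from this thread):

```python
import torch
from x_transformers import CrossAttender

# cross-attention-only stack, i.e. no self attention layers
model = CrossAttender(
    dim = 64,
    depth = 2,
    rotary_pos_emb = True
)

queries = torch.randn(1, 4, 64)
context = torch.randn(1, 10, 64)
context_pos = torch.arange(10)[None]   # positions of the context tokens

# per the discussion above, this call errored before the fix
out = model(queries, context = context, context_pos = context_pos)
```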
Original issue:

It's me again :)

It would be nice if cross attention models could accept a `context_pos` kwarg, mirroring the behavior of `pos` when `rotary_pos_emb=True`. This doesn't make sense for encoder-decoder transformer models, but for my MAE pretraining it does, because the encoder and decoder positions both refer to the same 1D space.

Considering that the current behavior in cross attention is to simply ignore positions and rotary positional embeddings, I propose keeping that behavior unless a `context_pos` is explicitly passed in, now that custom positions are supported. What do you think?
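For illustration, a sketch of the MAE-style use case described above, where the encoder and the cross attending decoder share one 1D positional space. It assumes the proposed `context_pos` kwarg and custom rotary positions via `pos`; names come from this thread rather than a documented API:

```python
import torch
from x_transformers import Encoder, CrossAttender

dim = 64

encoder = Encoder(dim = dim, depth = 2, rotary_pos_emb = True)
decoder = CrossAttender(dim = dim, depth = 2, rotary_pos_emb = True)

# visible (unmasked) and masked patch indices, all in the same 1D positional space
visible_pos = torch.tensor([[0, 2, 5, 7]])
masked_pos  = torch.tensor([[1, 3, 4, 6]])

visible_tokens = torch.randn(1, 4, dim)    # embeddings of the visible patches
mask_queries   = torch.randn(1, 4, dim)    # queries for the positions to reconstruct

# encode visible patches with their true positions
encoded = encoder(visible_tokens, pos = visible_pos)

# decode the masked positions, cross attending to the encoded visible patches;
# context_pos places the context tokens in the same positional space as the queries
reconstructed = decoder(mask_queries, context = encoded, pos = masked_pos, context_pos = visible_pos)
```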