[Feature request] Rotary positional embedding in cross attention #303

Closed · Aceticia opened this issue Dec 10, 2024 · 12 comments
@Aceticia (Contributor)

It's me again :)

It would be nice if cross-attention models could accept a context_pos kwarg, mirroring the behavior of pos when rotary_pos_emb=True. This doesn't make sense for encoder-decoder transformers in general, but for my MAE pretraining it does, because the encoder and decoder positions both refer to the same 1D space.

Considering that the current behavior in cross attention is to simply ignore positions and rotary positional embeddings, I propose keeping that behavior unless a context_pos is explicitly passed in, now that custom positions are supported. What do you think?
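Concretely, something like the sketch below is what I have in mind. The context_pos name and call signature here are just my proposal, not an existing API:

```python
import torch
from x_transformers import Encoder

# MAE-style setup: the decoder queries and the encoder outputs (context)
# live in the same 1D coordinate space, so both should be rotated by their true positions
layers = Encoder(
    dim = 512,
    depth = 2,
    heads = 8,
    cross_attend = True,     # cross attend to the encoder outputs
    rotary_pos_emb = True
)

x = torch.randn(1, 64, 512)          # embeddings of the masked tokens being decoded
context = torch.randn(1, 192, 512)   # encoder outputs for the visible tokens

pos = torch.randint(0, 256, (1, 64))           # positions of the decoded tokens
context_pos = torch.randint(0, 256, (1, 192))  # positions of the context tokens

# proposed: pos keeps rotating the self-attention queries/keys as it does today,
# while context_pos rotates the keys coming from the context
out = layers(x, context = context, pos = pos, context_pos = context_pos)
```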


lucidrains added a commit that referenced this issue Dec 10, 2024
@lucidrains (Owner)

@Aceticia hey Chris, yes indeed, I've actually come across this need a couple of times in the past, but wasn't sure whether the general public would need it.

Do you want to take a look at the latest commit and see if that works?

@Aceticia (Contributor, Author)

This looks good! Maybe the test could also cover passing a custom position for the self-attention part, just for completeness? That's the scenario I mostly use it in.

@lucidrains (Owner)

@Aceticia good idea, how about now?

@Aceticia (Contributor, Author)

Looks great! Speedy as always

@Aceticia (Contributor, Author)

Tested and it works perfectly. Thanks! Closing

@lucidrains (Owner)

wolfram

Need to do the Wolfram setup so I can send code to the cloud while walking the dog. Maybe in the near future with AR glasses + some gesture interface.

Aceticia reopened this Dec 10, 2024
@Aceticia (Contributor, Author)

@lucidrains I was just playing around with things and realized there is a new bug: if you turn on rotary positional embeddings and pass context_pos as input, the cross attender will fail, since there is no mem.

@lucidrains (Owner)

@Aceticia oh do you have the stack trace for that?

@Aceticia (Contributor, Author)

File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/neuro/lib/python3.12/site-packages/x_transformers/x_transformers.py", line 1992, in forward
    maybe_mem = mems[0] # todo - handle edge case where different layers get different memory lengths. don't think this will ever come up but who knows
                ~~~~^^^
IndexError: list index out of range

Should be an easy fix?
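For reference, a minimal sketch of the kind of setup that hits this for me: a cross-attention-only stack, so there are no self-attention mems to index.

```python
import torch
from x_transformers import CrossAttender

# cross-attention-only stack: no self-attention layers, hence nothing to populate mems with
cross = CrossAttender(dim = 512, depth = 2, rotary_pos_emb = True)

x = torch.randn(1, 64, 512)
context = torch.randn(1, 192, 512)
context_pos = torch.randint(0, 256, (1, 192))

# with rotary turned on and context_pos given, the rotary position path reaches mems[0]
# and raises the IndexError above
out = cross(x, context = context, context_pos = context_pos)
```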

@lucidrains (Owner)

@Aceticia ohh, I think you have a network without any self-attention layers? Added a quick fix, as that should be valid anyway.
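(The gist is just to treat the no-self-attention case as valid rather than indexing into an empty mems list; the sketch below paraphrases that guard and is not the literal diff.)

```python
from typing import Optional
from torch import Tensor

def first_mem(mems: list[Tensor]) -> Optional[Tensor]:
    # a cross-attention-only stack has no self-attention layers, so `mems` may be empty;
    # returning None keeps that case valid instead of raising IndexError on mems[0]
    return mems[0] if len(mems) > 0 else None
```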

@Aceticia (Contributor, Author)

Yes, now it's good. Thanks!
