Enabled Flash Attention for PaliGemma models #34009
Conversation
@qubvel Please review the changes and comment.
qubvel left a comment:
Thanks for the fix!
Can you please:
- Fix quality tests: you should run `make modified_only_fixup` and `make repo-consistency`.
- When quality tests are fixed, please push an empty commit with the message `[run_slow] paligemma` to ensure slow tests are also fine.
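For reference, the empty commit can be created and pushed like this (a minimal sketch; the commit message must match the `[run_slow] paligemma` trigger exactly):

```bash
# Push an empty commit whose message triggers the slow PaliGemma tests on CI
git commit --allow-empty -m "[run_slow] paligemma"
git push
```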
I ran `make modified_only_fixup` and all the tests passed, but can you please point out what might be wrong? Thanks!
The following error is in CI: you have to add the model to the following doc file:
Hey @qubvel, thanks a lot! All the tests passed.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Can you please once again push an empty commit with the `[run_slow] paligemma` message?
Hey @qubvel, the single- and multi-GPU tests didn't pass, and I couldn't figure it out from the CI tests. I am working on it; if you find out what the problem might be, please let me know.
Hey @qubvel, sorry, but I can't track down the cause of this error. Thanks!
Hi, I will take a look next week! Thanks for your patience!
qubvel left a comment:
Hi @aroun-coumar, can you please rebase your branch onto the current main to include recent changes and resolve conflicts? Hopefully the slow-test issues will also be resolved.
(Force-pushed from 169500d to 39ebb03.)
qubvel left a comment:
It looks like we no longer need to pass attn_implementation=config._attn_implementation with this PR merged. Can you check if Flash Attention / SDPA is enabled for PaliGemma on main?
You can use it as:
model = PaliGemmaForConditionalGeneration.from_pretrained(
    ...,
    attn_implementation={"vision_config": "flash_attention_2", "text_config": "sdpa"},
)
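For anyone trying this, here is a minimal self-contained sketch of that call. The checkpoint name, dtype, and device settings are illustrative assumptions, not part of this PR; the flash-attn package must be installed and a compatible CUDA GPU available:

```python
# Minimal sketch: per-submodule attention implementations for PaliGemma.
# Assumptions: flash-attn is installed, a CUDA GPU with bf16 support is
# available, and "google/paligemma-3b-pt-224" is only an example checkpoint.
import torch
from transformers import PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation={
        "vision_config": "flash_attention_2",  # SigLIP vision tower
        "text_config": "sdpa",                 # Gemma language model
    },
)

# Sanity check: each sub-config should report the implementation it was given.
print(model.config.vision_config._attn_implementation)  # expect "flash_attention_2"
print(model.config.text_config._attn_implementation)    # expect "sdpa"
```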
What does this PR do?
Fixes #33963
Who can review?
@qubvel