Enabled Flash Attention for PaliGemma models #34009
Conversation
@qubvel Please review the changes and comment.
qubvel left a comment:
Thanks for the fix!
Can you please:
- Fix quality tests: you should run `make modified_only_fixup` and `make repo-consistency`.
- When quality tests are fixed, please push an empty commit with the message `[run_slow] paligemma` to ensure slow tests are also fine.
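For reference, the empty commit can be created and pushed like this (a minimal sketch; the commit message must match the `[run_slow] paligemma` trigger exactly):

```bash
# Push an empty commit whose message triggers the slow PaliGemma tests on CI
git commit --allow-empty -m "[run_slow] paligemma"
git push
```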
I ran `make modified_only_fixup` and all the tests passed, but can you please point out what might be wrong? Thanks!
The following error is in CI: you have to add the model to the following doc file:
Hey @qubvel, thanks a lot! All the tests passed.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Can you please once again push an empty commit with the `[run_slow] paligemma` message?
Hey @qubvel, the single- and multi-GPU tests didn't pass, and I couldn't figure it out from the CI tests. I am working on it; if you find out what the problem might be, please let me know.
Hey @qubvel, sorry, but I can't track down the cause of this error. Thanks!
Hi, I will take a look next week! Thanks for your patience!
qubvel left a comment:
Hi @aroun-coumar, can you please rebase your branch onto the current main to include recent changes and resolve conflicts? Hopefully the slow-test issues will also be resolved.
(Force-pushed from 169500d to 39ebb03.)
qubvel left a comment:
It looks like we no longer need to pass attn_implementation=config._attn_implementation with this PR merged. Can you check if Flash Attention / SDPA is enabled for PaliGemma on main?
You can use it as:
model = PaliGemmaForConditionalGeneration.from_pretrained(
    ...,
    attn_implementation={"vision_config": "flash_attention_2", "text_config": "sdpa"},
)
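For anyone trying this, here is a minimal self-contained sketch of that call. The checkpoint name, dtype, and device settings are illustrative assumptions, not part of this PR; the flash-attn package must be installed and a compatible CUDA GPU available:

```python
# Minimal sketch: per-submodule attention implementations for PaliGemma.
# Assumptions: flash-attn is installed, a CUDA GPU with bf16 support is
# available, and "google/paligemma-3b-pt-224" is only an example checkpoint.
import torch
from transformers import PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation={
        "vision_config": "flash_attention_2",  # SigLIP vision tower
        "text_config": "sdpa",                 # Gemma language model
    },
)

# Sanity check: each sub-config should report the implementation it was given.
print(model.config.vision_config._attn_implementation)  # expect "flash_attention_2"
print(model.config.text_config._attn_implementation)    # expect "sdpa"
```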
What does this PR do?
Fixes #33963
Who can review?
@qubvel