Fix FSDP + llava-next/llava-onevision #38141
base: main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button (at the bottom of the PR page).
Makes sense to me! Though I would expect `_no_split_modules` to be correctly propagated from the module's children. Since llava can support any vision/LM backbone, we can't list all possible model names here.
A scalable solution would be to check why CLIP's `_no_split_modules` didn't get propagated, since I remember it worked fine a while ago. @Arvin-xiong would you like to investigate it further?
I conducted some research on llava-next. I have some doubts that this is a legacy issue from before, but if there is any other relevant information that can be provided to me, I will continue to investigate it further.
@Arvin-xiong ah, that comes from a different repo then, because AFAIK the code block below is able to grab all class attributes when loading, and is what we use for splitting layers per device in multi-GPU inference. I hope this helps you, and we probably need to update PEFT in this case, so that each …
Referenced snippet: `transformers/src/transformers/modeling_utils.py`, lines 2677 to 2688 at `1e921a3`
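For reference, those lines implement roughly the following tree walk. This is a paraphrased sketch of `PreTrainedModel._get_no_split_modules`, not a verbatim copy of the pinned revision:

```python
from transformers import PreTrainedModel

# Paraphrased sketch of PreTrainedModel._get_no_split_modules (a method on
# PreTrainedModel; details may differ from the pinned revision 1e921a3).
def _get_no_split_modules(self, device_map: str):
    _no_split_modules = set()
    modules_to_check = [self]
    while len(modules_to_check) > 0:
        module = modules_to_check.pop(-1)
        # If the module is not itself a no-split module, descend into its
        # children so nested backbones (vision tower, language model, ...)
        # contribute their own _no_split_modules.
        if module.__class__.__name__ not in _no_split_modules:
            if isinstance(module, PreTrainedModel):
                if module._no_split_modules is None:
                    raise ValueError(
                        f"{module.__class__.__name__} does not support `device_map='{device_map}'` yet."
                    )
                _no_split_modules = _no_split_modules | set(module._no_split_modules)
            modules_to_check += list(module.children())
    return list(_no_split_modules)
```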
cc @BenjaminBossan from PEFT as well
Thanks for your patient reply.
I tried to understand the issue, but I'm not very familiar with the mechanics of …
@BenjaminBossan the … So I think recursing over the model's children should fix it.
Okay, so here, instead of just checking on the model, all children would need to be visited:
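In simplified, paraphrased form (not verbatim PEFT code, assuming the structure of `fsdp_auto_wrap_policy` in `peft.utils.other`), the check being referred to reads only the top-level attribute:

```python
import os

def fsdp_auto_wrap_policy(model):
    # Pre-fix behavior (simplified): only the top-level model's
    # _no_split_modules is consulted, so classes declared on nested backbones
    # (e.g. the CLIP vision tower inside llava-next) never reach the policy.
    default_transformer_cls_names_to_wrap = (
        ",".join(model._no_split_modules)
        if getattr(model, "_no_split_modules", None) is not None
        else ""
    )
    transformer_cls_names_to_wrap = os.environ.get(
        "FSDP_TRANSFORMER_CLS_TO_WRAP", default_transformer_cls_names_to_wrap
    ).split(",")
    # ... the names are then resolved to module classes and combined into a
    # transformer_auto_wrap_policy for FSDP.
```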
See discussion in huggingface/transformers#38141 for context. In the PEFT `fsdp_auto_wrap_policy`, we determine the `_no_split_modules`. However, this currently neglects to visit the children of the model, which can be required for some architectures. This PR fixes that. Note that the `_get_no_split_modules` function is largely copied from transformers. One change is that it doesn't take the `device_map` argument. That argument is used in transformers inside an error message but not for the logic proper. I think it's safe to remove. Moreover, I made an unrelated change to `fsdp_auto_wrap_policy`, namely making local imports global (there was no reason for them to be local).
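In sketch form, the helper described above might look like the following (a hedged paraphrase of the PEFT PR, not its exact code):

```python
from torch import nn
from transformers import PreTrainedModel

def _get_no_split_modules(model: nn.Module) -> set[str]:
    """Collect _no_split_modules from the model and all of its children
    (the transformers logic, minus the device_map argument)."""
    _no_split_modules: set[str] = set()
    modules_to_check = [model]
    while modules_to_check:
        module = modules_to_check.pop(-1)
        if module.__class__.__name__ not in _no_split_modules:
            if isinstance(module, PreTrainedModel) and module._no_split_modules is not None:
                # Nested PreTrainedModels (e.g. the vision tower) add theirs too.
                _no_split_modules |= set(module._no_split_modules)
            modules_to_check += list(module.children())
    return _no_split_modules
```

The policy can then build its default class list from `_get_no_split_modules(model)` instead of reading `model._no_split_modules` directly.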
I created a PR which I think should address the issue: huggingface/peft#2570. Could you please check? |
Recently, I have been trying to use FSDP to train llava_next, but I need to find the minimum splitting module, which is `_no_split_modules` in the code. However, I did not find the corresponding class implementation in `modeling_llava_next.py` (`LlavaNextVisionAttention` in v4.51.3 and `LlamaDecoderLayer` in the latest code).
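Until such a fix lands, a manual workaround is to build the FSDP auto-wrap policy directly from the layer classes of both backbones. This sketch uses PyTorch's public wrap API; the layer classes shown are illustrative for a llama-based llava-next checkpoint and would need adjusting for other backbones:

```python
import functools

from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.clip.modeling_clip import CLIPEncoderLayer
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

# Wrap the language model's decoder layers and the vision tower's encoder
# layers explicitly; this is the set that a child-aware _no_split_modules
# lookup would discover automatically.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer, CLIPEncoderLayer},
)
# Then pass it to FSDP, e.g.:
#   torch.distributed.fsdp.FullyShardedDataParallel(model, auto_wrap_policy=auto_wrap_policy)
```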