
Remove FSDP wrapping from sub-models. #34452

Merged
ArthurZucker merged 6 commits into huggingface:main from eljandoubi:fix_fsdp_auto_wrap_policy
Nov 15, 2024

Conversation

@eljandoubi
Contributor

What does this PR do?

Fixes #34113

Who can review?

Library:

Member

@SunMarc SunMarc left a comment


Thanks for fixing the issue @eljandoubi! Do you think there is a simpler way to handle this edge case @muellerzr?

Comment on lines 2263 to 2286
Member

@SunMarc SunMarc Oct 28, 2024


You can use the unwrap_model function in transformers instead. Also, why do we need to set recursive to True? Also, please leave a comment above, as this specific path exists only to make it work with auto_find_batch_size.

Contributor Author

@eljandoubi eljandoubi Oct 28, 2024


unwrap_model does not provide access to the recursive argument. Auto-wrap policies wrap submodules with FSDP, and unwrap_model is unable to remove them. You can test this on the toy example from the PyTorch FSDP tutorial for rank=0 and world_size=1, then experiment with the line I provided in a notebook.

# Net and rank come from the PyTorch FSDP tutorial (rank=0, world_size=1)
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from transformers.modeling_utils import unwrap_model

my_auto_wrap_policy = functools.partial(
    size_based_auto_wrap_policy, min_num_params=20000
)
torch.cuda.set_device(rank)
model = Net().to(rank)
print(model)
fsdp_model = FSDP(model, auto_wrap_policy=my_auto_wrap_policy)
print(fsdp_model)
unwrapped = unwrap_model(fsdp_model)
print(unwrapped)  # nested FSDP wrappers around submodules remain

vs. the following, where you need to re-instantiate model and fsdp_model:

from accelerate.utils import extract_model_from_parallel

model = Net().to(rank)
fsdp_model = FSDP(model, auto_wrap_policy=my_auto_wrap_policy)

extract_model = extract_model_from_parallel(fsdp_model, recursive=True)
print(extract_model)  # FSDP wrappers on submodules are removed too
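The difference can also be illustrated without GPUs. The sketch below is hypothetical: the Wrapper and Layer classes and both unwrap helpers are illustrative stand-ins, not transformers or accelerate code. It shows why one-level unwrapping leaves behind the wrappers that an auto-wrap policy put on submodules:

```python
class Wrapper:
    """Stand-in for an FSDP wrapper: holds the wrapped module in .module."""
    def __init__(self, module):
        self.module = module

class Layer:
    """Stand-in for a plain submodule."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def unwrap_top(obj):
    # One-level unwrapping: only peels wrappers at the root,
    # i.e. follows .module until a non-wrapper is reached.
    while isinstance(obj, Wrapper):
        obj = obj.module
    return obj

def unwrap_recursive(obj):
    # Recursive unwrapping: also replaces wrapped children in place.
    obj = unwrap_top(obj)
    obj.children = [unwrap_recursive(c) for c in obj.children]
    return obj

def count_wrappers(obj):
    inner = obj.module if isinstance(obj, Wrapper) else obj
    n = 1 if isinstance(obj, Wrapper) else 0
    return n + sum(count_wrappers(c) for c in inner.children)

# An auto-wrap policy wraps submodules too: root and one child are wrapped.
model = Wrapper(Layer("root", [Wrapper(Layer("block")), Layer("head")]))

print(count_wrappers(unwrap_top(model)))        # 1: nested wrapper survives
print(count_wrappers(unwrap_recursive(model)))  # 0: all wrappers removed
```

This is the gap recursive=True closes: only the recursive variant walks into the children and strips their wrappers as well.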

Copy link
Member

@SunMarc SunMarc Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about this function in transformers. It uses extract_model_from_parallel under the hood so it should be comparable.

Contributor Author


Ah, I see.

@eljandoubi
Contributor Author

@SunMarc @muellerzr Did you get a different result than I did?

Contributor

@muellerzr muellerzr left a comment


Thanks for the fix, can you add a test in tests/test_trainer.py for this?
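For illustration, a test of this kind might assert that no FSDP wrapper survives anywhere in the module tree after unwrapping. The sketch below is hypothetical: it uses a stand-in FakeFSDP class so it runs without GPUs, and it is not the test actually added to tests/test_trainer.py in this PR:

```python
import unittest

class FakeFSDP:
    """Stand-in for torch's FSDP wrapper so the check runs without GPUs."""
    def __init__(self, module):
        self.module = module

class FakeModule:
    """Stand-in for a plain nn.Module with child modules."""
    def __init__(self, children=()):
        self.children = list(children)

def iter_modules(m):
    # Walk the (fake) module tree, yielding every node.
    yield m
    inner = m.module if isinstance(m, FakeFSDP) else m
    for child in inner.children:
        yield from iter_modules(child)

def recursive_unwrap(m):
    # Mirrors extract_model_from_parallel(..., recursive=True):
    # peel wrappers at the root, then in every child.
    while isinstance(m, FakeFSDP):
        m = m.module
    m.children = [recursive_unwrap(c) for c in m.children]
    return m

class TestFsdpUnwrap(unittest.TestCase):
    def test_no_nested_fsdp_after_unwrap(self):
        # An auto-wrap policy wraps submodules as well as the root.
        wrapped = FakeFSDP(FakeModule([FakeFSDP(FakeModule())]))
        plain = recursive_unwrap(wrapped)
        self.assertFalse(
            any(isinstance(m, FakeFSDP) for m in iter_modules(plain))
        )

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.TestLoader().loadTestsFromTestCase(TestFsdpUnwrap)
)
```

The real test would instead build a small model through the Trainer with an FSDP config and auto_find_batch_size enabled, then make the same "no wrapper left anywhere" assertion on the unwrapped model.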

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr muellerzr requested a review from LysandreJik October 31, 2024 13:55
Member

@SunMarc SunMarc left a comment


Thanks! Left a suggestion for unwrap_model.

@eljandoubi
Contributor Author

@SunMarc I migrated to unwrap_model.

Member

@LysandreJik LysandreJik left a comment


Let's merge it if you're both ok with it @SunMarc @muellerzr

@ArthurZucker ArthurZucker removed their request for review November 5, 2024 12:41
@SunMarc
Member

SunMarc commented Nov 5, 2024

Please rebase this PR on main in order to pass the CI @eljandoubi !

@eljandoubi eljandoubi force-pushed the fix_fsdp_auto_wrap_policy branch from 693ba36 to 0df20d6 on November 5, 2024 20:07
@eljandoubi
Contributor Author

@SunMarc @LysandreJik @muellerzr Is there any update on the pull request?

@ArthurZucker
Collaborator

We were on a company-wide offsite! Merging as they all approved 🤗

@ArthurZucker ArthurZucker merged commit 8d50fda into huggingface:main Nov 15, 2024
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* Remove FSDP wrapping from sub-models.

* solve conflict trainer.py

* make fixup

* add unit test for fsdp_auto_wrap_policy when using auto_find_batch_size

* put back extract_model_from_parallel

* use transformers unwrap_model


Development

Successfully merging this pull request may close these issues.

fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP is not working with the Trainer

6 participants