Add changes to support FSDP#598
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
regisss
left a comment
There was a problem hiding this comment.
I left a few comments. For the Transformers-related changes, it seems you based them on a newer version (main?). Let's just stick to v4.34.1 for now as newer releases are not supported 🙂
Can you also format the code as follows?
pip install --upgrade ruff
make style
| from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments | ||
| from optimum.habana.utils import set_seed | ||
|
|
||
| from optimum.habana.peft.layer import GaudiLoraLayerLinearforward |
There was a problem hiding this comment.
Should we do that everytime we use FSDP and LoRA? Or is it necessary for this example only?
If it's always necessary, it's better to do that in optimum.habana.transformers.modeling_utils.py.
There was a problem hiding this comment.
We need to do this every time we use LoRA with torch_compile enabled (see detailed explanation I added in response to one of the other comments from you).
I can move it to optimum.habana.transformers.modeling_utils.py, its ok to do it unconditionally, right?
There was a problem hiding this comment.
I think it's okay yes. Maybe LoRA inference in the text-generation example could be impacted, I'll check that.
There was a problem hiding this comment.
@regisss My understanding was that LoRA is used only for finetuning. But please check and resolve this conversation when you get a chance. Thanks.
|
@regisss any suggestions on where in code should I add a warning that FSDP is an experimental feature not ready for use yet? How can I add test_fsdp_examples.py to list of tests you run as part of your regular testing? |
|
@vivekgoe When running the FSDP test on Gaudi2 with 1.13, I get errors like And on Gaudi1 with 1.14 I get I launched the test with Is there something I'm doing wrong? |
What does this PR do?
Add changes to support FSDP. BERT-Base enabled with FSDP as a toy example.
Fixes # (issue)
Before submitting