Skip to content

Expose Llama Fused OPs control from run_lora_clm.py#751

Merged
vivekgoe merged 2 commits into
huggingface:mainfrom
HabanaAI:expose_fused_ops
Mar 7, 2024
Merged

Expose Llama Fused OPs control from run_lora_clm.py#751
vivekgoe merged 2 commits into
huggingface:mainfrom
HabanaAI:expose_fused_ops

Conversation

@hlahkar
Copy link
Copy Markdown
Contributor

@hlahkar hlahkar commented Mar 3, 2024

Provide variable to enable/disable FusedRoPE. This may be useful for integrating new models or configurations where FusedRoPE may not work out of the box.

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
@hlahkar hlahkar requested a review from a user March 3, 2024 09:48
@hlahkar hlahkar requested a review from regisss as a code owner March 3, 2024 09:48
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlahkar hlahkar changed the title Expose Llama Fused OPs control from run_lora_clm.py (#23) Expose Llama Fused OPs control from run_lora_clm.py Mar 3, 2024
Copy link
Copy Markdown
Collaborator

@vivekgoe vivekgoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add use-case(s) in description where we may need to switch off "fused rope" (say for debugging potential accuracy issues or if fused rope does not work for a new input configuration).

@vivekgoe vivekgoe added run-test Run CI for PRs from external contributors synapse 1.15 labels Mar 4, 2024
@vivekgoe vivekgoe added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Mar 7, 2024
@vivekgoe vivekgoe merged commit 1a0c775 into huggingface:main Mar 7, 2024
@regisss
Copy link
Copy Markdown
Collaborator

regisss commented Mar 7, 2024

@vivekgoe Why was this PR merged? I'm not convinced this is a good way to managed FusedRoPE, it should be enabled or disabled automatically in the modeling file of each model IMO.

puneeshkhanna pushed a commit to puneeshkhanna/optimum-habana-fork that referenced this pull request Mar 11, 2024
Expose Llama Fused OPs control from run_lora_clm.py (HabanaAI#23)

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments

Co-authored-by: Vivek Goel <vgoel@habana.ai>
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 11, 2024
Expose Llama Fused OPs control from run_lora_clm.py (#23)

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments

Co-authored-by: Vivek Goel <vgoel@habana.ai>
@vivekgoe
Copy link
Copy Markdown
Collaborator

@regisss we need a way to disable FusedRoPE for corner cases where it causes a problem with a new feature. Ideally we would like to fix the problem with the FusedRoPE operator itself but there are cases where we would like to release feature with a work-around (disable FusedRoPE) without waiting for fix.

Current motivation is failure we see with FusedRoPE during Llama-70B-FSDP evaluation phase. If there is another better way you see to disable FusedRoPE for particular feature or test-case then please do let us know. We will revert this change and push a new one as per your recommendation.

@regisss
Copy link
Copy Markdown
Collaborator

regisss commented Mar 13, 2024

@vivekgoe Thanks for the explanation. Since this kind of workaround is meant to be temporary, I would directly access self.generation_config.use_fused_rope from the forward methods in the modeling. That way, we can remove use_fused_rope from the forward signatures (it will still be accessible through self.generation_config) and also remove the logic added in the trainer.
I would also not add a new arg to apply_customized_rope and rather set FusedRoPE = None directly as it is done here for instance: https://github.com/huggingface/optimum-habana/pull/746/files#diff-bae02284f455b93397dbeafee178bd779671429602246f3ba60ea833a538eb68R35
The goal of all this is to keep changes minimal since this is rather a short-term workaround than a long-term fix. It will be easier to revert when the fix is available.

@vivekgoe
Copy link
Copy Markdown
Collaborator

@regisss Thanks for feedback. Let me explore your suggestion regarding self.generation_config.use_fused_rope and check if it works for our use-case.
Regarding not changing arg to apply_customized_rope, If I understand correctly we apply change done in modeling_gpt_neoX for modeling_llama, it will switch off FusedRoPE for all use-cases (including inference), whereas we would like to switch it off for only 1 use-case we are failing and leave rest untouched.

@regisss
Copy link
Copy Markdown
Collaborator

regisss commented Mar 13, 2024

The example of GPT-NeoX is just to illustrate that we can modify FusedRoPE directly instead of inserting a new variable. You could use something such as:

if self.generation_config.use_fused_rope == False:
    FusedRoPE = None

That way it will be disabled only if use_fused_rope is explicitly set to False in the generation config.

@vivekgoe
Copy link
Copy Markdown
Collaborator

@regisss Thanks for explaining, we will try to make the required changes and get those in before next major release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-test Run CI for PRs from external contributors synapse 1.15

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants