Expose Llama Fused OPs control from run_lora_clm.py#23

Merged
2 commits merged into habana-main from fusedllama_fused_ops on Feb 7, 2024

Conversation

@vivekgoe vivekgoe commented Feb 5, 2024

What does this PR do?

Exposes an enable/disable control for FusedRoPE in the Llama model from task scripts (currently only run_lora_clm.py). This capability is useful for debugging. The immediate motivation is to use it as a workaround for an issue we see with FusedRoPE in compile mode.
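
For illustration, a minimal sketch of how a task script might expose such a toggle on the command line. This is not the code from this PR: the flag name `--use_fused_rope`, the `LlamaFusedOpsConfig` class, and the `select_rope_impl` helper are all hypothetical names chosen for the example, and only the Python standard library is used.

```python
import argparse
from dataclasses import dataclass


@dataclass
class LlamaFusedOpsConfig:
    """Hypothetical holder for fused-op switches (not the repository's class)."""

    use_fused_rope: bool = True


def parse_fused_ops_args(argv=None) -> LlamaFusedOpsConfig:
    """Parse a hypothetical --use_fused_rope / --no-use_fused_rope flag."""
    parser = argparse.ArgumentParser(description="Sketch of a fused-op toggle")
    # BooleanOptionalAction (Python 3.9+) also generates the --no-... variant.
    parser.add_argument(
        "--use_fused_rope",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Enable the fused RoPE kernel; disable it to debug or work around issues.",
    )
    args = parser.parse_args(argv)
    return LlamaFusedOpsConfig(use_fused_rope=args.use_fused_rope)


def select_rope_impl(config: LlamaFusedOpsConfig) -> str:
    """Return which rotary-embedding path a model would take (labels are illustrative)."""
    return "fused" if config.use_fused_rope else "eager"


if __name__ == "__main__":
    # Assumed usage: pass --no-use_fused_rope to fall back to the non-fused path.
    print(select_rope_impl(parse_fused_ops_args()))
```

Defaulting the switch to enabled keeps existing behavior unchanged, so the non-fused path is only taken when someone explicitly opts out.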

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@vivekgoe vivekgoe marked this pull request as ready for review February 7, 2024 06:53
@vivekgoe vivekgoe requested a review from a user February 7, 2024 06:53

vivekgoe commented Feb 7, 2024

@dvarshney-habana please review; we have already reviewed it with Puneesh.

@ghost ghost merged commit e48398d into habana-main Feb 7, 2024
bhargaveede pushed a commit that referenced this pull request Feb 19, 2024
* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
bhargaveede pushed a commit that referenced this pull request Feb 19, 2024
* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
dudilester pushed a commit that referenced this pull request Feb 29, 2024
* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
@vivekgoe vivekgoe deleted the fusedllama_fused_ops branch March 2, 2024 06:20
hlahkar pushed a commit that referenced this pull request Mar 3, 2024
* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
@vivekgoe vivekgoe added the ported_to_hf_oh PR has been ported to huggingface/optimum-habana label Mar 4, 2024
bhargaveede pushed a commit that referenced this pull request Mar 8, 2024
Expose Llama Fused OPs control from run_lora_clm.py (#23)

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments

Co-authored-by: Vivek Goel <vgoel@habana.ai>
puneeshkhanna pushed a commit to puneeshkhanna/optimum-habana-fork that referenced this pull request Mar 11, 2024
Expose Llama Fused OPs control from run_lora_clm.py (HabanaAI#23)

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments

Co-authored-by: Vivek Goel <vgoel@habana.ai>
HolyFalafel pushed a commit that referenced this pull request Mar 11, 2024
Expose Llama Fused OPs control from run_lora_clm.py (#23)

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments

Co-authored-by: Vivek Goel <vgoel@habana.ai>
kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Apr 12, 2024
* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Apr 15, 2024
* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments
@astachowiczhabana

huggingface#751

astachowiczhabana added a commit that referenced this pull request Nov 22, 2024
* Fix clip test

* Skip falcon tests

* Fix clip test

* [SW-209062] Disable default sdpa in Albert (#23)

Transformers' default sdpa implementation caused performance
drop in Albert. Adding Albert to the list of models which don't
yet have sdpa implementation in Gaudi and use eager attention.

* [SW-209210] skip first token in EOS check. (#25) (#27)

 * Problem: the output of the _sample function was filled with padding tokens
   for the Bart model.

 * Cause: the Bart model uses the same token as decoder_start_token_id and
   as end of string (EOS).
   See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
   Because of that, the mechanism that fills the model output with padding
   tokens after the EOS token was replacing the whole response
   with padding.

 * Solution: skip the EOS check for the first token in the padding-filling loop.

* Update CODEOWNERS

* Adding labels clone as workaround to avoid crash (#28)

* [SW-0] Fix style

---------

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Bhargav <beede@habana.ai>
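
The EOS fix described in the commit message above (SW-209210) can be sketched roughly as follows. This is a simplified, list-based illustration and not the actual `_sample` code: the function name `pad_after_eos` and the token ids in the usage note are made up for the example.

```python
from typing import List


def pad_after_eos(tokens: List[int], eos_token_id: int, pad_token_id: int) -> List[int]:
    """Replace every token after the first EOS with padding.

    Position 0 is skipped in the EOS check, because some models (Bart, per the
    commit message) start decoding with the same id they use for EOS.
    """
    out = list(tokens)
    seen_eos = False
    for i in range(len(out)):
        if seen_eos:
            # Everything after the terminating EOS becomes padding.
            out[i] = pad_token_id
        elif i > 0 and out[i] == eos_token_id:
            # Only an EOS after the first position ends the sequence.
            seen_eos = True
    return out
```

With hypothetical ids `eos_token_id=2` and `pad_token_id=1`, the sequence `[2, 10, 11, 2, 12, 13]` keeps its first four tokens and only the last two are padded. Without the `i > 0` guard, the leading start token would be treated as EOS and the rest of the response would be replaced with padding.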
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
* Fix clip test

* Skip falcon tests

* Fix clip test

* [SW-209062] Disable default sdpa in Albert (#23)

Transformers' default sdpa implementation caused performance
drop in Albert. Adding Albert to the list of models which don't
yet have sdpa implementation in Gaudi and use eager attention.

* [SW-209210] skip first token in EOS check. (#25) (#27)

 * Problem: the output of the _sample function was filled with padding tokens
   for the Bart model.

 * Cause: the Bart model uses the same token as decoder_start_token_id and
   as end of string (EOS).
   See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
   Because of that, the mechanism that fills the model output with padding
   tokens after the EOS token was replacing the whole response
   with padding.

 * Solution: skip the EOS check for the first token in the padding-filling loop.

* Update CODEOWNERS

* Adding labels clone as workaround to avoid crash (#28)

* [SW-0] Fix style

---------

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Bhargav <beede@habana.ai>
This pull request was closed.

Labels

ported_to_hf_oh PR has been ported to huggingface/optimum-habana

2 participants