Enable fused rmsnorm in bf16 for llama #621

Merged: regisss merged 1 commit into huggingface:main from puneeshkhanna:rmsnorm_bf16 on Jan 3, 2024

@puneeshkhanna
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@puneeshkhanna puneeshkhanna requested a review from a user January 3, 2024 10:16
@puneeshkhanna
Contributor Author

puneeshkhanna commented Jan 3, 2024

@regisss - please review. Fused rmsnorm can also be enabled in lower precision (bf16), which gives a performance boost.
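
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of the RMSNorm math that a fused kernel computes in a single pass. It is not the Habana fused kernel and not code from this PR; the function name `rms_norm` and the default `eps` value are assumptions made for illustration only.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference (unfused) RMSNorm over one hidden-state vector.

    x and weight are equal-length lists of floats. eps is a small constant
    for numerical stability; its default here is an assumed value.
    """
    # Mean of squared activations across the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # Reciprocal of the root mean square.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Scale each normalized element by its learned weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

An unfused implementation runs the square, mean, reciprocal square root, and multiply as separate steps, which is why performing them in one fused operation can reduce overhead.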

Command (?? stands for the max new tokens and batch size values listed in the table below): python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py --model_name_or_path /software/data/llama_inference/Llama-2-70b-hf/ --max_new_tokens ?? --bf16 --n_iterations 3 --use_hpu_graphs --use_kv_cache --batch_size ?? --reuse_cache --limit_hpu_graphs --trim_logits --warmup 2 --attn_softmax_bf16

See the table below for the improved perf results:

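| Model | BS | Max new tokens | Default perf | Perf with rmsnorm fix | % improvement over default perf |
|---|---|---|---|---|---|
| 70B 8x | 1 | 100 | 55.7 | 57.995 | 3.68 |
| 70B 8x | 40 | 100 | 1893.22 | 1969.95 | 3.5 |
| 70B 8x | 1 | 2048 | 60.53 | 62.19 | 2.58 |
| 70B 8x | 40 | 2048 | 1686.35 | 1726.47 | 2.5 |
| 70B 8x | 60 | 2048 | 2207.2 | 2256.79 | 2.2 |
| 70B 8x | 1 | 4096 | 59.825 | 61.4 | 2.48 |
| 70B 8x | 40 | 4096 | 1366.72 | 1393.36 | 1.95 |
| 70B 8x | 60 | 4096 | 1688.21 | 1716.64 | 1.74 |
| 7B 1x | 1 | 4096 | 124.147 | 126.8 | 2.15 |
| 7B 1x | 4 | 4096 | 354.48 | 357.28 | 0.8 |
| 13B 1x | 1 | 4096 | 68.86 | 69.79 | 1.37 |
| 13B 1x | 4 | 4096 | 203.77 | 204.89 | 0.56 |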

@regisss regisss added the run-test Run CI for PRs from external contributors label Jan 3, 2024
@puneeshkhanna puneeshkhanna changed the title from "Enable fused rmsnorm in bf16" to "Enable fused rmsnorm in bf16 for llama" Jan 3, 2024
@bgoldberg-habana
Contributor

LGTM. This was also verified in FP8 runs.

@bgoldberg-habana bgoldberg-habana self-requested a review January 3, 2024 10:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@regisss regisss left a comment

Nice! Does it generate the same outputs as before?

@mandy-li
Collaborator

mandy-li commented Jan 3, 2024

@regisss, let me check fine-tuning as well, for both perf and accuracy.

@regisss regisss merged commit b72d8ea into huggingface:main Jan 3, 2024
@puneeshkhanna
Contributor Author

@mandy-li - thanks for checking fine-tuning. I assume there were no issues there either. Did perf improve in fine-tuning as well?
@regisss - thanks for merging.

@puneeshkhanna puneeshkhanna deleted the rmsnorm_bf16 branch January 4, 2024 04:29
MrGeva pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Feb 4, 2024
jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024