Enable fast softmax mode in FusedSDPA by wszczurekhabana · Pull Request #159 · HabanaAI/optimum-habana-fork

wszczurekhabana · 2024-04-11T09:59:58Z

Adds support for setting the fast softmax mode in the FusedSDPA operator. The mode trades a small amount of accuracy for performance.

Data on performance:

| Ratio | Max input tokens | Max new tokens | Batch size | Throughput without fast softmax [tokens/s] | Throughput with fast softmax [tokens/s] | Improvement % |
|---|---|---|---|---|---|---|
| 97% | 31744 | 1042 | 12 | 139.08 | 147.97 | 6.4% |
| 75% | 24576 | 8192 | 16 | 431.09 | 437.95 | 1.6% |
| 50% | 16384 | 16384 | 24 | 653.39 | 656.38 | 0.5% |
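The improvement column can be reproduced from the two throughput columns as a relative gain over the baseline. The short check below does that arithmetic. Reading the Ratio column as the input-token share of the total sequence, `input / (input + new)`, is an assumption that happens to match the listed values; the PR does not define it.

```python
# Rows copied from the performance table:
# (max input tokens, max new tokens, tokens/s without, tokens/s with)
ROWS = [
    (31744, 1042, 139.08, 147.97),
    (24576, 8192, 431.09, 437.95),
    (16384, 16384, 653.39, 656.38),
]


def improvement_percent(baseline: float, fast: float) -> float:
    """Relative throughput gain of fast softmax over the baseline, in percent."""
    return 100.0 * (fast - baseline) / baseline


def input_ratio_percent(max_input: int, max_new: int) -> float:
    """ASSUMPTION: 'Ratio' is the input share of all tokens, in percent."""
    return 100.0 * max_input / (max_input + max_new)


if __name__ == "__main__":
    for max_input, max_new, baseline, fast in ROWS:
        print(
            f"ratio={input_ratio_percent(max_input, max_new):.0f}% "
            f"improvement={improvement_percent(baseline, fast):.1f}%"
        )
```

The gain shrinks as the input share of the sequence drops, which is the trend visible in the table.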

Data on accuracy (using mlperf test from: https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/llama_greedy.py
and https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/evaluation.py):

| Configuration | rouge1 | rouge2 | rougeL | rougeLsum | accuracy |
|---|---|---|---|---|---|
| without fast softmax | 44.4279 | 22.0536 | 28.6362 | 42.0044 | 99.99 |
| with fast softmax | 44.4065 | 22.0229 | 28.6156 | 41.9858 | 99.94 |
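To put a number on the accuracy side of the tradeoff, the per-metric differences can be computed directly from the two rows. The snippet below is plain arithmetic on the reported values; the variable names are illustrative.

```python
METRICS = ["rouge1", "rouge2", "rougeL", "rougeLsum", "accuracy"]
WITHOUT_FAST = [44.4279, 22.0536, 28.6362, 42.0044, 99.99]
WITH_FAST = [44.4065, 22.0229, 28.6156, 41.9858, 99.94]


def metric_deltas() -> dict:
    """Return (with fast softmax) minus (without), rounded to 4 decimals."""
    return {
        name: round(fast - base, 4)
        for name, base, fast in zip(METRICS, WITHOUT_FAST, WITH_FAST)
    }


if __name__ == "__main__":
    for name, delta in metric_deltas().items():
        print(f"{name}: {delta:+.4f}")
```

Every metric moves down with fast softmax, and the largest absolute change in the reported numbers is 0.05, in the accuracy column.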

dudilester · 2024-04-15T08:49:13Z

This change to the ModuleFusedSDPA forward API will require changes in the HQT patched module for quantization, which means it will break the nightly testing once merged. I'm not sure we support regular softmax for 8-bit. We need to decide on the appropriate behavior when a user requests both quantization and regular softmax: should we ignore the quantization, ignore the softmax setting, or assert on that configuration?
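The three options raised in this comment can be made concrete with a small sketch. It assumes, purely for illustration, that the quantized path only works with fast softmax, which is exactly the point the comment is unsure about. The `ConflictPolicy` enum and `resolve_attention_config` function are hypothetical names and are not part of the PR or of the quantization toolkit.

```python
from enum import Enum
from typing import Tuple


class ConflictPolicy(Enum):
    KEEP_QUANTIZATION = "keep_quantization"    # ignore the regular-softmax request
    KEEP_REGULAR_SOFTMAX = "keep_softmax"      # ignore the quantization request
    REJECT = "reject"                          # refuse the configuration


def resolve_attention_config(
    quantized: bool, fast_softmax: bool, policy: ConflictPolicy
) -> Tuple[bool, bool]:
    """Return the effective (quantized, fast_softmax) pair.

    ASSUMPTION: quantization combined with regular softmax is unsupported.
    """
    conflict = quantized and not fast_softmax
    if not conflict:
        return quantized, fast_softmax
    if policy is ConflictPolicy.REJECT:
        raise ValueError("quantization with regular softmax is not supported")
    if policy is ConflictPolicy.KEEP_QUANTIZATION:
        return True, True   # quantization wins, softmax is forced to fast
    return False, False     # regular softmax wins, quantization is dropped


if __name__ == "__main__":
    print(resolve_attention_config(True, False, ConflictPolicy.KEEP_QUANTIZATION))
    print(resolve_attention_config(True, False, ConflictPolicy.KEEP_REGULAR_SOFTMAX))
```

Rejecting the configuration is the only option that never silently changes what the user asked for; the other two would need at least a warning.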

wszczurekhabana · 2024-04-15T18:22:30Z

Discussed offline. The relevant change for the quantization toolkit, https://gerrit.habana-labs.com/#/c/411008/, pushed by @dudilester, is in review.

wszczurekhabana · 2024-04-18T06:38:25Z

Change in https://gerrit.habana-labs.com/#/c/411008/ is merged. @dvarshney-habana @puneeshkhanna @dudilester I think we can merge this PR now.

dudilester · 2024-04-18T06:56:00Z

The change in https://gerrit.habana-labs.com/#/c/411008/ has not passed promotion yet; we need to wait until it passes before we merge this PR.

dudilester · 2024-05-02T11:42:21Z

FYI, commit https://gerrit.habana-labs.com/#/c/411008/ has been promoted since my previous comment and is included in builds starting with the CD 1.16.0-328 release build.

wszczurekhabana · 2024-05-02T11:47:58Z

Thanks, I was not tracking it closely. @dvarshney-habana can we merge it?

* Enable fast softmax mode in FusedSDPA
* Add fast_softmax parameter to _gradient_checkpointing_func
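The second commit exists because a checkpointing wrapper typically forwards positional arguments to the wrapped layer, so a new forward parameter also has to be added to the checkpointed call or that path falls back to the default. The sketch below shows the idea with stand-ins: `checkpoint_standin`, `decoder_layer`, and `run_layer` are hypothetical and only mimic the call shape, not the code in modeling_llama.py.

```python
def checkpoint_standin(fn, *args):
    # Stand-in for a checkpointing wrapper: forwards positional args only.
    return fn(*args)


def decoder_layer(hidden_states, attention_mask=None,
                  use_flash_attention=False, fast_softmax=False):
    # Hypothetical layer: reports which softmax mode it would request.
    mode = "fast" if fast_softmax else "None"
    return {"hidden_states": hidden_states, "softmax_mode": mode}


def run_layer(hidden_states, attention_mask, use_flash_attention,
              fast_softmax, gradient_checkpointing):
    if gradient_checkpointing:
        # The new argument must be listed here as well, in signature order.
        return checkpoint_standin(
            decoder_layer,
            hidden_states,
            attention_mask,
            use_flash_attention,
            fast_softmax,
        )
    return decoder_layer(
        hidden_states,
        attention_mask=attention_mask,
        use_flash_attention=use_flash_attention,
        fast_softmax=fast_softmax,
    )


if __name__ == "__main__":
    for ckpt in (False, True):
        print(ckpt, run_layer([0.0], None, True, True, ckpt)["softmax_mode"])
```

If the checkpointed call omitted the last argument, training with gradient checkpointing would run with the default softmax mode even when the fast mode was requested.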

wszczurekhabana · 2024-06-11T07:42:36Z

upstreamed in: huggingface#972

Enable fast softmax mode in FusedSDPA

812b3ee

wszczurekhabana requested review from bhargaveede, libinta, mandy-li, ssarkar2 and vivekgoe as code owners April 11, 2024 09:59

wszczurekhabana requested a review from a user April 11, 2024 09:59

ghost approved these changes Apr 15, 2024

ghost requested a review from MrGeva April 15, 2024 07:27

puneeshkhanna reviewed Apr 15, 2024

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py

Merge branch 'HabanaAI:habana-main' into fast_softmax

94b2cbe

Add fast_softmax parameter to _gradient_checkpointing_func

e1cd8e0

ghost merged commit 8405798 into HabanaAI:habana-main May 2, 2024

astachowiczhabana pushed a commit that referenced this pull request May 6, 2024

Enable fast softmax mode in FusedSDPA (#159)

85e52d4

* Enable fast softmax mode in FusedSDPA
* Add fast_softmax parameter to _gradient_checkpointing_func

wszczurekhabana added a commit that referenced this pull request May 10, 2024

Enable fast softmax mode in FusedSDPA (#159)

791b42a

* Enable fast softmax mode in FusedSDPA
* Add fast_softmax parameter to _gradient_checkpointing_func

wszczurekhabana mentioned this pull request May 10, 2024

Fast softmax huggingface/optimum-habana#972

Merged

astachowiczhabana pushed a commit that referenced this pull request Feb 14, 2025

Adding missing comma (#159)

610cd79

xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025

Adding missing comma (#159)

49f4ae5

This pull request was closed.

3 commits merged into HabanaAI:habana-main from wszczurekhabana:fast_softmax
