Fast softmax #972

Merged: regisss merged 7 commits into huggingface:main from HabanaAI:fast_softmax on Jun 6, 2024

@wszczurekhabana
Contributor

Changes from: HabanaAI#159
This change depends on #967, which must be merged first.

Original description:

Adds support for setting the fast softmax mode in the FusedSDPA operator. This is a tradeoff between performance and accuracy.

Data on performance:

| Ratio | Max input tokens | Max new tokens | Batch size | Throughput without fast softmax [tokens/s] | Throughput with fast softmax [tokens/s] | Improvement |
|---|---|---|---|---|---|---|
| 97% | 31744 | 1042 | 12 | 139.08 | 147.97 | 6.4% |
| 75% | 24576 | 8192 | 16 | 431.09 | 437.95 | 1.6% |
| 50% | 16384 | 16384 | 24 | 653.39 | 656.38 | 0.5% |
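
The improvement figures can be re-derived from the two throughput columns. The short sketch below does that arithmetic. It also checks a guess about the Ratio column: the values happen to match input tokens divided by input plus new tokens, but the PR does not state this, so treat `input_ratio_pct` as an assumption. Both function names are illustrative and not part of the PR.

```python
# Rows copied from the throughput data above:
# (max input tokens, max new tokens, baseline tokens/s, fast-softmax tokens/s)
ROWS = [
    (31744, 1042, 139.08, 147.97),
    (24576, 8192, 431.09, 437.95),
    (16384, 16384, 653.39, 656.38),
]


def improvement_pct(baseline: float, fast: float) -> float:
    """Relative throughput gain of fast softmax over the baseline, in percent."""
    return (fast - baseline) / baseline * 100.0


def input_ratio_pct(max_input: int, max_new: int) -> float:
    """Share of input tokens in the total token budget, in percent.

    Assumption: this is how the Ratio column was derived; the PR does not say.
    """
    return max_input / (max_input + max_new) * 100.0


for max_input, max_new, baseline, fast in ROWS:
    print(
        f"ratio={input_ratio_pct(max_input, max_new):.0f}% "
        f"improvement={improvement_pct(baseline, fast):.1f}%"
    )
```

In these three measurements the gain is largest for the input-heavy configuration and shrinks as the share of generated tokens grows.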

Data on accuracy (using the MLPerf tests from https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/llama_greedy.py and https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/evaluation.py):

| | rouge1 | rouge2 | rougeL | rougeLsum | accuracy |
|---|---|---|---|---|---|
| without fast softmax | 44.4279 | 22.0536 | 28.6362 | 42.0044 | 99.99 |
| with fast softmax | 44.4065 | 22.0229 | 28.6156 | 41.9858 | 99.94 |
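
To put the accuracy side of the tradeoff on the same footing as the throughput numbers, the sketch below computes the relative drop of each metric from the values reported above. The helper name `relative_drop_pct` is illustrative and not part of the PR or the MLPerf scripts.

```python
# Metric values copied from the accuracy data above.
BASELINE = {
    "rouge1": 44.4279,
    "rouge2": 22.0536,
    "rougeL": 28.6362,
    "rougeLsum": 42.0044,
    "accuracy": 99.99,
}
FAST = {
    "rouge1": 44.4065,
    "rouge2": 22.0229,
    "rougeL": 28.6156,
    "rougeLsum": 41.9858,
    "accuracy": 99.94,
}


def relative_drop_pct(baseline: dict, fast: dict) -> dict:
    """Per-metric drop of the fast-softmax run relative to the baseline, in percent."""
    return {
        name: (baseline[name] - fast[name]) / baseline[name] * 100.0
        for name in baseline
    }


drops = relative_drop_pct(BASELINE, FAST)
for name, drop in drops.items():
    print(f"{name}: {drop:.3f}% lower with fast softmax")
```

On these numbers every metric drops by less than 0.15% relative to the baseline, with rouge2 showing the largest relative change.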

dudilester and others added 5 commits May 8, 2024 16:10
* Done to allow quantization using HQT

* Added use_flash_attention and flash_attention_recompute to run_lm_eval
* Enable fast softmax mode in FusedSDPA

* Add fast_softmax parameter to _gradient_checkpointing_func
@wszczurekhabana wszczurekhabana requested a review from a user May 10, 2024 12:03
@wszczurekhabana wszczurekhabana requested a review from regisss as a code owner May 10, 2024 12:03
@libinta libinta added the synapse 1.16_dependency label May 10, 2024

@ghost ghost left a comment

@wszczurekhabana Please confirm we can merge this now that #967 is merged.

hsubramony added a commit that referenced this pull request May 29, 2024
Collaborator

@regisss regisss left a comment

LGTM!
Please run make style.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hsubramony added a commit that referenced this pull request May 31, 2024
@regisss regisss merged commit adcec3d into huggingface:main Jun 6, 2024
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Sayantan Sarkar <sasarkar@habana.ai>