extend bucket_internal to SAMPLE generation mode by xt574chen · Pull Request #84 · HabanaAI/optimum-habana-fork

xt574chen · 2024-03-01T04:29:34Z

What does this PR do?

Extend the bucket_internal functionality from #24 to SAMPLE generation mode.

The command to reproduce performance is as follows:
python ../gaudi_spawn.py --use_deepspeed --world_size 4 run_generation.py --model_name_or_path meta-llama/Llama-2-70b-hf --use_hpu_graphs --use_kv_cache --max_input_tokens 128 --max_new_tokens 2048 --batch_size 240 --attn_softmax_bf16 --trim_logits --bf16 --reuse_cache --warmup 1 --n_iterations 1 --limit_hpu_graphs --do_sample --bucket_size 256 --bucket_internal

puneeshkhanna · 2024-03-01T05:51:47Z

        if generation_config.static_shapes and generation_config.bucket_size > 0:
            assert (
-                generation_mode == GenerationMode.GREEDY_SEARCH or generation_mode == GenerationMode.BEAM_SEARCH
+                generation_mode == GenerationMode.GREEDY_SEARCH or generation_mode == GenerationMode.SAMPLE


Let's keep the BEAM_SEARCH check too, since Sayantan's bucketing changes work in beam search.
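For illustration, the check requested above could accept all three modes with a membership test. This is only a sketch, not the code merged in this PR: `GenerationMode` here is a local stand-in enum so the snippet runs on its own, and `check_bucketing_supported` and `BUCKETING_MODES` are hypothetical names.

```python
from enum import Enum


class GenerationMode(Enum):
    # Local stand-in for the enum used in the diff (assumption: only these members matter here).
    GREEDY_SEARCH = "greedy_search"
    SAMPLE = "sample"
    BEAM_SEARCH = "beam_search"
    CONTRASTIVE_SEARCH = "contrastive_search"


# Hypothetical constant: the modes the review discussion says should work with bucketing.
BUCKETING_MODES = {
    GenerationMode.GREEDY_SEARCH,
    GenerationMode.SAMPLE,
    GenerationMode.BEAM_SEARCH,
}


def check_bucketing_supported(generation_mode, static_shapes, bucket_size):
    """Raise AssertionError if bucketing is requested in an unsupported mode."""
    if static_shapes and bucket_size > 0:
        assert generation_mode in BUCKETING_MODES, (
            f"generation_mode={generation_mode.name} is not supported with bucket_size > 0"
        )


if __name__ == "__main__":
    # Passes silently: SAMPLE is in the allowed set.
    check_bucketing_supported(GenerationMode.SAMPLE, static_shapes=True, bucket_size=256)
```

A set membership test keeps the assertion readable as more modes are added, compared with a growing chain of `or` comparisons.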

puneeshkhanna · 2024-03-01T05:54:28Z

            model_kwargs["lazy_mode"] = lazy_mode
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)

+            if bucket_size > 0 and bucket_internal:


These changes should come after the model forward call.
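The ordering asked for here can be sketched with a toy decode loop: run the forward pass with the current cache length, then recompute the bucketed length for the next step. This is a simplified model under stated assumptions, not the logic in `utils.py`: `bucketed_cache_len`, `decode_loop`, and the `forward` callback are hypothetical, and the rule "round up to the next multiple of `bucket_size`, capped at the full cache length" is an assumption about how internal bucketing behaves.

```python
def bucketed_cache_len(num_tokens, bucket_size, kv_cache_len):
    """Smallest multiple of bucket_size that holds num_tokens, capped at kv_cache_len."""
    if bucket_size <= 0:
        return kv_cache_len  # bucketing disabled: use the whole cache
    buckets_needed = -(-num_tokens // bucket_size)  # ceiling division
    return min(buckets_needed * bucket_size, kv_cache_len)


def decode_loop(forward, prompt_len, max_new_tokens, bucket_size, kv_cache_len):
    """Call forward once per new token; update the bucket only after each forward."""
    num_tokens = prompt_len + 1  # prompt plus the slot for the first new token
    cache_len = bucketed_cache_len(num_tokens, bucket_size, kv_cache_len)
    for _ in range(max_new_tokens):
        forward(num_tokens, cache_len)  # model forward sees the current bucket
        num_tokens += 1
        # Internal bucket update happens after the forward call, for the next step.
        cache_len = bucketed_cache_len(num_tokens, bucket_size, kv_cache_len)


if __name__ == "__main__":
    calls = []
    decode_loop(lambda n, c: calls.append((n, c)), prompt_len=3, max_new_tokens=4,
                bucket_size=4, kv_cache_len=8)
    print(calls)
```

With the values from the reproduction command (128 input tokens, 2048 new tokens, `--bucket_size 256`), the full cache length under this model is 128 + 2048 = 2176, so the last bucket is capped at 2176 rather than rounded up to 2304.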

2. move internal bucket update after forward

puneeshkhanna · 2024-03-01T07:54:33Z

@dvarshney-habana - Changes look good to me. We can merge so that sampling search starts working with bucket_internal.
Thanks @xt574chen.

* extend bucket_internal to SAMPLE generation mode
* 1. copy bucket only related code from greedy to sample 2. move internal bucket update after forward
* fix format
* remove clear_cache

astachowiczhabana · 2024-06-11T12:01:18Z

huggingface#720

extend bucket_internal to SAMPLE generation mode

00f4481

xt574chen requested review from bhargaveede, ssarkar2 and vivekgoe as code owners March 1, 2024 04:29

puneeshkhanna reviewed Mar 1, 2024

xt574chen added 2 commits March 1, 2024 14:57

1. copy bucket only related code from greedy to sample

0452c50

2. move internal bucket update after forward

fix format

9e80980

puneeshkhanna reviewed Mar 1, 2024

Comment thread optimum/habana/transformers/generation/utils.py Outdated

remove clear_cache

f935d04

ghost self-requested a review March 2, 2024 09:15

ghost approved these changes Mar 2, 2024

ghost merged commit 348e8be into HabanaAI:habana-main Mar 2, 2024

xt574chen mentioned this pull request Mar 19, 2024

extend bucket_internal to SAMPLE generation mode huggingface/optimum-habana#819

Merged

This pull request was closed.

4 commits merged into HabanaAI:habana-main from xt574chen:extend_bucket_internal


3 participants
