extend bucket_internal to SAMPLE generation mode #84

Merged: 4 commits merged into HabanaAI:habana-main from xt574chen:extend_bucket_internal on Mar 2, 2024

@xt574chen

What does this PR do?

Extends the bucket_internal functionality from #24 to sample mode.

The command to reproduce performance is as follows:
python ../gaudi_spawn.py --use_deepspeed --world_size 4 run_generation.py --model_name_or_path meta-llama/Llama-2-70b-hf --use_hpu_graphs --use_kv_cache --max_input_tokens 128 --max_new_tokens 2048 --batch_size 240 --attn_softmax_bf16 --trim_logits --bf16 --reuse_cache --warmup 1 --n_iterations 1 --limit_hpu_graphs --do_sample --bucket_size 256 --bucket_internal

  if generation_config.static_shapes and generation_config.bucket_size > 0:
      assert (
-         generation_mode == GenerationMode.GREEDY_SEARCH or generation_mode == GenerationMode.BEAM_SEARCH
+         generation_mode == GenerationMode.GREEDY_SEARCH or generation_mode == GenerationMode.SAMPLE

Let's keep the BEAM_SEARCH check too, since Sayantan's bucketing changes work in beam search.
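A minimal sketch of the check the reviewer is asking for, with all three modes allowed. The `GenerationMode` enum below is a local stand-in so the snippet runs on its own, and `BUCKETING_MODES` and `check_bucketing_supported` are hypothetical names, not identifiers from the PR.

```python
from enum import Enum


class GenerationMode(Enum):
    # Local stand-in for the GenerationMode enum referenced in the diff.
    GREEDY_SEARCH = "greedy_search"
    SAMPLE = "sample"
    BEAM_SEARCH = "beam_search"
    CONTRASTIVE_SEARCH = "contrastive_search"


# Hypothetical helper set: modes for which bucketing is allowed.
BUCKETING_MODES = frozenset(
    {GenerationMode.GREEDY_SEARCH, GenerationMode.SAMPLE, GenerationMode.BEAM_SEARCH}
)


def check_bucketing_supported(generation_mode, static_shapes, bucket_size):
    """Raise AssertionError if bucketing is requested for an unsupported mode."""
    if static_shapes and bucket_size > 0:
        if generation_mode not in BUCKETING_MODES:
            raise AssertionError(
                f"bucket_size > 0 is not supported for {generation_mode.name}"
            )


# SAMPLE and BEAM_SEARCH both pass the check.
check_bucketing_supported(GenerationMode.SAMPLE, True, 256)
check_bucketing_supported(GenerationMode.BEAM_SEARCH, True, 256)
```

A set membership test keeps the condition readable as more modes are added, compared with chaining `==` comparisons with `or`.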

model_kwargs["lazy_mode"] = lazy_mode
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)

if bucket_size > 0 and bucket_internal:

These changes should come after the model forward call.
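The ordering requested here can be illustrated with a toy decode loop. This is a sketch under stated assumptions, not the code in optimum/habana/transformers/generation/utils.py: `bucketed_length`, `decode`, `token_idx`, and `cache_len` are hypothetical names, and it assumes that bucketing means keeping the cache length at a multiple of bucket_size.

```python
import math


def bucketed_length(num_tokens: int, bucket_size: int) -> int:
    """Round num_tokens up to the next multiple of bucket_size."""
    return math.ceil(num_tokens / bucket_size) * bucket_size


def decode(prompt_len, max_new_tokens, bucket_size, forward):
    """Toy loop; returns the (token_idx, cache_len) pair seen by each forward call."""
    token_idx = prompt_len  # position the next token is written to
    cache_len = bucketed_length(prompt_len + 1, bucket_size)  # room for that position
    seen = []
    for _ in range(max_new_tokens):
        # 1. Forward call runs with the bucket state as it currently stands.
        forward(token_idx, cache_len)
        seen.append((token_idx, cache_len))
        # 2. Only afterwards advance the index and grow the bucket if needed.
        token_idx += 1
        if token_idx >= cache_len:
            cache_len += bucket_size
    return seen


steps = decode(prompt_len=3, max_new_tokens=3, bucket_size=4, forward=lambda idx, length: None)
print(steps)  # [(3, 4), (4, 8), (5, 8)]
```

Updating the bucket after the forward call means each step runs with a cache length that already covers its write position, and the length changes only once per bucket_size steps.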

Comment thread optimum/habana/transformers/generation/utils.py Outdated
@puneeshkhanna

puneeshkhanna commented Mar 1, 2024

@dvarshney-habana - Changes look good to me. We can merge so that sampling starts working with bucket_internal.
Thanks, @xt574chen.

@ghost self-requested a review on March 2, 2024 09:15
@ghost merged commit 348e8be into HabanaAI:habana-main on Mar 2, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 19, 2024
* extend bucket_internal to SAMPLE generation mode

* 1. copy bucket only related code from greedy to sample
2. move internal bucket update after forward

* fix format

* remove clear_cache
astachowiczhabana pushed a commit that referenced this pull request Apr 22, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
@astachowiczhabana

huggingface#720

This pull request was closed.

3 participants