
extend bucket_internal to SAMPLE generation mode #819

Merged
regisss merged 1 commit into huggingface:main from xt574chen:feat_extend_bucket_internal on Apr 26, 2024


@xt574chen
Contributor

What does this PR do?

REF: HabanaAI#84

Extend function #24 to sample mode.

The following command reproduces the performance:
python ../gaudi_spawn.py --use_deepspeed --world_size 4 run_generation.py --model_name_or_path meta-llama/Llama-2-70b-hf --use_hpu_graphs --use_kv_cache --max_input_tokens 128 --max_new_tokens 2048 --batch_size 240 --attn_softmax_bf16 --trim_logits --bf16 --reuse_cache --warmup 1 --n_iterations 1 --limit_hpu_graphs --do_sample --bucket_size 256 --bucket_internal
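
For readers unfamiliar with bucketing, here is a minimal sketch of the general idea, not the code changed in this PR. It assumes that bucketing means rounding the key/value sequence length up to the next multiple of `--bucket_size`, so that decoding sees only a few distinct lengths instead of a new one at every step. The helper names `bucketed_length` and `distinct_kv_lengths` are hypothetical and do not exist in optimum-habana; the numbers are taken from the command above.

```python
def bucketed_length(cur_len: int, bucket_size: int) -> int:
    """Round cur_len up to the next multiple of bucket_size.

    Hypothetical helper for illustration only. A non-positive bucket_size
    is treated here as "no bucketing" (an assumption of this sketch).
    """
    if bucket_size <= 0:
        return cur_len
    # Integer ceiling division avoids floating-point rounding.
    return -(-cur_len // bucket_size) * bucket_size


def distinct_kv_lengths(prompt_len: int, max_new_tokens: int, bucket_size: int) -> list[int]:
    """List the distinct (bucketed) sequence lengths seen while decoding."""
    return sorted(
        {
            bucketed_length(prompt_len + step, bucket_size)
            for step in range(1, max_new_tokens + 1)
        }
    )


# Values from the command above: 128 input tokens, 2048 new tokens, bucket size 256.
with_buckets = distinct_kv_lengths(128, 2048, 256)
without_buckets = distinct_kv_lengths(128, 2048, 0)

print(with_buckets)          # [256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2304]
print(len(without_buckets))  # 2048
```

Under these assumptions, decoding passes through 9 distinct lengths rather than 2048, which is the kind of shape reuse that bucketing is meant to provide whether tokens are chosen greedily or by sampling.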

@puneeshkhanna
Contributor

puneeshkhanna commented Mar 22, 2024

The changes look good to me. They mirror what we have in greedy search and enable bucketing in sampling mode too.

@xt574chen
Contributor Author

@regisss Could you help review and merge it?

@ssarkar2 ssarkar2 added the run-test Run CI for PRs from external contributors label Apr 22, 2024
@regisss regisss merged commit 155fe07 into huggingface:main Apr 26, 2024
ccrhx4 pushed a commit to ccrhx4/ccrhx4.optimum-habana that referenced this pull request May 11, 2024
