Fix for eos in eager mode by 12010486 · Pull Request #2197 · huggingface/optimum-habana

12010486 · 2025-08-08T10:47:18Z

A quick repro:

PT_HPU_LAZY_MODE=0 python run_generation.py --model_name_or_path meta-llama/Llama-3.2-1B-Instruct --use_kv_cache --max_new_tokens 100 --do_sample --batch_size 2 --prompt "Hello world" "How are you?" --sdp_on_bf16 --no-ignore_eos

Without this PR, this command fails.

With the fix, we get:

Input/outputs:
input 1: ('Hello world',)
output 1.1: ("Hello world! This is my first time using this platform. I am excited to share my knowledge and experiences with the world. I am a student of computer science and engineering, and I am currently working on my master's degree.\n\nMy area of interest is in artificial intelligence, machine learning, and deep learning. I have been studying these topics for several years and have gained a good understanding of the concepts and techniques involved. I have also been experimenting with various deep learning frameworks such as TensorFlow and PyTorch.\n\n",)

input 2: ('How are you?',)
output 2.1: ("How are you? Today is a beautiful day!\nI'm doing great, thanks for asking! It's a lovely day to be outside, isn't it? I love days like this, when the sun is shining and the birds are singing. There's something so uplifting about being in nature, don't you think?\n\nI've been meaning to get out and enjoy the great outdoors, but I've been stuck inside a lot lately. How about you? What are you up to today? Do you have any fun plans",)


Stats:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 108.17886215980566 tokens/second
Average first token latency         = 22.46260239626281 ms
Average rest token latency          = 18.413477945447234 ms
Average end to end latency          = 1848.2638252025936 ms
Memory allocated                    = 5.62 GB
Max memory allocated                = 5.76 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 5.586424403998535 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
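As a rough sanity check on these numbers, the reported throughput and end-to-end latency can be cross-checked against each other. The sketch below assumes that both sequences generated the full 100 new tokens (neither output above appears to stop early) and that throughput is roughly batch_size * max_new_tokens divided by the end-to-end latency. The helper names are hypothetical and the script's actual formula may differ, so small deviations from the reported values are expected.

```python
# Values copied from the command line and the stats block above.
BATCH_SIZE = 2
MAX_NEW_TOKENS = 100
FIRST_TOKEN_MS = 22.46260239626281
REST_TOKEN_MS = 18.413477945447234
END_TO_END_MS = 1848.2638252025936


def estimated_throughput(batch_size: int, new_tokens: int, latency_ms: float) -> float:
    """Tokens per second, assuming every sequence produced `new_tokens` tokens."""
    return batch_size * new_tokens / (latency_ms / 1000.0)


def estimated_latency_ms(first_ms: float, rest_ms: float, new_tokens: int) -> float:
    """End-to-end latency as first-token latency plus the remaining decode steps."""
    return first_ms + (new_tokens - 1) * rest_ms


throughput = estimated_throughput(BATCH_SIZE, MAX_NEW_TOKENS, END_TO_END_MS)
latency = estimated_latency_ms(FIRST_TOKEN_MS, REST_TOKEN_MS, MAX_NEW_TOKENS)

print(f"estimated throughput: {throughput:.2f} tokens/second")
print(f"estimated end-to-end latency: {latency:.2f} ms")
```

Both estimates land close to the reported 108.18 tokens/second and 1848.26 ms, which is consistent with the run having generated all 100 tokens per sequence.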

In eager mode, --no-ignore_eos is only slightly slower than --ignore_eos: 108.179 vs 113.073 tokens/second, roughly 4% lower. This is expected.
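For readers unfamiliar with the flag, here is a minimal sketch of what honoring EOS means in batched generation. It is not the code changed by this PR: the function, its parameters, and the toy model are all hypothetical, and plain Python lists stand in for tensors. The idea is that each decode step checks the new tokens against the EOS id, finished sequences are padded, and the loop exits once every sequence is done. That extra per-step check is one plausible reason for the small slowdown, though this is an assumption rather than something stated in the PR.

```python
from typing import Callable, List


def generate_with_eos(
    prompts: List[List[int]],
    next_token: Callable[[List[int]], int],  # hypothetical stand-in for the model
    eos_token_id: int,
    pad_token_id: int,
    max_new_tokens: int,
    ignore_eos: bool = False,
) -> List[List[int]]:
    """Greedy-style decode loop that optionally stops sequences at EOS."""
    sequences = [list(p) for p in prompts]
    unfinished = [True] * len(sequences)

    for _ in range(max_new_tokens):
        for i, seq in enumerate(sequences):
            # Finished sequences only receive padding from here on.
            token = next_token(seq) if unfinished[i] else pad_token_id
            seq.append(token)
            # The EOS check is skipped entirely when ignore_eos is set.
            if not ignore_eos and unfinished[i] and token == eos_token_id:
                unfinished[i] = False
        # Stop early once every sequence in the batch has emitted EOS.
        if not ignore_eos and not any(unfinished):
            break
    return sequences


def toy_model(seq: List[int]) -> int:
    """Toy model: emits EOS (0) once the sequence holds at least 4 tokens."""
    return 0 if len(seq) >= 4 else 9


stopped = generate_with_eos([[5], [5, 6, 7]], toy_model, eos_token_id=0,
                            pad_token_id=1, max_new_tokens=6)
full = generate_with_eos([[5], [5, 6, 7]], toy_model, eos_token_id=0,
                         pad_token_id=1, max_new_tokens=6, ignore_eos=True)
print(stopped)
print(full)
```

With EOS honored, the batch stops after four steps and the shorter-lived sequence is padded. With `ignore_eos=True`, every sequence runs for all `max_new_tokens` steps.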

HuggingFaceDocBuilderDev · 2025-08-08T10:51:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yafshar

LGTM!

regisss

LGTM

Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>

Fix for eos in eager mode

abbb487

12010486 requested a review from vivekgoe as a code owner August 8, 2025 10:47

12010486 mentioned this pull request Aug 8, 2025

lm_eval to 0.4.9.1 and support for new args #2193

Closed

yafshar reviewed Aug 8, 2025


Comment thread optimum/habana/transformers/generation/utils.py Outdated

12010486 requested a review from yafshar August 8, 2025 14:08

yafshar approved these changes Aug 26, 2025


Comment thread optimum/habana/transformers/generation/utils.py Outdated

12010486 added 2 commits August 26, 2025 16:29

Merge branch 'huggingface:main' into fix_eager

be860f6

Fix perf

c571332

12010486 requested a review from regisss August 28, 2025 17:06

karol-brejna-i self-assigned this Sep 4, 2025

regisss approved these changes Sep 11, 2025


regisss merged commit b4f1710 into huggingface:main Sep 11, 2025
2 of 5 checks passed

astachowiczhabana pushed a commit that referenced this pull request Sep 12, 2025

Fix for eos in eager mode (#2197)

48c5834

astachowiczhabana pushed a commit that referenced this pull request Sep 17, 2025

Fix for eos in eager mode (#2197)

abb71e9

regisss mentioned this pull request Sep 29, 2025

Add Qwen2.5-VL #2235

Merged

3 tasks

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025

Fix for eos in eager mode (huggingface#2197) (huggingface#676)

1c605c9

Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>

regisss merged 3 commits into huggingface:main from 12010486:fix_eager

5 participants