
Fix for eos in eager mode #2197

Merged
regisss merged 3 commits into huggingface:main from 12010486:fix_eager on Sep 11, 2025

Contributor

@12010486 12010486 commented Aug 8, 2025

A quick way to reproduce the issue:

PT_HPU_LAZY_MODE=0 python run_generation.py --model_name_or_path meta-llama/Llama-3.2-1B-Instruct --use_kv_cache --max_new_tokens 100 --do_sample --batch_size 2 --prompt "Hello world" "How are you?" --sdp_on_bf16 --no-ignore_eos

Without this PR, this command fails.

With the PR applied, we get:

Input/outputs:
input 1: ('Hello world',)
output 1.1: ("Hello world! This is my first time using this platform. I am excited to share my knowledge and experiences with the world. I am a student of computer science and engineering, and I am currently working on my master's degree.\n\nMy area of interest is in artificial intelligence, machine learning, and deep learning. I have been studying these topics for several years and have gained a good understanding of the concepts and techniques involved. I have also been experimenting with various deep learning frameworks such as TensorFlow and PyTorch.\n\n",)

input 2: ('How are you?',)
output 2.1: ("How are you? Today is a beautiful day!\nI'm doing great, thanks for asking! It's a lovely day to be outside, isn't it? I love days like this, when the sun is shining and the birds are singing. There's something so uplifting about being in nature, don't you think?\n\nI've been meaning to get out and enjoy the great outdoors, but I've been stuck inside a lot lately. How about you? What are you up to today? Do you have any fun plans",)


Stats:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 108.17886215980566 tokens/second
Average first token latency         = 22.46260239626281 ms
Average rest token latency          = 18.413477945447234 ms
Average end to end latency          = 1848.2638252025936 ms
Memory allocated                    = 5.62 GB
Max memory allocated                = 5.76 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 5.586424403998535 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
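As a rough sanity check, the reported latencies and throughput are consistent with each other if one assumes that both sequences in the batch ran to the full 100 new tokens. That is an assumption: with `--no-ignore_eos` a sequence can stop earlier. The variable names below are illustrative and are not taken from `run_generation.py`.

```python
# Values copied from the stats above (rounded to four decimals).
first_token_ms = 22.4626
rest_token_ms = 18.4135
end_to_end_ms = 1848.2638
batch_size = 2
max_new_tokens = 100  # assumption: no sequence stopped early on EOS

# One first token plus (max_new_tokens - 1) remaining tokens per sequence.
estimated_ms = first_token_ms + (max_new_tokens - 1) * rest_token_ms

# Generated tokens across the batch divided by wall-clock time in seconds.
estimated_throughput = batch_size * max_new_tokens / (end_to_end_ms / 1000)

print(f"estimated end-to-end latency: {estimated_ms:.1f} ms")
print(f"estimated throughput: {estimated_throughput:.1f} tokens/second")
```

Under that assumption the estimates come out at roughly 1845.4 ms and 108.2 tokens/second, close to the reported 1848.26 ms and 108.18 tokens/second.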

In eager mode, --no-ignore_eos is only slightly slower than --ignore_eos: 108.179 vs 113.073 tokens/second (about 4% slower). This is expected.
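To illustrate where the small overhead can come from, here is a toy sketch of a batched decoding loop that honors EOS. It is not the code changed in this PR: `generate_batch`, `next_tokens`, and the token ids are hypothetical, and the model is replaced by a scripted function. The point is only that with EOS handling enabled, each step has to track which sequences are finished and check whether the whole batch is done.

```python
from typing import Callable, List


def generate_batch(
    next_tokens: Callable[[int], List[int]],
    batch_size: int,
    max_new_tokens: int,
    eos_token_id: int,
    pad_token_id: int,
    ignore_eos: bool,
) -> List[List[int]]:
    """Toy decoding loop; next_tokens(step) stands in for the model."""
    outputs: List[List[int]] = [[] for _ in range(batch_size)]
    unfinished = [True] * batch_size
    for step in range(max_new_tokens):
        tokens = next_tokens(step)
        for i, token in enumerate(tokens):
            # Finished sequences are padded so the batch stays rectangular.
            outputs[i].append(token if unfinished[i] else pad_token_id)
            if not ignore_eos and token == eos_token_id:
                unfinished[i] = False
        # Extra per-step check that is skipped when EOS is ignored.
        if not ignore_eos and not any(unfinished):
            break
    return outputs


EOS, PAD = 0, -1  # hypothetical token ids
script = [[5, 7], [0, 8], [9, 0], [3, 3]]  # scripted "model" outputs per step


def fake_model(step: int) -> List[int]:
    return script[step]


early = generate_batch(fake_model, 2, 4, EOS, PAD, ignore_eos=False)
full = generate_batch(fake_model, 2, 4, EOS, PAD, ignore_eos=True)
print(early)  # [[5, 0, -1], [7, 8, 0]]
print(full)   # [[5, 0, 9, 3], [7, 8, 0, 3]]
```

With `ignore_eos=False` the loop stops after three steps, once both sequences have emitted EOS, and the first sequence is padded after it finishes. With `ignore_eos=True` the loop always runs for `max_new_tokens` steps and needs no per-step bookkeeping.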

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread optimum/habana/transformers/generation/utils.py Outdated
@12010486 12010486 requested a review from yafshar August 8, 2025 14:08
Contributor

@yafshar yafshar left a comment

LGTM!

Comment thread optimum/habana/transformers/generation/utils.py Outdated
@12010486 12010486 requested a review from regisss August 28, 2025 17:06
@karol-brejna-i karol-brejna-i self-assigned this Sep 4, 2025
Collaborator

@regisss regisss left a comment

LGTM

@regisss regisss merged commit b4f1710 into huggingface:main Sep 11, 2025
2 of 5 checks passed
astachowiczhabana pushed a commit that referenced this pull request Sep 12, 2025
astachowiczhabana pushed a commit that referenced this pull request Sep 17, 2025
@regisss regisss mentioned this pull request Sep 29, 2025
3 tasks
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>
5 participants