A generation issue: when ignore_eos=False and the model's pad_token==eos_token (like Llama3), the generated results in the same batch are erased. #1539
YunLiu1 wants to merge 1 commit into
Conversation
@YunLiu1 Can you provide an example command that reproduces this issue, please?
Sure. Because ignore_eos is always True in run_generation.py, you need to change the code first, then run:
python3 ~/optimum-habana/examples/text-generation/run_generation.py
There is no output for the short prompt "Hello world,".
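For reference, a full invocation would look roughly like the sketch below; the model name, batch size, and other flag values are assumptions for illustration, not the exact command used here:

```
# Sketch only: flag values are assumed, not the exact repro command.
python3 ~/optimum-habana/examples/text-generation/run_generation.py \
    --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --batch_size 2 \
    --max_new_tokens 128 \
    --prompt "Hello world," "How are you?"
```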
@YunLiu1 When I run this command, I get output that looks fine. Can you try again on the latest main branch and let me know if you still see it, please? Besides, you can set …
A generation issue: when ignore_eos=False and the model's pad_token==eos_token (like Llama3), the generated results in the same batch are erased.
What does this PR do?
When generating text with Optimum-habana, if BS>1, ignore_eos=False, and the model's pad_token==eos_token (like Llama3.1-8B), the generated results of the shorter prompts in the batch are erased.

This is an example:
I submit 2 prompts ("Hello world,", "How are you?") with BS=2; the shorter one is padded on the left.
After generation, the first pad_token is recognized as an eos_token, and the response is erased.
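A minimal sketch of the failure, using illustrative token IDs (this is not the actual optimum-habana post-processing code):

```python
EOS = 128009  # illustrative ID; for Llama 3.1, pad_token_id == eos_token_id
PAD = EOS

# Batch of 2 after generation; row 0 is the short prompt, left-padded.
batch = [
    [PAD, PAD, 11, 12, 13, 101, 102, 103, EOS],  # pads + prompt + new tokens
    [21, 22, 23, 24, 25, 201, 202, 203, EOS],    # prompt + new tokens
]

# Buggy post-processing: truncate at the FIRST eos found in the sequence.
for row in batch:
    first_eos = row.index(EOS)  # for row 0 this matches the left pad at index 0
    print(row[:first_eos])      # row 0 prints [] -- the whole response is erased
```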
Fixes
I modified the post-processing to ignore the left pad_tokens and to erase only the tokens after the real eos_token, as sketched below.
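A minimal sketch of the fixed logic under the same assumptions (an illustration of the idea, not the exact diff):

```python
def trim_output(row, pad_id, eos_id):
    # Skip the left padding instead of treating it as eos.
    start = 0
    while start < len(row) and row[start] == pad_id:
        start += 1
    # Erase only what follows the first eos_token after the padding.
    try:
        end = row.index(eos_id, start)
    except ValueError:
        end = len(row)  # no eos generated: keep the whole sequence
    return row[start:end]

EOS = PAD = 128009  # pad_token == eos_token, as in Llama 3.1
row = [PAD, PAD, 11, 12, 13, 101, 102, EOS]
print(trim_output(row, PAD, EOS))  # [11, 12, 13, 101, 102] -- response kept
```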
Unit Tests
The changed code passes the unit tests in the attached eos_test_py.txt.
Function Tests
It also passes the function tests:
lm_eval mmlu_pro_business for Meta-Llama-3.1-8B-Instruct (pad_token=eos_token, bs=8):
lm_eval mmlu_pro_business for llama2-7b (pad_token=0, bs=8):