Fix eos not stopping issue when batch_size > 1 and ignore_eos set to False #1287
Conversation
@heyuanliu-intel
I don't have the right to access the CI systems.

@heyuanliu-intel please run it on your Gaudi machine by adding

@heyuanliu-intel if you have access to Gaudi2, please run locally with the below cmd:

LGTM.

@heyuanliu-intel, can you run on an 8-HPU bare-metal Gaudi2 and paste test results at least for:
setup:
tests:
We want to make sure this doesn't introduce failures.
vidyasiv left a comment:
see request for testing
Results of CI (370) run:
@heyuanliu-intel can you verify that failures 2 and 4 are independent of your changes?
Setup:
Run on main:
Then run the same commands with your changes and paste the test results on the PR.
I will try it.

For the second case (pytest -v -s tests/test_text_generation_example.py::test_text_generation_fp8[token0-mistralai/Mixtral-8x7B-v0.1-2-48-True-2048-2048-1147.5]), it still fails even without my PR applied.

For the first case (pytest -v -s tests/test_examples.py::CausalLanguageModelingExampleTester::test_run_clm_gpt2_single_card --token <>), it passes with and without my PR.
So my summary: the 1st case could be due to host-to-host variation.

Thanks for testing. With -v and -s you should be able to see what the failure is for the 2nd case if you scroll. Did you pass --token <> for the second case too? I realized I missed it above. If it still fails, check the below:
I tested on your behalf, but if you still face issues on your side, we can get on a call. Everyone should be able to run tests.
For the 2nd test:
On main: nearly the same, so this can also be considered a non-issue.
Testing the PR with the given command: (output collapsed)
Testing without the ignore-EOS option: (output collapsed)
@heyuanliu-intel, let me know if this is the expected result without --no-ignore_eos.
@heyuanliu-intel please clarify whether the above output without --no-ignore_eos is expected or not.

@vidyasiv Yes, the output is expected. If you run without --no-ignore_eos, the output length will depend on the value of --max_new_tokens. If you run with --no-ignore_eos, generation should stop when the eos token is generated.
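For context, here is a minimal sketch of per-sequence EOS tracking in batched greedy decoding, loosely following the general pattern of the transformers generation loop; the function name, model interface, and arguments are illustrative assumptions, not the actual optimum-habana implementation:

```python
import torch

def batched_greedy_decode(model, input_ids, eos_token_id, pad_token_id, max_new_tokens):
    # Illustrative sketch (not the actual optimum-habana code): one flag per
    # batch row, 1 = still generating, 0 = already produced EOS.
    unfinished = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)
    for _ in range(max_new_tokens):
        # Assumes an HF-style causal LM returning logits of shape
        # (batch, seq_len, vocab).
        logits = model(input_ids).logits[:, -1, :]
        next_tokens = torch.argmax(logits, dim=-1)
        # Finished rows must keep emitting pad tokens; if this per-row masking
        # is skipped (or only a single row is checked), finished sequences
        # keep generating and the batch appears to ignore EOS.
        next_tokens = next_tokens * unfinished + pad_token_id * (1 - unfinished)
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        # Mark rows that just emitted EOS as finished.
        unfinished = unfinished * (next_tokens != eos_token_id).long()
        # Stop only once every sequence in the batch has hit EOS.
        if unfinished.max() == 0:
            break
    return input_ids
```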
@heyuanliu-intel I cannot reproduce the issue on main; can you check and let me know if it works on your side too?

@regisss I have checked this issue on the main branch and I can't reproduce it there now. Maybe it has been fixed some other way.

Okay, let's keep this PR open until the next release in case the issue appears again.
What does this PR do?
This PR fixes the issue where generation does not stop at the eos token when the batch size is greater than 1 and ignore_eos is set to False.
How to reproduce it?
Use --no-ignore_eos in the run command and set the batch size > 1.
The response never stops, even after the eos token is generated.
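As an example, a reproduction command might look like the following (the script name and the flags other than --no-ignore_eos are assumed from the text-generation examples and may differ): `python run_generation.py --model_name_or_path gpt2 --batch_size 4 --max_new_tokens 512 --no-ignore_eos`.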