Fix eos not stopping issue when batch_size > 1 and ignore_eos set to False #1287
Conversation
@heyuanliu-intel
I don't have the right to access the CI systems.

@heyuanliu-intel please run it on your Gaudi machine by adding

@heyuanliu-intel if you have access to Gaudi2, please run locally with the below cmd:

LGTM.

@heyuanliu-intel, can you run on an 8-HPU bare-metal Gaudi2 and paste test results at least for:
setup:
tests:
We want to make sure this doesn't introduce failures.
vidyasiv left a comment:
see request for testing
Results of CI (370) run:
@heyuanliu-intel can you verify that failures 2 and 4 are independent of your changes?
Setup:
Run on main:
Then run the same commands with your changes and paste the test results on the PR.
I will try it.

For the second case (pytest -v -s tests/test_text_generation_example.py::test_text_generation_fp8[token0-mistralai/Mixtral-8x7B-v0.1-2-48-True-2048-2048-1147.5]), it still fails even without my PR applied.

For the first case (pytest -v -s tests/test_examples.py::CausalLanguageModelingExampleTester::test_run_clm_gpt2_single_card --token <>), it passes with and without my PR.
So my summary: the 1st case could be due to host-to-host variation.

Thanks for testing. With -v and -s you should be able to see what the failure is for the 2nd case if you scroll. Did you pass --token <> for the second case too? I realized I missed it above. If it still fails, check the below:
I tested on your behalf, but if you still face issues on your side, we can get on a call. Everyone should be able to run tests.
For the 2nd test:
On main: nearly the same, so this can also be considered a non-issue.
Testing the PR with the given command: (output collapsed)
Testing without the ignore-EOS option: (output collapsed)
@heyuanliu-intel, let me know if this is the expected result without --no-ignore_eos.
@heyuanliu-intel please clarify whether the above output without --no-ignore_eos is expected or not.

@vidyasiv Yes, the output is expected. If you run without --no-ignore_eos, the output length will depend on the value of --max_new_tokens. If you run with --no-ignore_eos, generation should stop when the eos token is generated.
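For context, here is a minimal sketch of per-sequence EOS tracking in batched greedy decoding, loosely following the general pattern of the transformers generation loop; the function name, model interface, and arguments are illustrative assumptions, not the actual optimum-habana implementation:

```python
import torch

def batched_greedy_decode(model, input_ids, eos_token_id, pad_token_id, max_new_tokens):
    # Illustrative sketch (not the actual optimum-habana code): one flag per
    # batch row, 1 = still generating, 0 = already produced EOS.
    unfinished = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)
    for _ in range(max_new_tokens):
        # Assumes an HF-style causal LM returning logits of shape
        # (batch, seq_len, vocab).
        logits = model(input_ids).logits[:, -1, :]
        next_tokens = torch.argmax(logits, dim=-1)
        # Finished rows must keep emitting pad tokens; if this per-row masking
        # is skipped (or only a single row is checked), finished sequences
        # keep generating and the batch appears to ignore EOS.
        next_tokens = next_tokens * unfinished + pad_token_id * (1 - unfinished)
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        # Mark rows that just emitted EOS as finished.
        unfinished = unfinished * (next_tokens != eos_token_id).long()
        # Stop only once every sequence in the batch has hit EOS.
        if unfinished.max() == 0:
            break
    return input_ids
```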
@heyuanliu-intel I cannot reproduce the issue on main; can you check and let me know if it works on your side too?

@regisss I have checked this issue on the main branch and I can't reproduce it there now. Maybe it has been fixed some other way.

Okay, let's keep this PR open until the next release in case the issue appears again.
What does this PR do?
This PR fixes the issue where generation does not stop at the eos token when the batch size is greater than 1 and ignore_eos is set to False.
How to reproduce it?
Use --no-ignore_eos in the run command and set the batch size > 1.
The response never stops, even after the eos token is generated.
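As an example, a reproduction command might look like the following (the script name and the flags other than --no-ignore_eos are assumed from the text-generation examples and may differ): `python run_generation.py --model_name_or_path gpt2 --batch_size 4 --max_new_tokens 512 --no-ignore_eos`.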