
Fixed '--no-ignore_eos' not working correctly #1408

Closed
Jing1Ling wants to merge 1 commit into huggingface:main from Jing1Ling:update_eos_stop_criteria

Conversation

@Jing1Ling
Contributor

What does this PR do?

Fixes # (issue)
Command used for testing:

python3 run_generation.py --model_name_or_path /workspace/ckpt/huggingface/hub/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/41bd4c9e7e4fb318ca40e721131d4933966c2cc1 --use_kv_cache --reuse_cache --trim_logits --attn_softmax_bf16 --max_input_tokens 60 --max_new_tokens 500 --bf16 --batch_size 2 --warmup 0 --n_iterations 1 --no-ignore_eos --prompt "Hello world" "How are you?"

Added the following code after line 480 of 'run_generation.py' to observe the model's output.

import numpy as np

# Dump the first two generated sequences to text files, one token id per line.
for idx, output in enumerate(outputs[:2], start=1):
    arr = np.array(output)
    with open(f"sample-{idx}.txt", "w") as file:
        file.writelines(f" {it} \n" for it in arr)

Expected behavior: with '--no-ignore_eos', once a sample generates an eos token, its subsequently generated tokens are overwritten with the pad token until all samples in the current batch have generated eos or met another termination condition.

Actual behavior: taking batch_size=2 as an example, even after sample0 and sample1 have each generated an eos token (at different steps), the model does not stop generating and the subsequent tokens are not overwritten with pad tokens. Generation only stops when all samples emit eos on the same step or another stop criterion is triggered. This causes nonsensical sentences to appear at the end of the output.

Code Details
As shown here, unfinished_sequences tracks whether each sample has finished generating: if the sample with idx == x has not finished, unfinished_sequences[x] remains set, and once it has finished, the tokens it subsequently generates are overwritten with the pad token. This part works as intended.
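For context, here is a minimal sketch of how that per-sample masking typically looks in a Hugging Face-style generation loop; the variable names and token ids below are illustrative, not the exact optimum-habana code:

```python
import torch

# Illustrative token ids, not real model values.
pad_token_id, eos_token_id = 0, 2

# 1 means "still generating", 0 means "already emitted eos".
unfinished_sequences = torch.ones(2, dtype=torch.long)

def mask_step(next_tokens, unfinished_sequences):
    # Finished samples keep emitting pad instead of whatever was sampled.
    next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
    # Mark samples that just emitted eos as finished.
    unfinished_sequences = unfinished_sequences * (next_tokens != eos_token_id).long()
    return next_tokens, unfinished_sequences

# Step 1: sample 0 emits eos, sample 1 keeps going.
tokens, unfinished_sequences = mask_step(torch.tensor([2, 7]), unfinished_sequences)
print(tokens, unfinished_sequences)  # tensor([2, 7]) tensor([0, 1])

# Step 2: sample 0's new token is overwritten with pad, sample 1 continues.
tokens, unfinished_sequences = mask_step(torch.tensor([5, 9]), unfinished_sequences)
print(tokens, unfinished_sequences)  # tensor([0, 9]) tensor([0, 1])
```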

However, since the eos-related stop criterion only returns a single bool value [see here], unfinished_sequences cannot be updated correctly [update code].

I changed the return value to a tensor, and the final output was as expected.
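A simplified before/after sketch of the idea behind the change; the function names and signatures below are illustrative, not the actual stopping-criteria API:

```python
import torch

eos_token_id = 2  # illustrative value

# Before (simplified): the criterion collapses to one Python bool, so it only
# fires on a step where *every* sample's latest token is eos.
def eos_criterion_scalar(input_ids: torch.LongTensor) -> bool:
    return bool((input_ids[:, -1] == eos_token_id).all())

# After (simplified): return a per-sample tensor, so unfinished_sequences can
# be updated element-wise as each sample finishes.
def eos_criterion_per_sample(input_ids: torch.LongTensor) -> torch.Tensor:
    return input_ids[:, -1] == eos_token_id

last_two_tokens = torch.tensor([[5, 2], [5, 7]])  # sample 0 just emitted eos
print(eos_criterion_scalar(last_two_tokens))      # False -> sample 0's eos is lost
print(eos_criterion_per_sample(last_two_tokens))  # tensor([ True, False])
```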

Question
There are comments in the code (comment1, comment2) indicating that a boolean value is returned in the static-shape case. Why is it necessary to return a single value under static shapes?

In addition, could we change the default value of ignore_eos to False, and set it to True only when testing performance? In actual use, stopping after eos is more intuitive and also avoids unnecessary computation.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Contributor

@ssarkar2 left a comment


The eos not being honored issue is fixed here: #1270

Though this PR fixes it, it will have a perf hit for text gen (because we're returning a tensor instead of a bool in all cases)

@Jing1Ling
Contributor Author

> The eos not being honored issue is fixed here: #1270
>
> Though this PR fixes it, it will have a perf hit for text gen (because we're returning a tensor instead of a bool in all cases)

I will close this PR because the issue has been fixed by the PR you mentioned.

@Jing1Ling closed this Oct 16, 2024