Skip to content

Fix GPT_neox incorrect output with batch query#1358

Merged
regisss merged 1 commit into
huggingface:mainfrom
Jianhong-Zhang:gpt_neox_output
Oct 16, 2024
Merged

Fix GPT_neox incorrect output with batch query#1358
regisss merged 1 commit into
huggingface:mainfrom
Jianhong-Zhang:gpt_neox_output

Conversation

@Jianhong-Zhang
Copy link
Copy Markdown
Contributor

No description provided.

@vivekgoe
Copy link
Copy Markdown
Collaborator

@dvarshney-habana @libinta can you please help with review? change is in text-generation, you may have a better idea about effect of changes here.

if next_tokens[i] == eos_token_id:
idx_bs[i] = token_idx.item()
if token_idx > idx_bs[i]:
next_tokens[i] = pad_token_id
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not add any logic inside the generation loop, it will impact performance. instead you can do it after generation to return right content based on eos

Copy link
Copy Markdown
Contributor Author

@Jianhong-Zhang Jianhong-Zhang Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved logic to after generation loop and before return

idx_bs = idx
if idx > idx_bs:
input_ids[i][idx] = pad_token_id
idx_bs = generation_config.max_length
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still want to return padding token in this case or just stop here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to customer requirement, we need to padding token after <|endoftext|>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
For bart decoder_start_token_id and eos_token_id are same.
This breaks bart generation

@libinta libinta added the run-test Run CI for PRs from external contributors label Oct 16, 2024
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss regisss merged commit ddd4e8d into huggingface:main Oct 16, 2024
ugolowic added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Dec 3, 2024
PR huggingface#1358 from upstream introduced a nested for loop which
caused performance drop observed in T5 summarization.
This commit translates the loop into tensor operations
and restores performance.

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
ugolowic added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Dec 3, 2024
PR huggingface#1358 from upstream introduced a nested for loop which
caused performance drop observed in T5 summarization.
This commit translates the loop into tensor operations
and restores performance.

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
ugolowic added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Dec 3, 2024
PR huggingface#1358 from upstream introduced a nested for loop which
caused performance drop observed in T5 summarization.
This commit translates the loop into tensor operations
and restores performance.

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
ugolowic added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Dec 3, 2024
PR huggingface#1358 from upstream introduced a nested for loop which
caused performance drop observed in T5 summarization.
This commit translates the loop into tensor operations
and restores performance.

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
ugolowic added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Dec 3, 2024
PR huggingface#1358 introduced a nested for loop which caused
performance drop observed e.g. in T5 summarization.
This commit translates the loop into tensor operations
and restores performance.

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
xinyu-intel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 4, 2025
PR huggingface#1358 from upstream introduced a nested for loop which
caused performance drop observed in T5 summarization.
This commit translates the loop into tensor operations
and restores performance.

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-test Run CI for PRs from external contributors

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants