
Upgrade to Transformers v4.45 #1359

Merged

regisss merged 27 commits into main from transformers_future on Oct 2, 2024

Conversation

@regisss
Collaborator

@regisss regisss commented Sep 25, 2024

What does this PR do?

As per title.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yafshar
Contributor

yafshar commented Sep 26, 2024

@regisss should we loosen the requirements for https://github.com/huggingface/optimum-habana/blob/8043d2cef69edc9eae6c7282bbb7fa41f268e5b6/examples/language-modeling/requirements.txt#L7C1-L7C15 to peft>=0.11.1,<=0.13.0?

@regisss
Collaborator Author

regisss commented Sep 26, 2024

@regisss should we loosen the requirements for https://github.com/huggingface/optimum-habana/blob/8043d2cef69edc9eae6c7282bbb7fa41f268e5b6/examples/language-modeling/requirements.txt#L7C1-L7C15 to peft>=0.11.1,<=0.13.0

We can do it. Is the current constraint too strict for some examples?

@harborn

harborn commented Sep 27, 2024

When will this PR be merged?
Currently, I need this PR to use transformers==4.45.0.

@regisss
Collaborator Author

regisss commented Sep 27, 2024

when will this PR merge? currently, I need this PR to use transformers==4.45.0

When all our internal tests are validated. Probably next week, but I can't guarantee it.
In the meantime, feel free to install this branch directly.
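For reference, installing the PR branch directly with pip would look roughly like this (a sketch using standard pip VCS syntax; this exact command does not appear in the thread):

```shell
# Install optimum-habana from the PR branch (transformers_future) directly from GitHub.
pip install "git+https://github.com/huggingface/optimum-habana.git@transformers_future"
```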

@harborn

harborn commented Sep 27, 2024

when will this PR merge? currently, I need this PR to use transformers==4.45.0

When all our internal tests are validated. Probably next week, but I can't guarantee it. In the meantime, feel free to install this branch directly.

I pulled this patch and ran my workflow on it to do inference with llama-3.2-11b and llama-3.2-90b. It works, but with lower performance.

@regisss
Collaborator Author

regisss commented Sep 27, 2024

when will this PR merge? currently, I need this PR to use transformers==4.45.0

When all our internal tests are validated. Probably next week, but I can't guarantee it. In the meantime, feel free to install this branch directly.

I pulled this patch and ran my workflow on it to do inference with llama-3.2-11b and llama-3.2-90b. It works, but with lower performance.

Does lower performance mean lower throughput than Llama 3.1?

@yafshar
Contributor

yafshar commented Sep 27, 2024

@regisss should we loosen the requirements for https://github.com/huggingface/optimum-habana/blob/8043d2cef69edc9eae6c7282bbb7fa41f268e5b6/examples/language-modeling/requirements.txt#L7C1-L7C15 to peft>=0.11.1,<=0.13.0

We can do it. Is the current constraint too strict for some examples?

Some of the dependencies, like neural-compressor, install a different peft version. Combined with the hard constraint on peft in the language-modeling example (I don't see the reason for it), this causes a re-installation plus pip dependency errors and warnings. @sywangyi is there a reason you pinned the version to 0.12.0, or can it be loosened as above?
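For concreteness, the proposed loosening would change the requirements line roughly as follows (a sketch, assuming the current pin is `peft == 0.12.0` as discussed above; not the actual patch):

```
# examples/language-modeling/requirements.txt
# before:
peft == 0.12.0
# proposed:
peft >= 0.11.1, <= 0.13.0
```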

loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
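The rationale in the comment above can be made concrete outside the model: low-precision logits can overflow inside exp/log when computing the loss. A NumPy sketch of the failure mode (an illustration of the numerics only, not the modeling code):

```python
import numpy as np

# float16 overflows at ~65504, so exp(12) ~ 1.6e5 becomes inf in half
# precision but stays finite in float32.
logits_fp16 = np.array([12.0, 0.0], dtype=np.float16)
assert np.isinf(np.exp(logits_fp16)).any()      # overflow in half precision

logits_fp32 = logits_fp16.astype(np.float32)    # analogous to `logits.float()`
assert np.isfinite(np.exp(logits_fp32)).all()   # safe after the upcast
```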
Collaborator

Why do the logits need to be upcast to float here?

# This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`, as otherwise the
# input `position_ids` would have various stride during the decoding. Here, simply using `.contiguous()` is not sufficient as in
# the batch size = 1 case, `position_ids` is already contiguous but with varying stride which retriggers a capture.
model_inputs = {"input_ids": input_ids.clone(memory_format=torch.contiguous_format), "inputs_embeds": None}
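The varying-stride situation the comment describes can be reproduced outside of torch. A NumPy analogy (an illustration assuming int64 items, not the actual model code): a batch-size-1 slice reports as contiguous while keeping the parent's strides, and only an explicit copy normalizes them.

```python
import numpy as np

# Batch size 1: a view of the last position is reported as contiguous...
position_ids = np.arange(10, dtype=np.int64).reshape(1, 10)
last = position_ids[:, -1:]        # shape (1, 1) view into the parent
assert last.flags.c_contiguous     # "already contiguous"
assert last.strides == (80, 8)     # ...but it keeps the parent's strides

# An explicit copy (the NumPy analogue of `.clone(memory_format=...)`)
# materializes a fresh buffer with canonical strides.
copied = last.copy()
assert copied.strides == (8, 8)
```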
Collaborator

Add an `if lazy_mode` check here.

position_ids = position_ids[:, -1]

# This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`, as otherwise the input `position_ids` would have various stride during the decoding. Here, simply using `.contiguous()` is not sufficient as in the batch size = 1 case, `position_ids` is already contiguous but with varying stride which retriggers a capture.
position_ids = position_ids.clone(memory_format=torch.contiguous_format)
Collaborator

Add a `lazy_mode` check here.

Collaborator

I think `clone` causes a perf issue, and we don't need this if it's not torch.compile, right?

model_inputs = {"inputs_embeds": inputs_embeds}
else:
model_inputs = {"input_ids": input_ids.contiguous()}
model_inputs = {"input_ids": input_ids.clone(memory_format=torch.contiguous_format)}
Collaborator

same here

loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
Collaborator

Duplicate of line 585?

Collaborator Author

It seems the .float() on line 585 will be removed in Transformers v4.46. Do you see any perf degradation?

@sywangyi
Collaborator

sywangyi commented Sep 28, 2024

@regisss should we loosen the requirements for https://github.com/huggingface/optimum-habana/blob/8043d2cef69edc9eae6c7282bbb7fa41f268e5b6/examples/language-modeling/requirements.txt#L7C1-L7C15 to peft>=0.11.1,<=0.13.0

We can do it. Is the current constraint too strict for some examples?

Some of the dependencies, like neural-compressor, install a different peft version. Combined with the hard constraint on peft in the language-modeling example (I don't see the reason for it), this causes a re-installation plus pip dependency errors and warnings. @sywangyi is there a reason you pinned the version to 0.12.0, or can it be loosened as above?

I just upgraded peft to the latest tag when I enabled boft, ln_tuning and vera. If the peft tests pass, I am OK with loosening it as above. The tests are in https://github.com/huggingface/optimum-habana/blob/main/tests/test_peft_inference.py and https://github.com/huggingface/optimum-habana/blob/main/tests/test_examples.py.

@harborn

harborn commented Sep 29, 2024

I see that transformers==4.45.1 has been released. Will any further changes be needed to upgrade again if we use transformers==4.45.1?

    return token_idx >= self.max_length
else:
    is_done = input_ids.shape[-1] >= self.max_length
    return create_return_const_tensor(input_ids, is_done)
Contributor

@libinta MaxNewTokensCriteria no longer exists in transformers.

removed in this PR: https://github.com/huggingface/transformers/pull/32659/files#diff-6e63ae0764aa864afd5bae6d512677b99b5240cb98cb210190482bdbb6a85906

It was removed because it was already scheduled for deprecation:

"The class MaxNewTokensCriteria is deprecated and will be removed in v4.43. "
f"Please use MaxLengthCriteria(max_length={start_length + max_new_tokens}) "
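Tying this together, the replacement pattern in the snippet above can be sketched as a single max-length criterion that also handles the static token_idx path (a simplified pure-Python sketch, not the actual transformers or optimum-habana implementation; the `create_return_const_tensor` wrapper is omitted here):

```python
class MaxLengthCriteria:
    """Simplified sketch of a max-length stopping criterion."""

    def __init__(self, max_length):
        self.max_length = max_length

    def __call__(self, seq_length, token_idx=None):
        # With static shapes (e.g. HPU graphs), a token index tracks the real
        # decoding position; otherwise fall back to the sequence length.
        if token_idx is not None:
            return token_idx >= self.max_length
        return seq_length >= self.max_length
```

As the deprecation message suggests, MaxNewTokensCriteria's behavior is recovered by constructing this with max_length = start_length + max_new_tokens.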

@github-actions

github-actions Bot commented Oct 1, 2024

The code quality check failed, please run make style.

@github-actions

github-actions Bot commented Oct 2, 2024

The code quality check failed, please run make style.

@regisss regisss marked this pull request as ready for review October 2, 2024 12:45
@regisss regisss requested a review from a user October 2, 2024 12:45
@regisss regisss requested a review from ZhaiFeiyue as a code owner October 2, 2024 12:45
@regisss regisss merged commit b73e250 into main Oct 2, 2024