Upgrade to Transformers v4.45 #1359
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@regisss should we loosen the requirements for https://github.com/huggingface/optimum-habana/blob/8043d2cef69edc9eae6c7282bbb7fa41f268e5b6/examples/language-modeling/requirements.txt#L7C1-L7C15 to …?
We can do it. Is the current constraint too strict for some examples? |
When will this PR be merged?
When all our internal tests are validated. Probably next week, but I can't guarantee it.
I have pulled this patch and run my workflow based on it to do inference with llama-3.2-11b and llama-3.2-90b. It works, but with lower performance.
Does lower performance mean lower throughput than with Llama 3.1?
Some of the dependencies like …
```python
loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
```
Why do the logits need to be upcast to float here?
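Not part of the PR, just a minimal sketch of the precision concern that comment guards against (shapes and values are made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 32000, dtype=torch.bfloat16)  # 4 token positions, 32k vocab
labels = torch.randint(0, 32000, (4,))

# Computed directly in bf16, the log-softmax reduction runs in low precision
loss_bf16 = F.cross_entropy(logits, labels)
# Upcasting first keeps the reduction in float32, which is what the `.float()` above does
loss_fp32 = F.cross_entropy(logits.float(), labels)
print(loss_bf16.item(), loss_fp32.item())  # the two values can differ slightly
```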
```python
# This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`, as otherwise the
# input `position_ids` would have various stride during the decoding. Here, simply using `.contiguous()` is not sufficient as in
# the batch size = 1 case, `position_ids` is already contiguous but with varying stride which retriggers a capture.
model_inputs = {"input_ids": input_ids.clone(memory_format=torch.contiguous_format), "inputs_embeds": None}
position_ids = position_ids[:, -1]
```
```python
# This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`,
# as otherwise the input `position_ids` would have various stride during the decoding. Here, simply using
# `.contiguous()` is not sufficient as in the batch size = 1 case, `position_ids` is already contiguous
# but with varying stride which retriggers a capture.
position_ids = position_ids.clone(memory_format=torch.contiguous_format)
```
I think the clone causes a perf issue, and we don't need it if we're not using torch.compile, right?
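For context, a minimal standalone sketch (not from this PR, shapes made up) of the stride behavior the code comment above describes; whether the clone should be guarded behind a torch.compile check is exactly the trade-off this question raises:

```python
import torch

# A decode-step slice like `position_ids[:, -1:]` with batch size 1:
base = torch.arange(10).reshape(1, 10)   # stand-in for position_ids, seq_len = 10
sliced = base[:, -1:]                    # shape (1, 1), but stride (10, 1)
print(sliced.is_contiguous())            # True: all dims have size 1, so it counts as contiguous
print(sliced.contiguous().stride())      # (10, 1): .contiguous() is a no-op and keeps the odd stride
fresh = sliced.clone(memory_format=torch.contiguous_format)
print(fresh.stride())                    # (1, 1): canonical stride, stable across decode steps
```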
```diff
     model_inputs = {"inputs_embeds": inputs_embeds}
 else:
-    model_inputs = {"input_ids": input_ids.contiguous()}
+    model_inputs = {"input_ids": input_ids.clone(memory_format=torch.contiguous_format)}
```
```python
loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
```
It seems the .float() on line 585 will be removed in Transformers v4.46. Do you see any perf degradation?
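Not an answer from the thread, but a rough way to check: a micro-benchmark sketch with made-up shapes comparing the loss with and without the upcast. On Gaudi you would use HPU tensors and proper device synchronization for a fair measurement.

```python
import time
import torch
import torch.nn.functional as F

logits = torch.randn(2, 256, 32000, dtype=torch.bfloat16)
labels = torch.randint(0, 32000, (2, 256))

def loss_fn(upcast: bool) -> torch.Tensor:
    l = logits.float() if upcast else logits
    return F.cross_entropy(l.view(-1, 32000), labels.view(-1))

for upcast in (True, False):
    start = time.perf_counter()
    for _ in range(10):
        loss_fn(upcast)
    print(f"upcast={upcast}: {time.perf_counter() - start:.3f}s for 10 iterations")
```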
I just upgraded peft to the latest tag when enabling boft, ln_tuning and vera. If the peft tests pass, I am OK with loosening it as above. The tests are in https://github.com/huggingface/optimum-habana/blob/main/tests/test_peft_inference.py and https://github.com/huggingface/optimum-habana/blob/main/tests/test_examples.py
I see that transformers==4.45.1 has been released, so are any changes needed to upgrade again if we use transformers==4.45.1?
```python
    return token_idx >= self.max_length
else:
    is_done = input_ids.shape[-1] >= self.max_length
    return create_return_const_tensor(input_ids, is_done)
```
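For readers of the hunk above: `create_return_const_tensor` is an optimum-habana helper. A hypothetical sketch of what such a helper could look like (the real implementation may differ): the point is to return the "done" flag as a tensor on the same device as `input_ids`, so the stopping check doesn't force a host-side synchronization.

```python
import torch

# Hypothetical stand-in for optimum-habana's helper; signature inferred from the call site above
def create_return_const_tensor(input_ids: torch.Tensor, is_done: bool) -> torch.Tensor:
    return torch.full((), is_done, dtype=torch.bool, device=input_ids.device)
```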
@libinta MaxNewTokensCriteria no longer exists in transformers.
removed in this PR: https://github.com/huggingface/transformers/pull/32659/files#diff-6e63ae0764aa864afd5bae6d512677b99b5240cb98cb210190482bdbb6a85906
It was removed because it had already been slated for deprecation:

```python
"The class MaxNewTokensCriteria is deprecated and will be removed in v4.43. "
f"Please use MaxLengthCriteria(max_length={start_length + max_new_tokens}) "
```
The code quality check failed, please run `make style`.
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
The code quality check failed, please run `make style`.
What does this PR do?
As per title.
Before submitting