Upgrade to Transformers v4.45 #1359
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@regisss should we loosen the requirements for https://github.com/huggingface/optimum-habana/blob/8043d2cef69edc9eae6c7282bbb7fa41f268e5b6/examples/language-modeling/requirements.txt#L7C1-L7C15 to …?
We can do it. Is the current constraint too strict for some examples? |
When will this PR be merged?
When all our internal tests are validated. Probably next week, but I can't guarantee it.
I have pulled this patch and run my workflow based on it to do inference with llama-3.2-11b and llama-3.2-90b. It works, but with lower performance.
Does lower performance mean lower throughput than with Llama 3.1?
Some of the dependencies like …
```python
loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
```
Why do the logits need to be upcast to float here?
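Not part of the PR, just a minimal sketch of the precision concern that comment guards against (shapes and values are made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 32000, dtype=torch.bfloat16)  # 4 token positions, 32k vocab
labels = torch.randint(0, 32000, (4,))

# Computed directly in bf16, the log-softmax reduction runs in low precision
loss_bf16 = F.cross_entropy(logits, labels)
# Upcasting first keeps the reduction in float32, which is what the `.float()` above does
loss_fp32 = F.cross_entropy(logits.float(), labels)
print(loss_bf16.item(), loss_fp32.item())  # the two values can differ slightly
```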
```python
# This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`, as otherwise the
# input `position_ids` would have various stride during the decoding. Here, simply using `.contiguous()` is not sufficient as in
# the batch size = 1 case, `position_ids` is already contiguous but with varying stride which retriggers a capture.
model_inputs = {"input_ids": input_ids.clone(memory_format=torch.contiguous_format), "inputs_embeds": None}
position_ids = position_ids[:, -1]
```
```python
# This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`,
# as otherwise the input `position_ids` would have various stride during the decoding. Here, simply using
# `.contiguous()` is not sufficient as in the batch size = 1 case, `position_ids` is already contiguous
# but with varying stride which retriggers a capture.
position_ids = position_ids.clone(memory_format=torch.contiguous_format)
```
I think the clone causes a perf issue, and we don't need it if we're not using torch.compile, right?
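For context, a minimal standalone sketch (not from this PR, shapes made up) of the stride behavior the code comment above describes; whether the clone should be guarded behind a torch.compile check is exactly the trade-off this question raises:

```python
import torch

# A decode-step slice like `position_ids[:, -1:]` with batch size 1:
base = torch.arange(10).reshape(1, 10)   # stand-in for position_ids, seq_len = 10
sliced = base[:, -1:]                    # shape (1, 1), but stride (10, 1)
print(sliced.is_contiguous())            # True: all dims have size 1, so it counts as contiguous
print(sliced.contiguous().stride())      # (10, 1): .contiguous() is a no-op and keeps the odd stride
fresh = sliced.clone(memory_format=torch.contiguous_format)
print(fresh.stride())                    # (1, 1): canonical stride, stable across decode steps
```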
```diff
     model_inputs = {"inputs_embeds": inputs_embeds}
 else:
-    model_inputs = {"input_ids": input_ids.contiguous()}
+    model_inputs = {"input_ids": input_ids.clone(memory_format=torch.contiguous_format)}
```
```python
loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
```
It seems the .float() on line 585 will be removed in Transformers v4.46. Do you see any perf degradation?
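Not an answer from the thread, but a rough way to check: a micro-benchmark sketch with made-up shapes comparing the loss with and without the upcast. On Gaudi you would use HPU tensors and proper device synchronization for a fair measurement.

```python
import time
import torch
import torch.nn.functional as F

logits = torch.randn(2, 256, 32000, dtype=torch.bfloat16)
labels = torch.randint(0, 32000, (2, 256))

def loss_fn(upcast: bool) -> torch.Tensor:
    l = logits.float() if upcast else logits
    return F.cross_entropy(l.view(-1, 32000), labels.view(-1))

for upcast in (True, False):
    start = time.perf_counter()
    for _ in range(10):
        loss_fn(upcast)
    print(f"upcast={upcast}: {time.perf_counter() - start:.3f}s for 10 iterations")
```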
I just upgraded peft to the latest tag when enabling boft, ln_tuning and vera. If the peft tests pass, I am OK with loosening it as above. The tests are in https://github.com/huggingface/optimum-habana/blob/main/tests/test_peft_inference.py and https://github.com/huggingface/optimum-habana/blob/main/tests/test_examples.py
I see that transformers==4.45.1 has been released, so are any changes needed to upgrade again if we use transformers==4.45.1?
```python
    return token_idx >= self.max_length
else:
    is_done = input_ids.shape[-1] >= self.max_length
    return create_return_const_tensor(input_ids, is_done)
```
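For readers of the hunk above: `create_return_const_tensor` is an optimum-habana helper. A hypothetical sketch of what such a helper could look like (the real implementation may differ): the point is to return the "done" flag as a tensor on the same device as `input_ids`, so the stopping check doesn't force a host-side synchronization.

```python
import torch

# Hypothetical stand-in for optimum-habana's helper; signature inferred from the call site above
def create_return_const_tensor(input_ids: torch.Tensor, is_done: bool) -> torch.Tensor:
    return torch.full((), is_done, dtype=torch.bool, device=input_ids.device)
```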
@libinta MaxNewTokensCriteria no longer exists in transformers.
removed in this PR: https://github.com/huggingface/transformers/pull/32659/files#diff-6e63ae0764aa864afd5bae6d512677b99b5240cb98cb210190482bdbb6a85906
It was removed because it had already been slated for deprecation:

```python
"The class MaxNewTokensCriteria is deprecated and will be removed in v4.43. "
f"Please use MaxLengthCriteria(max_length={start_length + max_new_tokens}) "
```
The code quality check failed, please run `make style`.
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
The code quality check failed, please run `make style`.
What does this PR do?
As per title.
Before submitting