Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference #31589

aliencaocao · 2024-06-25T13:19:11Z

What does this PR do?

The current implementation uses .float() in

transformers/src/transformers/models/swin2sr/modeling_swin2sr.py

Lines 286 to 287 in 0f67ba1

    
    relative_coords_h = torch.arange(-(self.window_size[0] - 1), self.window_size[0], dtype=torch.int64).float()
    relative_coords_w = torch.arange(-(self.window_size[1] - 1), self.window_size[1], dtype=torch.int64).float()

which causes the resulting relative_coords_table to always be torch.float32, regardless of the precision of the other weights, e.g. torch.float16.
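The dtype mismatch described here can be reproduced in isolation. The following is a minimal sketch, not code from the PR: the window size and the stand-in Linear layer are assumptions chosen only for illustration.

```python
import torch

window_size = (8, 8)  # hypothetical window size, for illustration only

# .float() always means torch.float32, independent of any other dtype in play.
relative_coords_h = torch.arange(-(window_size[0] - 1), window_size[0], dtype=torch.int64).float()

# A stand-in layer (an assumption, not the real MLP) converted to half precision.
mlp = torch.nn.Linear(2, 4).half()

coords_dtype = relative_coords_h.dtype  # dtype of the coordinates
weight_dtype = mlp.weight.dtype         # dtype of the layer weights
print(coords_dtype, weight_dtype)
```

The coordinates stay in float32 while the layer weights are float16, so passing one into the other would mix precisions.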

This PR adds a cast to the same dtype as the continuous_position_bias_mlp layer since relative_coords_table is being passed directly into the layer at

transformers/src/transformers/models/swin2sr/modeling_swin2sr.py

Line 349 in 0f67ba1

    
           relative_position_bias_table = self.continuous_position_bias_mlp(self.relative_coords_table).view(
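The cast described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the PR's exact diff: build_relative_coords_table is a hypothetical helper, the small MLP is a stand-in, any normalization or scaling of the table is omitted, and float64 is used as a CPU-friendly stand-in for a non-FP32 dtype such as float16.

```python
import torch
from torch import nn


def build_relative_coords_table(window_size, mlp):
    """Hypothetical helper: build a (1, 2*Wh-1, 2*Ww-1, 2) table in the MLP's dtype."""
    relative_coords_h = torch.arange(-(window_size[0] - 1), window_size[0], dtype=torch.int64).float()
    relative_coords_w = torch.arange(-(window_size[1] - 1), window_size[1], dtype=torch.int64).float()
    table = (
        torch.stack(torch.meshgrid(relative_coords_h, relative_coords_w, indexing="ij"))
        .permute(1, 2, 0)
        .contiguous()
        .unsqueeze(0)
    )
    # Match whatever dtype the MLP weights use instead of assuming float32.
    return table.to(next(mlp.parameters()).dtype)


# Stand-in MLP in a non-FP32 dtype (float64 here so the sketch runs on CPU).
mlp = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 3)).double()
table = build_relative_coords_table((4, 4), mlp)
bias_table = mlp(table).view(-1, 3)
print(table.dtype, bias_table.shape)
```

Reading the dtype from the MLP's own parameters means the table follows the model's precision without hard-coding any particular dtype.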

The same issue and fix apply to swinv2.

Prerequisite for #31342 image to image pipeline FP16 test to pass.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@amyeroberts

amyeroberts

Thanks for fixing!

Could you also add some small tests like this one for vit?

src/transformers/models/swin2sr/modeling_swin2sr.py

src/transformers/models/swinv2/modeling_swinv2.py

Co-authored-by: amyeroberts <[email protected]>

aliencaocao · 2024-06-25T14:01:08Z

Added them, they pass locally

amyeroberts · 2024-06-26T11:01:03Z

@aliencaocao Great! Could you push an empty commit with the message: [run_slow] swin2sr, swinv2. I trust the tests are passing locally, but because of differences that can creep in because of hardware and env set-up, the logits can still be slightly different. So let's make sure the numbers match what's going to be running on the CI : )

amyeroberts

Thanks for fixing this!

HuggingFaceDocBuilderDev · 2024-06-26T13:08:53Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

aliencaocao · 2024-06-26T13:49:37Z

@amyeroberts need your approval for slow tests

aliencaocao · 2024-06-26T14:06:26Z

Ugh, it seems the specific GPU used does indeed cause a numerical difference...
I ran the tests and got the logits using an RTX 3080 Ti, torch 2.3.1+cu121, NVIDIA driver 555.95, Windows 10

amyeroberts · 2024-06-26T14:27:51Z

@aliencaocao As the change is simple and looks OK, updating the tests to use the results from the CI runs should be OK

aliencaocao · 2024-06-26T14:29:45Z

How do I get the CI outputs?
Do I have to print in CI?

amyeroberts · 2024-06-26T14:31:37Z

@aliencaocao Good question! Indeed, they're not part of the console output. Let me see if I can ssh in

amyeroberts · 2024-06-26T16:18:24Z

@aliencaocao Running on the runners, I get the following logits

swin2sr

tensor([[0.5454, 0.5542, 0.5640],
        [0.5518, 0.5562, 0.5649],
        [0.5391, 0.5425, 0.5620]], device='cuda:0', dtype=torch.float16)

swinv2

tensor([-0.3938, -0.4290,  0.0020], device='cuda:0', dtype=torch.float16)
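A test could compare a model's output slice against reference values like these with a tolerance. The sketch below is only an illustration: logits_match is a hypothetical helper, the atol value is an assumption rather than the tolerance used in the PR's tests, and the comparison is done in float32 so it also runs on CPU.

```python
import torch

# Reference values copied from the CI run quoted above (swin2sr, float16).
expected_slice = torch.tensor(
    [[0.5454, 0.5542, 0.5640],
     [0.5518, 0.5562, 0.5649],
     [0.5391, 0.5425, 0.5620]],
    dtype=torch.float16,
)


def logits_match(actual, expected, atol=1e-4):
    """Hypothetical helper: compare in float32 with an assumed absolute tolerance."""
    return torch.allclose(actual.float(), expected.float(), atol=atol)
```

In a real test, the actual tensor would be the corresponding slice of the model output produced on the same hardware as the reference values.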

aliencaocao · 2024-06-26T16:22:47Z

Thanks, triggered again

amyeroberts · 2024-06-26T17:46:40Z

@aliencaocao Thanks! All looks good - we can merge 🤗

Fix dtype casting in modeling_swin2sr to allow non-FP32 inference

d87a5e2

aliencaocao mentioned this pull request Jun 25, 2024

Allow FP16 or other precision inference for Pipelines #31342

Merged

5 tasks

aliencaocao added 2 commits June 25, 2024 21:20

Fix formattting

631174c

Fix for swinv2 too

2b83074

aliencaocao changed the title ~~Fix dtype casting in modeling_swin2sr to allow non-FP32 inference~~ Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference Jun 25, 2024

amyeroberts reviewed Jun 25, 2024

src/transformers/models/swin2sr/modeling_swin2sr.py (outdated)

src/transformers/models/swinv2/modeling_swinv2.py (outdated)

aliencaocao and others added 3 commits June 25, 2024 21:37

Update src/transformers/models/swin2sr/modeling_swin2sr.py

bfe3ff8

Co-authored-by: amyeroberts <[email protected]>

Update src/transformers/models/swinv2/modeling_swinv2.py

3ddf1e9

Co-authored-by: amyeroberts <[email protected]>

Add FP16 tests for swin2sr and swinv2

606f1d3

amyeroberts added the run-slow label Jun 25, 2024

amyeroberts approved these changes Jun 26, 2024

[run_slow] swin2sr, swinv2

1a2a92e

[run_slow] swin2sr, swinv2

266a6e1

amyeroberts merged commit 1f9f57a into huggingface:main Jun 26, 2024

aliencaocao deleted the fix-swin2sr-dtype branch June 26, 2024 22:26
