Bug fix for drop path decay rate in swin transformer #34291

Merged
ArthurZucker merged 19 commits into huggingface:main from
abhi-glitchhg:swin_drop_path_bug
Oct 29, 2024

Conversation

@abhi-glitchhg
Contributor

@abhi-glitchhg abhi-glitchhg commented Oct 21, 2024

What does this PR do?

This PR fixes #33974 .

As I mentioned in the issue, I believe the Swin Transformer implementation handles stochastic depth decay incorrectly.

According to the official implementation, drop_prob for every SwinLayer is different.

https://github.com/microsoft/Swin-Transformer/blob/f82860bfb5225915aca09c3227159ee9e1df874d/models/swin_transformer.py#L544

https://github.com/microsoft/Swin-Transformer/blob/f82860bfb5225915aca09c3227159ee9e1df874d/models/swin_transformer.py#L558

https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L397-#L408

But in transformers, we were using a constant value taken from the config file. I feel that the implementations in transformers should stay close to the official ones. This also applies to the SwinV2 model (and maybe Swin2SR as well).
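To make the difference concrete, here is a minimal sketch of the decay rule as I understand it from the linked official code: one rate per block, rising linearly from 0 to the configured maximum, then sliced per stage. The function name `linear_drop_path_schedule` and the example values (`max_rate=0.1`, `depths=[2, 2, 6, 2]`) are only illustrative, and this is plain Python rather than the actual model code.

```python
def linear_drop_path_schedule(max_rate, depths):
    """Return one drop-path rate per block, grouped by stage.

    Rates rise linearly from 0.0 (first block) to max_rate (last block).
    Hypothetical helper for illustration; not part of any library.
    """
    total = sum(depths)
    if total == 0:
        return [[] for _ in depths]
    # Linear ramp over all blocks; a single block just gets 0.0.
    step = max_rate / (total - 1) if total > 1 else 0.0
    rates = [step * i for i in range(total)]
    # Slice the flat schedule so each stage gets its own contiguous chunk.
    per_stage, start = [], 0
    for depth in depths:
        per_stage.append(rates[start:start + depth])
        start += depth
    return per_stage


# Illustrative values only (assumed, not read from any config).
schedule = linear_drop_path_schedule(max_rate=0.1, depths=[2, 2, 6, 2])
for stage_idx, stage_rates in enumerate(schedule):
    print(stage_idx, [round(r, 4) for r in stage_rates])
```

With a constant value, every block would instead receive the same rate, so early blocks are dropped as often as late ones.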

Please take a look and let me know. I have also changed the variable names. I am very bad at naming, so any suggestions for the argument names are welcome 😄

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts, @qubvel @ArthurZucker

@abhi-glitchhg abhi-glitchhg changed the title potential bug fix for drop path rate in swin transformer and easier initialisation of swinLayer class. potential bug fix for drop path decay rate in swin transformer and easier initialisation of swinLayer class. Oct 21, 2024
@abhi-glitchhg abhi-glitchhg changed the title potential bug fix for drop path decay rate in swin transformer and easier initialisation of swinLayer class. potential bug fix for drop path decay rate in swin transformer and simple initialisation of swinLayer class. Oct 22, 2024
@ArthurZucker ArthurZucker requested a review from molbap October 24, 2024 12:13
@ArthurZucker
Collaborator

cc @molbap

Contributor

@molbap molbap left a comment

Hi, and thanks for the PR! I think this is a work in progress? If so, feel free to convert the PR to draft mode or name it [WIP], and ping me again when you want another review! I think you correctly identified this bug :) When you're done, you can run make fixup to run the linter and keep the code quality check happy in the CI.

@abhi-glitchhg
Contributor Author

Hi, thanks for the reply. I was waiting for confirmation of whether this was indeed a bug or intended behaviour. I will fix the lint and other errors.
Thanks.
abhijit

@abhi-glitchhg abhi-glitchhg marked this pull request as draft October 24, 2024 15:39
@abhi-glitchhg abhi-glitchhg changed the title potential bug fix for drop path decay rate in swin transformer and simple initialisation of swinLayer class. potential bug fix for drop path decay rate in swin transformer Oct 24, 2024
@abhi-glitchhg
Contributor Author

abhi-glitchhg commented Oct 24, 2024

In #33974, @ArthurZucker mentioned the messy initialisation of the SwinLayer class; I would rather not touch it in this PR.

Personally, I think the initialisation looks good. But if you think we should make it simpler, I'm happy to tackle it in another PR.

@abhi-glitchhg
Contributor Author

The CI is green. Yay! I guess the PR is now ready for review, @molbap.

By the way, I really loved the CI/CD infra: tests run fast, and the linting tools work amazingly! I also recently watched @ArthurZucker's PyTorch Conference talk, and now I really understand the pain points mentioned in the video.

Thank you guys for maintaining such a high-impact library!

@abhi-glitchhg abhi-glitchhg marked this pull request as ready for review October 25, 2024 17:34
@abhi-glitchhg abhi-glitchhg changed the title potential bug fix for drop path decay rate in swin transformer Bug fix for drop path decay rate in swin transformer Oct 25, 2024
Contributor

@molbap molbap left a comment

Thanks, it's cleaner! Left a couple of comments, let me know what you think.

input_resolution=input_resolution,
num_heads=num_heads,
shift_size=0 if (i % 2 == 0) else config.window_size // 2,
drop_path_rate=drop_path[i],

nice fix - and aligned with hiera & focalnet which also have a varying drop_path per layer iirc
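
For readers skimming the thread, a small sketch of what the quoted lines amount to: block i in a stage is paired with drop_path[i] instead of a single shared constant, while shift_size alternates between 0 and half the window. `BlockSpec` and `build_stage_specs` are hypothetical names made up for this illustration, not classes from transformers.

```python
from dataclasses import dataclass


@dataclass
class BlockSpec:
    """Hypothetical stand-in for the arguments one Swin block receives."""
    shift_size: int
    drop_path_rate: float


def build_stage_specs(depth, window_size, drop_path):
    """Pair block i with drop_path[i] rather than one shared constant."""
    if len(drop_path) != depth:
        raise ValueError("need exactly one drop-path rate per block")
    return [
        BlockSpec(
            # Even blocks use regular windows, odd blocks use shifted windows.
            shift_size=0 if (i % 2 == 0) else window_size // 2,
            drop_path_rate=drop_path[i],
        )
        for i in range(depth)
    ]


# Illustrative values only (assumed for the example).
specs = build_stage_specs(depth=2, window_size=7, drop_path=[0.0, 0.05])
print(specs)
```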

Contributor

@molbap molbap left a comment

LGTM - revamp of layers init to depend on (config, layer_idx) TBD in a follow-up PR! cc @ArthurZucker for final review

Collaborator

@ArthurZucker ArthurZucker left a comment

Nice and simple! Thanks 🤗

@ArthurZucker ArthurZucker merged commit 56c45d5 into huggingface:main Oct 29, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@abhi-glitchhg abhi-glitchhg deleted the swin_drop_path_bug branch October 31, 2024 12:33
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* potential bug fix for drop path

* variable name change

* forgot to rename the variables

* back to original

* modify dpr properly

* check_copies auto fix

* corresponsing swin2 changes

* auto fix

* linting

* default value for drop_path_rate as 0.0

* Update src/transformers/models/glm/modeling_glm.py

* maskformer fix

* ruff format

* changes made to tf code as well

* lint

---------

Co-authored-by: abhijit deo <167164474+deo-abhijit@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

'drop_path` argument for SwinStage class is unused.