Bug fix for drop path decay rate in swin transformer #34291
ArthurZucker merged 19 commits into huggingface:main
Conversation
cc @molbap
molbap
left a comment
Hi and thanks for the PR! I think this is a work in progress? If so, feel free to put the PR in draft mode or name it [WIP], and ping me again when you want another review! I think you correctly identified this bug :) When you're done, you can run make fixup to run the linter and make the code-quality check happy in the CI.
Hi, thanks for the reply. I was waiting for confirmation on whether this was indeed a bug or intended behaviour. I will fix the lint and other errors.
In #33974, @ArthurZucker mentioned the messy initialisation of the SwinLayer class; I would rather not touch it in this PR. Personally I think the initialisation looks good, but if you think we should make it simpler, I'm happy to tackle it in another PR.
This comment was marked as outdated.
The CI is green, yay! I guess the PR is now ready for review @molbap. By the way, I really loved the CI/CD infra: tests run fast and the linting tools work amazingly well! I also recently watched @ArthurZucker's PyTorch conference talk, and now I really understand the pain points mentioned in the video. Thank you for maintaining such a high-impact library!
molbap
left a comment
Thanks, it's cleaner! Left a couple of comments, let me know what you think.
input_resolution=input_resolution,
num_heads=num_heads,
shift_size=0 if (i % 2 == 0) else config.window_size // 2,
drop_path_rate=drop_path[i],
Nice fix - and aligned with Hiera & FocalNet, which also have a varying drop_path per layer, IIRC.
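To illustrate the per-layer indexing discussed here, a minimal sketch (illustrative names only, not the exact SwinStage code): each block in a stage reads its own rate from a per-block `drop_path` schedule, and every odd block uses a shifted window.

```python
# Illustrative sketch (not the exact SwinStage code): each block in a stage
# reads its own rate from a per-block drop_path schedule, and every odd
# block uses a shifted window.
window_size = 7                       # hypothetical config.window_size
drop_path = [0.0, 0.033, 0.066, 0.1]  # hypothetical per-block schedule

blocks = []
for i, rate in enumerate(drop_path):
    blocks.append(
        {
            "shift_size": 0 if (i % 2 == 0) else window_size // 2,
            "drop_path_rate": rate,   # varies per block, not constant
        }
    )
```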
molbap
left a comment
LGTM - revamp of layers init to depend on (config, layer_idx) TBD in a follow-up PR! cc @ArthurZucker for final review
ArthurZucker
left a comment
Nice and simple! Thanks 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* potential bug fix for drop path
* variable name change
* forgot to rename the variables
* back to original
* modify dpr properly
* check_copies auto fix
* corresponding swin2 changes
* auto fix
* linting
* default value for drop_path_rate as 0.0
* Update src/transformers/models/glm/modeling_glm.py
* maskformer fix
* ruff format
* changes made to tf code as well
* lint

---------

Co-authored-by: abhijit deo <167164474+deo-abhijit@users.noreply.github.com>
What does this PR do?
This PR fixes #33974 .
As I mentioned in the issue, I feel that the Swin Transformer implementation of stochastic depth decay is incorrect.
According to the official implementation, drop_prob is different for every SwinLayer:
https://github.com/microsoft/Swin-Transformer/blob/f82860bfb5225915aca09c3227159ee9e1df874d/models/swin_transformer.py#L544
https://github.com/microsoft/Swin-Transformer/blob/f82860bfb5225915aca09c3227159ee9e1df874d/models/swin_transformer.py#L558
https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L397-L408
But in transformers, we were using a constant value picked from the config file. I feel the implementations in transformers should stay closer to the official ones. This also applies to the SwinV2 model (and maybe swin2sr as well).
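For reference, a minimal sketch of the stochastic depth decay rule applied in the official code (function and variable names here are illustrative; the official implementation computes the same schedule with torch.linspace(0, drop_path_rate, sum(depths))):

```python
def stochastic_depth_schedule(drop_path_rate, depths):
    """Linearly increase the drop-path rate from 0 to drop_path_rate across
    all blocks, then split the flat schedule per stage (illustrative sketch)."""
    total = sum(depths)
    # Equivalent to torch.linspace(0, drop_path_rate, total)
    rates = [drop_path_rate * i / max(total - 1, 1) for i in range(total)]
    per_stage, start = [], 0
    for depth in depths:
        per_stage.append(rates[start : start + depth])
        start += depth
    return per_stage

# Swin-T-style config: depths (2, 2, 6, 2), drop_path_rate 0.1
schedule = stochastic_depth_schedule(0.1, [2, 2, 6, 2])
```

The key point of the fix: each block gets its own rate from this schedule instead of every layer reusing the single constant from the config.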
Please do look into this and let me know. I have also changed the variable names. I am very bad at naming, so any suggestions for the argument names are welcome 😄
Fixes: `drop_path` argument for SwinStage class is unused. #33974
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts, @qubvel, @ArthurZucker