Fix duplicated weights in fp8 quantization #37667

Merged: Cyrilvallez merged 4 commits into main from fix-fp8 on Apr 22, 2025

@Cyrilvallez Cyrilvallez (Member) commented Apr 22, 2025

What does this PR do?

#35926 moved the fp8 params from buffers to parameters (which makes sense), but the quantizer itself was not updated. As a result, the quantizer would still add them as buffers, leading to duplicated weights: each one would live as both a buffer and a parameter.
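To make the duplication concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not PyTorch and not the actual quantizer code: `ToyModule`, `load_as_buffer`, and `load_as_parameter` are hypothetical names invented for illustration, and the byte strings stand in for tensors. It only assumes that a module keeps separate registries for parameters and buffers, and that a loader writing to the wrong registry leaves the already-declared parameter in place.

```python
class ToyModule:
    """Toy stand-in for a module with separate parameter and buffer registries."""

    def __init__(self):
        self.parameters = {}
        self.buffers = {}

    def duplicated_names(self):
        # Names that are stored in both registries at once.
        return sorted(set(self.parameters) & set(self.buffers))

    def total_bytes(self):
        stored = list(self.parameters.values()) + list(self.buffers.values())
        return sum(len(blob) for blob in stored)


def make_layer():
    # The layer definition already declares "weight" as a parameter.
    layer = ToyModule()
    layer.parameters["weight"] = bytes(1024)
    return layer


def load_as_buffer(layer, name, data):
    # Stale loader (hypothetical): still treats the tensor as a buffer,
    # so the declared parameter is left in place next to the new buffer.
    layer.buffers[name] = data


def load_as_parameter(layer, name, data):
    # Updated loader (hypothetical): overwrites the declared parameter.
    layer.parameters[name] = data


quantized = bytes(1024)

stale = make_layer()
load_as_buffer(stale, "weight", quantized)

fixed = make_layer()
load_as_parameter(fixed, "weight", quantized)

print(stale.duplicated_names(), stale.total_bytes())  # ['weight'] 2048
print(fixed.duplicated_names(), fixed.total_bytes())  # [] 1024
```

In this toy model, the stale path stores the weight twice, while the updated path keeps a single copy.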

Also, note that we SHOULD NEVER use set_module_tensor_to_device: it clears the CUDA cache on every call, which is really inefficient and defeats the whole purpose of the CUDA warmup we do in from_pretrained.
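The reasoning behind that warning can be sketched with a toy model of a caching allocator. This is an assumption-laden illustration in plain Python, not PyTorch's real allocator and not the transformers loading code: `ToyCachingAllocator` and `load_weights` are hypothetical, and the only behavior modeled is that a warmup reserves memory once and that emptying the cache forces later allocations to go back to the device.

```python
class ToyCachingAllocator:
    """Toy model of a caching allocator (NOT PyTorch's real allocator)."""

    def __init__(self):
        self.cached_free = 0     # bytes reserved from the device but unused
        self.device_mallocs = 0  # number of (slow) requests sent to the device

    def warmup(self, nbytes):
        # Reserve one large block up front and keep it in the cache.
        self.device_mallocs += 1
        self.cached_free += nbytes

    def alloc(self, nbytes):
        if self.cached_free >= nbytes:
            self.cached_free -= nbytes  # served from the cache
        else:
            self.device_mallocs += 1    # cache miss: ask the device again

    def empty_cache(self):
        # Hand all unused cached memory back to the device.
        self.cached_free = 0


def load_weights(allocator, sizes, clear_cache_each_time):
    allocator.warmup(sum(sizes))
    for nbytes in sizes:
        allocator.alloc(nbytes)
        if clear_cache_each_time:
            allocator.empty_cache()
    return allocator.device_mallocs


sizes = [256] * 8
mallocs_cached = load_weights(ToyCachingAllocator(), sizes, clear_cache_each_time=False)
mallocs_cleared = load_weights(ToyCachingAllocator(), sizes, clear_cache_each_time=True)

print(mallocs_cached)   # 1
print(mallocs_cleared)  # 8
```

With the cache left alone, the single warmup request covers every later allocation. Clearing it after each tensor throws the reservation away, so almost every allocation has to go back to the device.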

cc @SunMarc @MekkCyber

@github-actions (Contributor) commented

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@github-actions github-actions Bot marked this pull request as draft April 22, 2025 09:46
@Cyrilvallez Cyrilvallez marked this pull request as ready for review April 22, 2025 09:54
@Cyrilvallez Cyrilvallez changed the title Fix fp8 quantization Fix duplicated weights in fp8 quantization Apr 22, 2025
@SunMarc SunMarc (Member) left a comment

Nice catch! Thanks for fixing!

@SunMarc SunMarc requested a review from MekkCyber April 22, 2025 10:02
@MekkCyber MekkCyber (Contributor) left a comment

Sounds good, thanks!

@Cyrilvallez Cyrilvallez merged commit 6614209 into main Apr 22, 2025
20 of 21 checks passed
@Cyrilvallez Cyrilvallez deleted the fix-fp8 branch April 22, 2025 11:12
@ArthurZucker ArthurZucker (Collaborator) left a comment

Nice catch!

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* fix fp8

* Update quantizer_finegrained_fp8.py

* fix circular import

* Update quantizer_finegrained_fp8.py