Fix duplicated weights in fp8 quantization by Cyrilvallez · Pull Request #37667 · huggingface/transformers

Cyrilvallez · 2025-04-22T09:46:07Z

What does this PR do?

#35926 moved the fp8 params from buffers to parameters (which makes sense), but the quantizer itself was not updated, so the params were still being added as buffers, leading to duplicated weights (they lived as both buffers and parameters).
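To illustrate the duplication described above, here is a minimal sketch in plain PyTorch. `ToyFP8Linear`, `load_as_buffer`, and `load_as_parameter` are hypothetical names made up for this example; they are not the classes or helpers used in `quantizer_finegrained_fp8.py`, and the real fix may differ in detail. The sketch only assumes that the loader writes into the module's buffer dict while the layer already declares the same name as a parameter.

```python
import torch
from torch import nn


class ToyFP8Linear(nn.Module):
    """Hypothetical stand-in for a quantized linear layer (not the transformers class)."""

    def __init__(self, in_features=4, out_features=4):
        super().__init__()
        # The layer declares its weight as a parameter, as after #35926.
        self.weight = nn.Parameter(
            torch.zeros(out_features, in_features), requires_grad=False
        )


def load_as_buffer(module, name, value):
    # Assumed buggy pattern: the loaded tensor is stored in the buffer dict,
    # even though the same name is already registered as a parameter.
    module._buffers[name] = value


def load_as_parameter(module, name, value):
    # Assumed fixed pattern: overwrite the existing parameter slot instead.
    module._parameters[name] = nn.Parameter(value, requires_grad=False)


buggy = ToyFP8Linear()
load_as_buffer(buggy, "weight", torch.ones(4, 4))
buggy_params = [n for n, _ in buggy.named_parameters()]
buggy_buffers = [n for n, _ in buggy.named_buffers()]

fixed = ToyFP8Linear()
load_as_parameter(fixed, "weight", torch.ones(4, 4))
fixed_params = [n for n, _ in fixed.named_parameters()]
fixed_buffers = [n for n, _ in fixed.named_buffers()]

print("buggy:", buggy_params, buggy_buffers)
print("fixed:", fixed_params, fixed_buffers)
```

In the buggy case the name `weight` shows up under both `named_parameters()` and `named_buffers()`, so two tensors are kept alive for one logical weight. In the fixed case only the parameter exists.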

Also, note that we SHOULD NEVER use set_module_tensor_to_device, as it clears the CUDA cache on every call, which is really inefficient and defeats the whole purpose of the CUDA warmup we do in from_pretrained.

cc @SunMarc @MekkCyber

github-actions · 2025-04-22T09:46:17Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

SunMarc

Nice catch! Thanks for fixing!

MekkCyber

Sounds good, thanks!

ArthurZucker

Nice catch!

* fix fp8
* Update quantizer_finegrained_fp8.py
* fix circular import
* Update quantizer_finegrained_fp8.py

fix fp8

0c372bf

github-actions Bot marked this pull request as draft April 22, 2025 09:46

Cyrilvallez added 2 commits April 22, 2025 11:47

Update quantizer_finegrained_fp8.py

c1c949a

fix circular import

f27e785

Cyrilvallez marked this pull request as ready for review April 22, 2025 09:54

Cyrilvallez changed the title ~~Fix fp8 quantization~~ Fix duplicated weights in fp8 quantization Apr 22, 2025

SunMarc approved these changes Apr 22, 2025

SunMarc requested a review from MekkCyber April 22, 2025 10:02

Update quantizer_finegrained_fp8.py

4e865a3

MekkCyber approved these changes Apr 22, 2025

Cyrilvallez merged commit 6614209 into main Apr 22, 2025
20 of 21 checks passed

Cyrilvallez deleted the fix-fp8 branch April 22, 2025 11:12

ArthurZucker reviewed Apr 28, 2025

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025

Fix duplicated weights in fp8 quantization (huggingface#37667)

29fc578

* fix fp8
* Update quantizer_finegrained_fp8.py
* fix circular import
* Update quantizer_finegrained_fp8.py

Cyrilvallez merged 4 commits into main from fix-fp8
