
Copy weights and preserve device for 8da4w QAT linear #211

Merged
merged 7 commits into main on May 6, 2024

Conversation

andrewor14
Contributor

Summary: This fixes two correctness bugs. First, we never copied over the weights from the existing linear, so we would start from random weights even when loading from checkpoints. Second, we never preserved the device of the original linear. This is important for settings like FSDP, where we expect non-zero ranks to have their parameters on the meta device in order to initialize these parameters correctly.
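To make the FSDP point concrete, here is a small runnable illustration (not code from this PR) of why the replacement module must inherit the original device: a module recreated without an explicit device silently materializes real, randomly initialized storage on CPU, while passing the source weight's device keeps the meta placement intact.

import torch

# On FSDP non-zero ranks, parameters are expected to live on the meta device.
lin = torch.nn.Linear(4, 4, device="meta")
print(lin.weight.is_meta)  # True

# Recreating the module without a device loses that placement:
bad = torch.nn.Linear(4, 4)
print(bad.weight.is_meta)  # False: real, randomly initialized CPU storage

# Preserving the original device keeps the meta placement:
good = torch.nn.Linear(4, 4, device=lin.weight.device)
print(good.weight.is_meta)  # True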

Test Plan:
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer_load_state_dict_meta

Reviewers: jerryzh168, cpuhrsch

Subscribers: jerryzh168, cpuhrsch, supriyar

@facebook-github-bot added the CLA Signed label on May 2, 2024
@andrewor14 requested a review from jerryzh168 on May 2, 2024 at 21:45
@@ -47,6 +47,7 @@ def prepare(
*args: Any,
**kwargs: Any
) -> torch.nn.Module:
state_dict = model.state_dict()
Contributor

oh OK, right, this function has to do create_quantized_state_dict() and load_state_dict() in the gpt-fast API, but I feel we could also change _replace_linear_8da4w to instantiate from the existing floating point module; that might be clearer

Contributor Author

Yeah I agree, updated
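For readers following along, the suggested refactor roughly matches the from_float convention used by PyTorch's quantized modules; the sketch below is an assumption about the shape of the change, not the merged code (the class name is borrowed from the PR's 8da4w QAT context, and the real implementation does more than this):

import torch

class Int8DynActInt4WeightQATLinear(torch.nn.Linear):
    @classmethod
    def from_float(cls, mod: torch.nn.Linear) -> "Int8DynActInt4WeightQATLinear":
        # Instantiating from the existing float module addresses both bugs
        # at once: the device comes from the source weight (meta stays
        # meta), and the weights are copied instead of left at random init.
        new_mod = cls(
            mod.in_features,
            mod.out_features,
            bias=mod.bias is not None,
            device=mod.weight.device,
        )
        if not mod.weight.is_meta:  # meta tensors have no data to copy
            with torch.no_grad():
                new_mod.weight.copy_(mod.weight)
                if mod.bias is not None:
                    new_mod.bias.copy_(mod.bias)
        return new_mod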

)
break
if should_load_state_dict:
model.load_state_dict(state_dict)
Contributor

will this work if we use tensor subclasses?

# on the meta device, in which case there is no need to
# load the state dict, and doing so will lead to an error
should_load_state_dict = True
for k, v in state_dict.items():
Contributor

this can probably be simplified a bit:

should_load_state_dict = all(not v.is_meta for v in state_dict.values())
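As a quick sanity check of the suggested one-liner, here is a self-contained demo with a plain dict standing in for the model's state_dict:

import torch

state_dict = {
    "real": torch.randn(2, 2),
    "meta": torch.empty(2, 2, device="meta"),
}
# Load only when no tensor is on the meta device:
should_load_state_dict = all(not v.is_meta for v in state_dict.values())
print(should_load_state_dict)  # False: the meta tensor means skip loading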

@andrewor14 force-pushed the 8da4w_qat_state_dict branch from bac2301 to cd50d7a on May 2, 2024 at 22:10
@andrewor14 requested a review from jerryzh168 on May 2, 2024 at 22:11
@facebook-github-bot
@andrewor14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@jerryzh168 left a comment


Sounds good, we'll refine the API/implementation more when we move to tensor subclasses

@facebook-github-bot
@andrewor14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@andrewor14 merged commit ce78e79 into main on May 6, 2024
15 checks passed
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request on Jul 31, 2024
* Copy weights and preserve device for 8da4w QAT linear

Summary: This fixes two correctness bugs. First, we never copied
over the weights from the existing linear, so we would start from
random weights even when loading from checkpoints. Second, we
never preserved the device of the original linear. This is
important for settings like FSDP, where we expect non-zero ranks
to have their parameters on the meta device in order to
initialize these parameters correctly.

Test Plan:
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer_meta_weights

Reviewers: jerryzh168, cpuhrsch

Subscribers: jerryzh168, cpuhrsch, supriyar

* Update test_qat.py