
training acceleration via runtime semi-structured sparsity #184

Merged: 33 commits merged into main on Jun 6, 2024

Conversation

@jcaip (Contributor) commented Apr 27, 2024

This PR adds support for training acceleration using the runtime semi-structured sparsity kernels that landed in core earlier: pytorch/pytorch#122350

It collects the autograd functions needed to support training and packages them into a replacement nn.Linear module, SemiSparseLinear, along with a user API for swapping out modules, swap_linear_with_semi_sparse_linear_.

It also adds some benchmarking code from xformers to measure the speedup of this module when applied to DINO shapes.

We have a blog post coming out with more details about how this works.
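Roughly, usage looks like the following. This is a minimal sketch: the import path and the exact swap-function signatures shown here are assumptions; see the README added in this PR for the authoritative version.

```
import torch

# hypothetical import path; the real helpers live in the module added by this PR
from torchao.sparsity.prototype.training import (
    SemiSparseLinear,
    swap_linear_with_semi_sparse_linear_,
    swap_semi_sparse_linear_with_linear,
)

# a ViT-style MLP block; runtime 2:4 sparsity wants large, multiple-of-64 shapes and fp16/bf16
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda().to(torch.bfloat16)

# replace nn.Linear modules with SemiSparseLinear, which prunes/compresses the weight to
# 2:4 sparsity at runtime on every forward so sparse matmul kernels can be used while training
swap_linear_with_semi_sparse_linear_(model)  # signature assumed; may take a per-module config

# ... normal training loop ...

# swap back to dense nn.Linear afterwards
swap_semi_sparse_linear_with_linear(model)
```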

Testing:

```
python test/sparsity/test_fast_sparse_training.py
```

Benchmarking:

```
python benchmarks/benchmark_semi_sparse.py
```

For ViT-L MLP shapes we see the following results:

```
[------------------------------------------------ mlpfwbw -------------------------------------------------]
                                  |   act24   |   dense   |   w24    |  s24_inp_sparsify24  |  s24_inp_clone
1 threads: -------------------------------------------------------------------------------------------------
      f16 (44160,1024,4096,1024)  |  11881.0  |  11534.3  |  9204.7  |        255.1         |      125.8

Times are in microseconds (us).
```

@facebook-github-bot added the "CLA Signed" label on Apr 27, 2024

pytorch-bot commented May 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/184

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 02446fa with merge base 8a4e693:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jcaip changed the title from "[wip] fast semi-sparse sparse training" to "training acceleration via runtime semi-structured sparsity" on Jun 3, 2024
@jcaip requested review from msaroufim, vkuzo, and jerryzh168 on Jun 3, 2024
Review thread on:

```
# now you can run your normal training loop

# if you need to swap back from semi_sparse linear to normal linear, we provide a utility function
swap_semi_sparse_linear_with_linear(model)
```

Member:

What's up with having _ at the end of the name though?

Suggested change:

```
-swap_semi_sparse_linear_with_linear(model)
+swap_semi_sparse_linear_with_linear_(model)
```

Review thread on:

```
@@ -0,0 +1,53 @@
# Accelerated Sparse Training
```

Member:

Why prototype? The API seems quite nice already; if it module-swaps on Linear, it should be able to support most interesting models.

@jcaip (Author):
Will deprecating these APIs in the future be an issue then? I'm not sure that this swap_linear API is something that I want to commit to long term.

Member:

It's not an issue.

Review thread on:

```
    swap_semi_sparse_linear_with_linear_,
)

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).cuda().to(torch.bfloat16)
```

Member:

Does the example actually run faster? If not, try to have a minimal example that will run faster and a way of printing that speedup to the console. Also specify the supported SM for those speedups; if it's Ampere+, then the 3090 and 4090 should also benefit from your work.

@jcaip (Author):

Updated. The example will run faster, and I added the compute capability limitations earlier in the README. As for printing speedups, I don't think it's necessary since we have the benchmark script for that.

Modifying the benchmarks to match the example, I see the following speedup (fw pass only):

```
[------------------------------------------------ mlpfw -------------------------------------------------]
                                  |  act24   |  dense   |   w24    |  s24_inp_sparsify24  |  s24_inp_clone
1 threads: -----------------------------------------------------------------------------------------------
      f16 (44160,1024,4096,1024)  |  4813.2  |  4031.0  |  3440.1  |        255.4         |      121.4
```
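For reference, a minimal fw-only timing sketch along the same lines; note it goes through the public to_sparse_semi_structured weight-sparsity path (the w24 column), not the runtime input-sparsify kernels from this PR, so the numbers will differ from s24:

```
import torch
from torch.sparse import to_sparse_semi_structured
from torch.utils import benchmark

# ViT-L MLP fc1 shape from the table above
x = torch.rand(44160, 1024, device="cuda", dtype=torch.float16)
w = torch.rand(4096, 1024, device="cuda", dtype=torch.float16)

# apply a fixed 2:4 mask so the weight is compressible, then pack it
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile((4096, 256))
w_24 = to_sparse_semi_structured((w * mask).half())

dense_t = benchmark.Timer(
    stmt="torch.nn.functional.linear(x, w)", globals={"torch": torch, "x": x, "w": w}
).blocked_autorange()
sparse_t = benchmark.Timer(
    stmt="torch.nn.functional.linear(x, w_24)", globals={"torch": torch, "x": x, "w_24": w_24}
).blocked_autorange()

print(f"dense: {dense_t.median * 1e6:.1f}us | w24: {sparse_t.median * 1e6:.1f}us | "
      f"speedup: {dense_t.median / sparse_t.median:.2f}x")
```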

"""

def forward(self, x):
sparse_weight = semi_sparse_sparsify(self.weight, backend="cusparselt")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit on name but how about semi_sparsify?

@jcaip (Author):

Sorry, this is a typo on my end; it should be semi_structured_sparsify.
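For context, a rough sketch of what the SemiSparseLinear forward does; the import path here is an assumption, and the real module also wires up custom autograd functions for the backward pass:

```
import torch
import torch.nn.functional as F

# hypothetical import path; use whatever module this PR adds
from torchao.sparsity.prototype.training import semi_structured_sparsify


class SemiSparseLinearSketch(torch.nn.Linear):
    def forward(self, x):
        # prune + compress the dense weight to 2:4 sparsity at runtime, on every forward,
        # so the cuSPARSELt sparse matmul kernel can be used while still training dense weights
        sparse_weight = semi_structured_sparsify(self.weight, backend="cusparselt")
        return F.linear(x, sparse_weight, self.bias)
```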

Review thread on:

```
class _SparsifyFunc(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x: torch.Tensor, algo: str, backend: str):  # type: ignore[override]
```

Member:

Could you put the supported backends in an enum?
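For illustration, a small sketch of what such an enum could look like; the set of backend names beyond "cusparselt" is an assumption:

```
from enum import Enum


class SparsifyBackend(str, Enum):
    # string-valued enum so existing call sites passing "cusparselt" keep working
    CUTLASS = "cutlass"
    CUSPARSELT = "cusparselt"


# e.g. validate with SparsifyBackend(backend) instead of comparing raw strings,
# and default to backend=SparsifyBackend.CUSPARSELT in _SparsifyFunc.forward
```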

Review thread on:

```
)
else:
    if (
        tensor.compressed_swizzled_bitmask is None
```

Member:

Noob question: what's going on here?

Review thread on:

```
        reference_sparse_tensor.compressed_swizzled_bitmask,
    )

# Add pointwise ops to the dispatch table
```

Member:

How do you decide what to add here? It seems like you looked at what's most used in actual modeling code, but my question would be: why not all pointwise ops in PyTorch?

@jcaip (Author):

I just added what was necessary for our experiments. For some pointwise ops, like mul, you need to define sparsification_like_args_list, so you cannot apply this naively to all pointwise ops.

But it would make sense to add support for all the naive ones; I can add that in a subsequent PR.
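For illustration, a sketch of the registration pattern being discussed; the table name and handler here are placeholders, not the real implementation:

```
import torch

aten = torch.ops.aten

# hypothetical table consulted from the subclass's __torch_dispatch__: aten op -> handler
POINTWISE_DISPATCH = {}


def register_naive_pointwise(*ops):
    # "naive" pointwise ops can be applied directly to the packed values while reusing the
    # existing 2:4 metadata / bitmask, since they never need values from pruned positions
    def decorator(handler):
        for op in ops:
            POINTWISE_DISPATCH[op] = handler
        return handler
    return decorator


@register_naive_pointwise(aten.relu.default, aten.gelu.default, aten.silu.default)
def _apply_to_packed_values(func, types, args, kwargs):
    raise NotImplementedError("sketch only: the real handler applies func to the packed values")


# aten.mul.Tensor cannot be registered this way: its other operand must first be sparsified
# with the *same* 2:4 pattern, which is what sparsification_like_args_list specifies.
```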

Review thread on:

```
swap_semi_sparse_linear_with_linear_(model_c)
for name, mod in model_c.named_modules():
    assert not isinstance(mod, SemiSparseLinear)
```

Member:

An interesting omission is that there's no compile support; but you do have allow_in_graph calls in your code, so should we test compile support explicitly?

@jcaip (Author):

This should work with compile; I'll add a test.
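For reference, a rough sketch of what such a test might look like; the import path, swap signature, and tolerances are assumptions:

```
import unittest

import torch


class TestSemiSparseCompile(unittest.TestCase):
    @unittest.skipIf(not torch.cuda.is_available(), "needs a CUDA GPU with 2:4 sparse kernel support")
    def test_compile_matches_eager(self):
        # hypothetical import path; use whatever module this PR adds
        from torchao.sparsity.prototype.training import (
            SemiSparseLinear,
            swap_linear_with_semi_sparse_linear_,
        )

        model = torch.nn.Sequential(torch.nn.Linear(128, 128)).cuda().to(torch.bfloat16)
        swap_linear_with_semi_sparse_linear_(model)  # signature assumed
        assert isinstance(model[0], SemiSparseLinear)

        x = torch.randn(128, 128, device="cuda", dtype=torch.bfloat16)
        eager_out = model(x)
        compiled_out = torch.compile(model)(x)
        # eager and compiled both sparsify at runtime; loose tolerances allow for kernel differences
        torch.testing.assert_close(eager_out, compiled_out, rtol=1e-2, atol=1e-2)
```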

@supriyar (Contributor) commented Jun 6, 2024

@msaroufim @vkuzo I guess we missed it but this is the first technique we have in ao for training performance improvement :)

@jcaip requested a review from msaroufim on Jun 6, 2024
@jcaip merged commit d97ae74 into main on Jun 6, 2024
13 checks passed
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024