Make to_sparse_semi_structured_cutlass_sm9x ABI stable #3727
jerryzh168 merged 36 commits into main
Summary: part of #3516
Test Plan: `pip install -e . --no-build-isolation` doesn't seem to work on B200; will try on H100.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3727
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 2 Pending as of commit 82a9537 with merge base df0fde3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Confirmed with Randy that it's fine to make the `Float8PackingFormat.SPARSE_CUTLASS` config available only on torch 2.10.0 and later, since they are using nightly / PyTorch main.
Summary: part of #3516
This PR also changed
```
Float8DynamicActivationFloat8WeightConfig(
    version=2,
    packing_format=Float8PackingFormat.SPARSE_CUTLASS,
    granularity=PerRow(),
)
```
to only be available on torch 2.10.0 and later, because we made the ops it uses ABI stable, and the stable ABI APIs are only available starting with torch 2.10.0.
Test Plan: pip install -e . --no-build-isolation
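The availability gate above boils down to a torch version check. Users on nightly or main builds report versions such as `2.10.0.dev20251120+cu128` or `2.10.0a0+git1234abc`, and a strict PEP 440 comparison sorts those below `2.10.0`, so a 2.10 nightly would be rejected. A minimal sketch, assuming nightlies should count as 2.10.0: it compares only the numeric release part. `torch_version_at_least` and `check_sparse_cutlass_supported` are hypothetical names for illustration, not the helpers in `torchao/utils.py`, whose exact semantics may differ.

```python
import re
from typing import Tuple


def _release_tuple(version: str) -> Tuple[int, int, int]:
    """Parse the leading major.minor[.patch] of a version string.

    Pre-release, dev and local suffixes are ignored on purpose, so
    '2.10.0.dev20251120+cu128' and '2.10.0a0+git1234abc' both map to (2, 10, 0).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def torch_version_at_least(version: str, minimum: str) -> bool:
    """Hypothetical helper: True if `version` is at or above `minimum`."""
    # Compare integer tuples, not strings: '2.10' must sort after '2.9'.
    return _release_tuple(version) >= _release_tuple(minimum)


def check_sparse_cutlass_supported(torch_version: str) -> None:
    """Hypothetical gate for the SPARSE_CUTLASS packing format."""
    if not torch_version_at_least(torch_version, "2.10.0"):
        raise RuntimeError(
            "Float8PackingFormat.SPARSE_CUTLASS needs the stable ABI ops, "
            f"which require torch >= 2.10.0 (found {torch_version})"
        )
```

In practice the gate would be called as `check_sparse_cutlass_supported(torch.__version__)` before constructing the config. Comparing integer tuples rather than raw strings matters here, since `"2.10.0" < "2.9.0"` as strings.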
@janeyx99 @mikaylagawarecki @andrewor14 Please take a look; I will remove the debug print before landing.
janeyx99 left a comment
Looked through the changes; LGTM overall, with some minor comments.
#define OPERATOR_NAME "to_sparse_semi_structured_cutlass_sm9x"
// Macro for checking CUDA kernel launch errors (replacement for C10_CUDA_KERNEL_LAUNCH_CHECK)
Does `STD_CUDA_KERNEL_LAUNCH_CHECK` not work here?
# Check if torch version is at least 2.10.0 (for stable ABI support)
# util copied from torchao/utils.py
This is the code that installs torchao, so it assumes torchao is not installed yet. Do you mean just importing the file? Would that be too confusing?
I think we want to keep setup.py as simple as possible. Importing torchao/utils.py might have side effects outside the scope of this file.
setup.py
)
ext_modules.append(
    extension(
        "torchao._C_cutlass_90a_stable",
Would you maintain an unstable version for torch < 2.10? It might be wise to still let users build these kernels against an older torch.
I think this is just a temporary measure to keep this PR and #3725 separate. After both PRs land, there will be a single `torchao._C_cutlass_90a`, and it will be the stable ABI version.
Yeah, and we'll drop support for torch < 2.10 afterwards.
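The setup.py threads above settle on two points: the stable-ABI extension is only built when the installed torch is at least 2.10, and the version check stays local to setup.py instead of importing `torchao/utils.py`, since torchao is not installed yet when setup.py runs. A minimal sketch of that selection, assuming the version string comes from `torch.__version__`; `stable_cutlass_extension_names` is a hypothetical helper, and the real setup.py appends `extension(...)` objects rather than plain names.

```python
import re
from typing import List, Tuple

# Minimum torch (major, minor) that ships the stable ABI APIs used by the kernel.
STABLE_ABI_MIN_TORCH: Tuple[int, int] = (2, 10)


def _torch_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '2.10.0.dev20251120+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def stable_cutlass_extension_names(torch_version: str) -> List[str]:
    """Hypothetical helper: names of stable-ABI CUTLASS extensions to build."""
    names: List[str] = []
    # Self-contained on purpose: no torchao import is possible at install time.
    if _torch_major_minor(torch_version) >= STABLE_ABI_MIN_TORCH:
        # A real setup.py would call ext_modules.append(extension(...)) here.
        names.append("torchao._C_cutlass_90a_stable")
    return names
```

In setup.py this might be used as `for name in stable_cutlass_extension_names(torch.__version__): ext_modules.append(extension(name, ...))`, with sources and compile flags left to the real build script. Only major and minor are compared because the patch level does not affect the 2.10 threshold.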
.../csrc/cuda/to_sparse_semi_structured_cutlass_sm9x/to_sparse_semi_structured_cutlass_sm9x.cuh (resolved)
andrewor14 left a comment
Looks great, just one question that I'll leave for Jane/Mikayla.
// Get device properties using raw CUDA API.
int device_id = W.get_device();
cudaDeviceProp device_prop;
cudaError_t err = cudaGetDeviceProperties(&device_prop, device_id);
@janeyx99 @mikaylagawarecki Is this right? I had to do something more complicated in my PR: https://github.com/pytorch/ao/pull/3725/changes#r2746685516
(copied from the FA3 stable ABI PR: Dao-AILab/flash-attention#1791)
I believe the more complicated code in FA3 is there for performance: it caches the device properties so they are only queried once.
Oh, I see. I guess I can follow the same pattern; maybe just move that function to common.h?
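The FA3 pattern discussed above caches the result of the device-property query so that `cudaGetDeviceProperties` runs at most once per device instead of on every call. The FA3 code is C++; the sketch below is only a Python illustration of the same caching idea, not the FA3 or torchao implementation. `DevicePropertiesCache` is a hypothetical name, and `query` stands in for `cudaGetDeviceProperties` (or `torch.cuda.get_device_properties` on the Python side).

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

Props = TypeVar("Props")


class DevicePropertiesCache(Generic[Props]):
    """Hypothetical per-device cache: runs `query` at most once per device id."""

    def __init__(self, query: Callable[[int], Props]) -> None:
        self._query = query
        self._cache: Dict[int, Props] = {}
        # Guards the cache so concurrent first calls do not query twice.
        self._lock = threading.Lock()

    def get(self, device_id: int) -> Props:
        with self._lock:
            if device_id not in self._cache:
                # Slow path: only taken the first time a device is seen.
                self._cache[device_id] = self._query(device_id)
            return self._cache[device_id]
```

A module-level instance such as `DevicePropertiesCache(torch.cuda.get_device_properties)` would then serve every later lookup from memory. A C++ version would more likely keep a static per-device array initialized under something like `std::call_once`; taking a lock on every lookup is simpler and still cheap compared with a device query.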
Stack from ghstack (oldest at bottom):
Summary:
part of #3516
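This PR also changed `Float8DynamicActivationFloat8WeightConfig(version=2, packing_format=Float8PackingFormat.SPARSE_CUTLASS, granularity=PerRow())` to only be available on torch 2.10.0 and later, because we made the ops it uses ABI stable, and the stable ABI APIs are only available starting with torch 2.10.0.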
Test Plan:
pip install -e . --no-build-isolation