Make to_sparse_semi_structured_cutlass_sm9x ABI stable#3727

Merged
jerryzh168 merged 36 commits intomainfrom
gh/jerryzh168/27/head
Jan 30, 2026

@jerryzh168 jerryzh168 commented Jan 26, 2026

Stack from ghstack (oldest at bottom):

Summary:
part of #3516

This PR also restricts

Float8DynamicActivationFloat8WeightConfig(
  version=2,
  packing_format=Float8PackingFormat.SPARSE_CUTLASS,
  granularity=PerRow(),
),

to torch 2.10.0 and later, because the ops it uses are now ABI stable and the stable ABI APIs are only available from torch 2.10.0 onward.

Test Plan:
pip install -e . --no-build-isolation

Summary:
part of #3516

Test Plan:
pip install -e . --no-build-isolation

This doesn't seem to work on B200; will try on H100.

pytorch-bot bot commented Jan 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3727

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 2 Pending

As of commit 82a9537 with merge base df0fde3 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 added a commit that referenced this pull request Jan 26, 2026
ghstack-source-id: 5d93562
Pull Request resolved: #3727
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 26, 2026
@jerryzh168 jerryzh168 added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Jan 26, 2026
jerryzh168 added a commit that referenced this pull request Jan 27, 2026
ghstack-source-id: 49cc35b
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 27, 2026
ghstack-source-id: cb749a0
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 27, 2026
ghstack-source-id: f789fce
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 27, 2026
ghstack-source-id: c010f6a
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 27, 2026
ghstack-source-id: 61821ab
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: bcae01a
Pull Request resolved: #3727
@jerryzh168
Contributor Author

Confirmed with Randy that it's fine to change the availability of

Float8DynamicActivationFloat8WeightConfig(
  version=2,
  packing_format=Float8PackingFormat.SPARSE_CUTLASS,
  granularity=PerRow(),
),

so that it is only available on torch 2.10.0 and later, since they are using nightly / PyTorch main.

jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: 4b3c106
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: abed221
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: 2e7f58b
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: d1673ba
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: 04460a4
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: 9fdfbd0
Pull Request resolved: #3727
@jerryzh168 jerryzh168 marked this pull request as ready for review January 28, 2026 23:51
@jerryzh168
Contributor Author

jerryzh168 commented Jan 28, 2026

@janeyx99 @mikaylagawarecki @andrewor14 please take a look. I will remove the debug print before landing.

Contributor

@janeyx99 janeyx99 left a comment

looked through the changes, lgtm overall, with some minor comments


#define OPERATOR_NAME "to_sparse_semi_structured_cutlass_sm9x"

// Macro for checking CUDA kernel launch errors (replacement for C10_CUDA_KERNEL_LAUNCH_CHECK)
Contributor

does STD_CUDA_KERNEL_LAUNCH_CHECK not work here?

Contributor Author

let me try



# Check if torch version is at least 2.10.0 (for stable ABI support)
# util copied from torchao/utils.py
Contributor

Why not import the util?

Contributor Author

This is the code that installs torchao, so it assumes torchao is not installed yet. Do you mean just importing the file? Wouldn't that be too confusing?

Contributor

I think we want to keep setup.py as simple as possible. Importing torchao/utils.py might have side effects outside the scope of this file
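
A minimal sketch of what such a self-contained check in setup.py could look like, so that nothing from torchao has to be imported at build time. The helper names `parse_version_prefix` and `version_at_least` are hypothetical and are not the util from torchao/utils.py. The sketch also assumes that nightly and dev builds of 2.10.0 should count as "at least 2.10.0", so it compares only the leading numeric components.

```python
import re


def parse_version_prefix(version: str) -> tuple:
    """Return the leading numeric release components as integers.

    A missing patch component is treated as 0, and any suffix such as
    ".dev20260101+cu128" is ignored (assumption: dev builds count as the release).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def version_at_least(version: str, minimum: str) -> bool:
    """Compare numerically, so that "2.9.1" sorts below "2.10.0"."""
    return parse_version_prefix(version) >= parse_version_prefix(minimum)


# Hypothetical use inside setup.py (not executed here):
#   if version_at_least(torch.__version__, "2.10.0"):
#       ext_modules.append(...)
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "2.9.1" above "2.10.0".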

setup.py Outdated
)
ext_modules.append(
extension(
"torchao._C_cutlass_90a_stable",
Contributor

Would you maintain an unstable version for torch versions below 2.10? It might be wise to still let users build these kernels on an older torch.

Contributor

I think this is just a temporary thing to keep this PR and #3725 separate. After both PRs are landed there will be a single torchao._C_cutlass_90a and it will be the stable ABI version

Contributor Author

Yeah, and we'll drop support for torch versions below 2.10 afterwards.

jerryzh168 added a commit that referenced this pull request Jan 29, 2026
ghstack-source-id: 919efe5
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 29, 2026
ghstack-source-id: 6e7a983
Pull Request resolved: #3727
Contributor

@andrewor14 andrewor14 left a comment

Looks great, just one question I'll leave for Jane/Mikayla

// Get device properties using raw CUDA API.
int device_id = W.get_device();
cudaDeviceProp device_prop;
cudaError_t err = cudaGetDeviceProperties(&device_prop, device_id);
Contributor

(copied from the FA3 stable ABI PR: Dao-AILab/flash-attention#1791)

Contributor

I believe the more complicated code in FA3 is needed for perf, so you cache the device properties and only call it once.

Contributor Author

oh I see, I guess I can follow the same pattern, maybe just move that function to common.h?
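
The caching pattern discussed above (query the device properties once per device, then reuse the result) can be sketched as follows. The real change would live in C++ alongside the kernel; this Python version only illustrates the idea. `query_device_properties` is a hypothetical stand-in for an expensive call such as `cudaGetDeviceProperties`, and the values it returns are placeholders.

```python
from functools import lru_cache

# Counts how many times the stand-in query actually runs.
query_calls = {"count": 0}


def query_device_properties(device_id: int) -> dict:
    """Hypothetical stand-in for an expensive driver query."""
    query_calls["count"] += 1
    # Placeholder values for illustration only.
    return {"device_id": device_id, "major": 9, "minor": 0}


@lru_cache(maxsize=None)
def get_device_properties_cached(device_id: int) -> dict:
    """Run the query at most once per device id and reuse the cached result."""
    return query_device_properties(device_id)
```

Repeated calls with the same device id return the cached object, so the underlying query runs only once per device.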

jerryzh168 added a commit that referenced this pull request Jan 30, 2026
ghstack-source-id: fcaeb29
Pull Request resolved: #3727
jerryzh168 added a commit that referenced this pull request Jan 30, 2026
ghstack-source-id: 256c2ea
Pull Request resolved: #3727
@jerryzh168 jerryzh168 changed the base branch from gh/jerryzh168/27/base to main January 30, 2026 21:15
@jerryzh168 jerryzh168 merged commit bedf37e into main Jan 30, 2026
54 checks passed