
[matmul kernel] [nvfp4] Use flex ctx out scale - to support tensor scale with nvfp4 output #9854

Merged
tristan-oai merged 6 commits into triton-lang:main from tristan-oai:tristan/mx-tensor-scale
Apr 18, 2026


tristan-oai (Collaborator) commented Mar 26, 2026

nvfp4 has a tensor-wide scale. This PR adds support for this scale when the matmul output needs to be quantized to nvfp4. We use flex ctx to carry the scale.
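For readers unfamiliar with the format, the following is a minimal sketch of two-level scaling in the style of nvfp4: one scale per block of 16 elements plus a single tensor-wide scale. It is not the code from this PR and does not use flex ctx. Every function name here is hypothetical, the block scales are kept as Python floats rather than rounded to float8, and the convention of multiplying by the scale on dequantization is an assumption.

```python
# Illustrative two-level (per-block + tensor-wide) scaling in the style of nvfp4.
# All names are hypothetical; this is NOT the Triton matmul kernel code.

E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # fp4 (E2M1) magnitudes
FP4_MAX = 6.0    # largest E2M1 magnitude
FP8_MAX = 448.0  # largest finite float8 E4M3 magnitude
BLOCK = 16       # elements covered by one block scale


def tensor_scale(values):
    """Choose a tensor-wide scale so that every block scale stays within FP8_MAX."""
    amax = max(abs(v) for v in values)
    return amax / (FP4_MAX * FP8_MAX) if amax > 0 else 1.0


def round_to_e2m1(x):
    """Round to the nearest E2M1 magnitude, keeping the sign."""
    mag = min(E2M1_VALUES, key=lambda q: abs(q - abs(x)))
    return mag if x >= 0 else -mag


def quantize(values, global_scale):
    """Return (fp4-valued list, per-block scales).

    Assumption: block scales stay as floats here; a real implementation
    would round them to float8 E4M3.
    """
    q, block_scales = [], []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        if block_amax == 0:
            scale = 1.0  # any non-zero scale works for an all-zero block
        else:
            scale = min(block_amax / FP4_MAX / global_scale, FP8_MAX)
        block_scales.append(scale)
        q.extend(round_to_e2m1(v / (scale * global_scale)) for v in block)
    return q, block_scales


def dequantize(q, block_scales, global_scale):
    """Undo both scaling levels: value * block scale * tensor-wide scale."""
    return [v * block_scales[i // BLOCK] * global_scale for i, v in enumerate(q)]


# Usage: two blocks of 16 values with different magnitudes.
values = [100.0 * i for i in range(-16, 16)]
g = tensor_scale(values)
q, scales = quantize(values, g)
recon = dequantize(q, scales, g)
```

The tensor-wide scale exists because the per-block scales have a limited range of their own; dividing it out first keeps each block scale representable.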

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these rules (https://cbea.ms/git-commit/#why-not-how).

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices
      (https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

tristan-oai force-pushed the tristan/mx-tensor-scale branch from 2f50e42 to 4131a8b March 26, 2026 18:05
tristan-oai force-pushed the tristan/mx-tensor-scale branch from d8bf707 to 27ec1e1 March 26, 2026 20:12
tristan-oai changed the title from "use flex ctx out scale - to support tensor scale with nvfp4" to "[matmul kernel] [nvfp4] Use flex ctx out scale - to support tensor scale with nvfp4 output" April 17, 2026
tristan-oai marked this pull request as ready for review April 17, 2026 19:05
tristan-oai requested a review from ptillet as a code owner April 17, 2026 19:05
ThomasRaoux requested a review from aeng-openai April 17, 2026 19:08
tristan-oai merged commit 3123400 into triton-lang:main Apr 18, 2026
9 checks passed
raymondtay pushed a commit to raymondtay/triton that referenced this pull request Apr 18, 2026
…ale with nvfp4 output (triton-lang#9854)

bingyizh233 pushed a commit to bingyizh233/triton that referenced this pull request Apr 20, 2026
…ale with nvfp4 output (triton-lang#9854)

2 participants