[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. #33892
Merged: tjtanaa merged 91 commits into vllm-project:main from EmbeddedLLM:3n-block-scaled-rfc-pr on Apr 9, 2026
Commits
8a542b7
create initial block scaled mm kernels and a common base
maralbahari 0ebcf78
remove W8A8Fp8BlockLinearOp and adop mm kernel selection
maralbahari b76074c
remove W8A8Fp8BlockLinearOp from unit tests
maralbahari 3c7049e
Update vllm/model_executor/layers/quantization/kernels/base.py
maralbahari 08a893d
Update vllm/model_executor/layers/quantization/kernels/base.py
maralbahari 9847109
Update vllm/model_executor/layers/quantization/kernels/scaled_mm/aite…
maralbahari 5d58935
Update vllm/model_executor/layers/quantization/kernels/scaled_mm/cuda.py
maralbahari 9887678
Update vllm/model_executor/layers/quantization/kernels/scaled_mm/Bloc…
maralbahari 4b53675
fix pre-commit issues and typings
maralbahari acac7c1
imporve typing
maralbahari 61bfb5b
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari 3363c88
add missing kwargs for aiter fp8 block scaled mm func and return stat…
maralbahari 79951e2
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari 6465faa
fix f-string
maralbahari 5b3c2e1
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari 8dd23bd
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 320ced0
improve documenetation and fix typings in init_fp8_linear_kernel
maralbahari d0cd8a2
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari f555f75
Merge remote-tracking branch 'origin/main' into 2n-block-scaled-rfc-pr
maralbahari 614cef5
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari c43b6cd
Merge remote-tracking branch 'origin/main' into 2n-block-scaled-rfc-pr
maralbahari ce88d6e
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari 08d6a54
fix import error
maralbahari 4bc9347
fix imports
maralbahari d001db8
fix import
maralbahari de82fd1
use the same variable name for inpt quantization to follow scaled_mm
maralbahari 7a26e60
address PR comments
maralbahari 15c3d44
Merge remote-tracking branch 'origin/2n-block-scaled-rfc-pr' into 3n-…
maralbahari e5bbb6c
bugfixes
maralbahari cb46979
address PR comment
maralbahari f27d31a
bugfix compressed tensors
maralbahari d805795
fix unit tests
maralbahari bfcd522
add group_size check for cutlass and deep_gemm kernels and update fus…
maralbahari ca8b19d
fix wrong check on block fp8 cutlass can_implement
maralbahari 00a7522
fix potential bugs in deepgemm
maralbahari e97a479
bugfix reading correct weight_scale in block scaled mm linear
maralbahari 0236228
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 6ce17a9
fix pre-commit issue
maralbahari 2645041
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari a3d7831
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa ed5a54c
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa c28dac9
initialize kernels in create_weights
maralbahari 2478541
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari dba5697
fix fusion unit tests
maralbahari f0ca1e9
fix fusion unit test and online fp8 quant
maralbahari f595112
fix pre-commit error
maralbahari 55096ef
fix input_dtype
maralbahari 64df301
fix unittest
maralbahari a08f623
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 1d5c1b7
fix unit tests
maralbahari 98f215b
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 1b65c2e
fix unit test for test_modelopt
maralbahari f093d82
remove unused function.
maralbahari cbb0599
fix Quantization unit test
maralbahari 05b7cc9
attemp to fix marlin fp8 quant fp8
maralbahari 929d05d
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa 8593412
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa 5c73e37
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari a07f484
Merge branch '3n-block-scaled-rfc-pr' of https://github.com/EmbeddedL…
maralbahari f3a1cd2
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari c86b172
fix deepgemm ep2 accuracy issue
maralbahari cf0618c
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 7930f5a
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa f01ba9e
Merge branch 'main' into 3n-block-scaled-rfc-pr
maralbahari ad95ceb
avoid calling is_flashinfer_fp8_blockscale_gemm_supported as class va…
maralbahari f5348e8
Merge branch '3n-block-scaled-rfc-pr' of https://github.com/EmbeddedL…
maralbahari f012fae
fix torch compile issue with torch.cond
maralbahari 9985919
fix torch.cond torch.compile errors
maralbahari 9520a25
Merge branch 'main' into 3n-block-scaled-rfc-pr
vllmellm ab00fb7
bugfix wrong input quantization
maralbahari 594fc5b
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari ab2d3fc
fix torch.cond fx-graph break
maralbahari a3041f7
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 42f3334
fix mxfp8 test fail
maralbahari 697d747
clean code
maralbahari 0498d02
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari af9ec82
fix new attention mla fusion unit test
maralbahari f79b1fd
fix wrong skip condition
maralbahari 5e76e75
fix mxfp8 unit test
maralbahari 24a4f25
maybe fix cutlass block scaled gemm
maralbahari 7e254a2
clean code
maralbahari 46ffd25
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 186fe8f
fix batch-invariant issue
maralbahari 8056522
Merge remote-tracking branch 'origin/main' into 3n-block-scaled-rfc-pr
maralbahari 9cb0ebf
fix online fp8
maralbahari bb9920f
fix pre-commit
maralbahari 884b952
fix mxfp8 linearmethod
maralbahari 8e6b3ab
fix pytorch compile test
maralbahari 1a88320
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa 5b572ac
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa 92ba677
Merge branch 'main' into 3n-block-scaled-rfc-pr
tjtanaa
Do you know what the equivalent of the `(None, GroupShape(1, 64)),` test case is for this PR?
@tjtanaa added `(TritonFp8BlockScaledMMKernel, GroupShape(1, 64))` for ROCm, similar to CUDA.