
[Bugfix] Fix Sparse24 Compressed Tensors models#33446

Merged
vllm-bot merged 7 commits into vllm-project:main from neuralmagic:kylesayrs/check-sparse-only-models
Feb 12, 2026

Conversation


@kylesayrs kylesayrs commented Jan 30, 2026

Purpose

  • Fix sparse24 compressed tensors models

Background

#30141 introduced a check against config_groups in the compressed tensors config. However, the config_groups field is not populated in the case of sparse24 models.

This regression should have been caught by tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse; however, that test is skipped in CI because the CI machines run with compute capability below 9.0. A broader discussion is needed on whether quantization tests should run on higher-CC hardware, likely accompanied by a pruning of the existing quantization tests.
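To illustrate why the regression slipped through, the following is a minimal sketch of the kind of compute-capability gate that skips a test on CI hardware; the helper name and default capability here are illustrative, not vLLM's actual test code.

```python
# Hypothetical sketch of a capability gate that skips 2:4 sparsity tests.
# `device` defaults to an assumed Ada CI runner (CC 8.9); this is an
# illustrative stand-in, not vLLM's real API.

def has_min_capability(major: int, minor: int, device: tuple = (8, 9)) -> bool:
    """Return True if the (assumed) current device capability meets the minimum."""
    return device >= (major, minor)

# The 2:4 sparse test requires CC >= 9.0 (Hopper). On a CC 8.9 runner the
# check fails, the test is skipped, and a Hopper-only regression goes unnoticed.
should_skip = not has_min_capability(9, 0)
```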

Changes

  • Check for the existence of the config_groups field before filtering out Attention groups
  • Change the sparse24 compute-capability check to require exactly Hopper (the sparse24 kernels are not forward compatible)
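The two changes above can be sketched as follows. This is an illustrative outline only, assuming a config dict shaped like a compressed-tensors quantization config; the function names and dict keys are hypothetical, not vLLM's exact code.

```python
# Sketch of the two fixes (hypothetical names, not vLLM's actual functions).

def filter_attention_groups(config: dict) -> dict:
    # Fix 1: sparse24-only models have no "config_groups" key, so guard on
    # its presence instead of indexing unconditionally (which raised KeyError).
    groups = config.get("config_groups")
    if groups:
        config["config_groups"] = {
            name: group
            for name, group in groups.items()
            if group.get("targets") != ["Attention"]
        }
    return config

def sparse24_supported(capability: tuple) -> bool:
    # Fix 2: the 2:4 sparse kernels target Hopper specifically and are not
    # forward compatible, so require CC == 9.0 rather than CC >= 9.0.
    return capability == (9, 0)
```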

Testing

  • test_compressed_tensors_2of4_sparse and test_compressed_tensors_2of4_sparse_compressed previously failed but now pass on a Hopper machine
  • Sanity-tested with nm-testing/TinyLlama-1.1B-Chat-v1.0-sparse2of4_only-e2e on a Hopper machine

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@mergify mergify bot added the bug Something isn't working label Jan 30, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a regression that caused a KeyError when loading sparse24 compressed tensor models. The fix involves adding a check to ensure the config_groups key exists in the configuration dictionary before attempting to process it. This is a correct and necessary change that prevents the crash. The implementation is clear and directly solves the issue. I approve these changes.

@kylesayrs
Contributor Author

@dsikka @tlrmchlsmth


mgoin commented Jan 30, 2026

Can we just remove 24 sparsity instead?

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 9, 2026
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 9, 2026
@vllm-bot vllm-bot merged commit e9cd691 into vllm-project:main Feb 12, 2026
104 of 109 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Feb 12, 2026
warichet pushed a commit to warichet/vllm that referenced this pull request Feb 12, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
@kylesayrs kylesayrs deleted the kylesayrs/check-sparse-only-models branch February 16, 2026 15:43
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

Labels

  • bug — Something isn't working
  • nvidia
  • ready — ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done


3 participants