Revise supported formats to reflect new quantization types by dsikka · Pull Request #2678 · vllm-project/llm-compressor

dsikka · 2026-04-30T22:21:17Z

Updated the supported formats section to include new precisions and types for quantization.

Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>

github-actions · 2026-04-30T22:21:27Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: The label is required to run the full testing suite. Please add it only once the PR is code complete and local testing has been performed.

coderabbitai · 2026-04-30T22:21:37Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7a13ba46-0e90-42a2-82bd-1357ebdba5ae

Walkthrough

Updates the README documentation to reflect expanded quantization capabilities by renaming the "Supported Formats" section to "Supported Precisions and Types," introducing new precision variants (W4AFP8, NVFP4, MXFP4, MXFP8), adding mixed-precision combinations, explicitly documenting attention and KV cache quantization, and extending the "Supported Algorithms" list with rotation-based methods.

Changes

| Cohort / File(s) | Summary |
|---|---|
| README Documentation `README.md` | Updated quantization capability documentation with renamed section ("Supported Precisions and Types"), new precision names and microscale variants (W4AFP8, NVFP4, MXFP4, MXFP8), mixed-precision entries (MXFP4A16, NVFP4A16), explicit attention and KV-cache quantization callouts, and rotation-based algorithms (SpinQuant, QuIP). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Updates to docs; move mxfp8 examples #2673: Updates README quantization documentation with overlapping precision names and quantization types (MXFP4/MXFP8, NVFP4, attention/KV-cache quantization).

Suggested labels

enhancement, nvfp4, fp8, w4a16

Suggested reviewers

brian-dellabetta
kylesayrs

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main change in the pull request, which updates the README documentation to reflect new quantization types and expanded precision names. |
| Description check | ✅ Passed | The description relates to the changeset by mentioning the update to the supported formats section with new precisions and types, though it contains placeholder text indicating incomplete documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request updates the README.md to reflect expanded support for various quantization precisions, types, and algorithms, including microscale formats and rotation-based methods. Review feedback suggests correcting the naming convention for 4-bit weight/8-bit activation quantization and specifying the supported formats for attention and KV cache quantization to improve clarity and consistency.

brian-dellabetta

nice!

Revise supported formats to reflect new quantization types

5fd66bd

Updated the supported formats section to include new precisions and types for quantization. Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>

dsikka added the `ready` label (When a PR is ready for review) on Apr 30, 2026

mergify Bot added the `documentation` label (Improvements or additions to documentation) on Apr 30, 2026

coderabbitai Bot added the `enhancement` (New feature or request), `fp8` (For any issue / PR related to FP8 support), `nvfp4` (For any PR / issue related to NVFP4 support), and `w4a16` labels on Apr 30, 2026

gemini-code-assist Bot reviewed Apr 30, 2026

Comment thread README.md

Comment thread README.md Outdated

brian-dellabetta approved these changes Apr 30, 2026

Comment thread README.md Outdated

Update README.md

8f787e4

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Dipika Sikka <ds3822@columbia.edu>

dsikka merged commit 76b28ce into main May 1, 2026
11 of 13 checks passed

dsikka deleted the dsikka-update-precision branch May 1, 2026 14:18

coderabbitai Bot mentioned this pull request May 6, 2026

Fix AutoRound ignore-layer metadata handling and add Qwen3-30B to mxfp8 example. #2687

Closed

coderabbitai Bot mentioned this pull request May 13, 2026

[Docs] update/add docs related to MX formats #2708

Merged

coderabbitai Bot mentioned this pull request May 31, 2026

docs: fix all the autoround example links #2774

Closed

Revise supported formats to reflect new quantization types #2678

dsikka merged 2 commits into main from dsikka-update-precision
