Updates to docs; move mxfp8 examples #2673

Merged: dsikka merged 7 commits into main from mxfp8_examples on Apr 30, 2026

Conversation

dsikka (Collaborator) commented Apr 30, 2026

SUMMARY:

  • Move the mxfp8 examples out of the experimental folder, since mxfp8 is fully supported in vLLM for CT
  • Update repo README with example and guide links
  • Update contributor docs with Developer Guide
  • Remove old sparse24 empty examples folder

dsikka added the `ready` label (When a PR is ready for review) Apr 30, 2026
coderabbitai Bot (Contributor) commented Apr 30, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b760ae62-c3eb-45c1-a006-dc4692bdd8bc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Documentation updates across contributing and main README files. CONTRIBUTING.md adds a developer guide link for extending LLM Compressor. README.md reorganizes quantization sections with thematic example groupings and updates user guides. A deprecated sparse quantization example README is removed.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Resources: CONTRIBUTING.md, README.md | CONTRIBUTING.md adds a link to the Developer Guide for extending LLM Compressor. README.md replaces "optimization selection" section with "step-by-step" quantization guidance, reorganizes "End-to-End Examples" into thematic subsections (Weight & Activation, Weight Only, Attention & KV Cache, Architectural Specific, Non-Uniform), updates example links from file-specific to directory-style paths, and replaces "User Guides" section with new entries covering big-model support, model-free quantization, and DDP quantization. |
| Deprecated Examples: examples/sparse_2of4_quantization_fp8/README.md | Removes deprecated Sparse2of4 example README due to lack of support in vLLM and LLM Compressor. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

enhancement, fp8, nvfp4, model_free_ptq

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately captures the main changes: documentation updates and moving mxfp8 examples, which aligns with the changeset modifications to CONTRIBUTING.md, README.md, and the sparse24 example folder deletion. |
| Description check | ✅ Passed | The description is directly related to the changeset, covering the documentation updates to README and CONTRIBUTING.md, mxfp8 examples handling, and removal of the sparse24 examples folder. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions commented

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite. Please only add the label once the PR is code complete and local testing has been performed.

mergify Bot added the `documentation` label (Improvements or additions to documentation) Apr 30, 2026
coderabbitai Bot added the labels `enhancement` (New feature or request), `fp8` (For any issue / PR related to FP8 support), `nvfp4` (For any PR / issue related to NVFP4 support), and `model_free_ptq` (For any PR/issue related to the `model_free_ptq` pathway) Apr 30, 2026
gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request updates the documentation by adding a link to the Developer Guide in CONTRIBUTING.md and significantly reorganizing the End-to-End Examples in the README.md into categorized sections like Weight and Activation, Weight Only, and Architecture-Specific quantization. It also updates several documentation links and removes a file related to unsupported Sparse24 models. The review feedback identifies several opportunities for improvement in the README, including correcting grammatical errors in the quantization guide description, ensuring consistent punctuation and capitalization (specifically for NVFP4), and maintaining uniform link formatting by removing trailing slashes.

Five comment threads on README.md (four marked Outdated)
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 70: The sentence in README.md contains a grammar error: change "selecting
a quantization schemes" to either "selecting a quantization scheme" (singular)
or "selecting quantization schemes" (plural) in the line that reads "Please
refer to our [step-by-step compression
guide](https://docs.vllm.ai/projects/llm-compressor/en/latest/steps/choosing-model/)
for detailed information about selecting a quantization schemes, algorithm, and
their use cases." Update the phrase so the article ("a") matches singular
"scheme" or remove "a" to use the plural "schemes", and ensure the rest of the
clause ("algorithm, and their use cases") reads consistently with that choice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4433ad92-ca66-4cec-8c70-f000acc7f98d

📥 Commits

Reviewing files that changed from the base of the PR and between 7a52e9e and a99ef00.

📒 Files selected for processing (6)
  • CONTRIBUTING.md
  • README.md
  • examples/quantization_w8a8_mxfp8/autoround_qwen3_example.py
  • examples/quantization_w8a8_mxfp8/qwen3_example_w8a16_mxfp8.py
  • examples/quantization_w8a8_mxfp8/qwen3_example_w8a8_mxfp8.py
  • examples/sparse_2of4_quantization_fp8/README.md
💤 Files with no reviewable changes (1)
  • examples/sparse_2of4_quantization_fp8/README.md

Comment thread README.md Outdated
mergify Bot (Contributor) commented Apr 30, 2026

The quality checks have failed. Please run `make style` and `make quality` under
the root directory to address the lint failures. You will need the dev optional
install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

mergify Bot (Contributor) commented Apr 30, 2026

The quality checks have failed. Please run `make style` and `make quality` under
the root directory to address the lint failures. You will need the dev optional
install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

mergify Bot removed the `quality-failed` label Apr 30, 2026
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Dipika Sikka <ds3822@columbia.edu>
mergify Bot (Contributor) commented Apr 30, 2026

The quality checks have failed. Please run `make style` and `make quality` under
the root directory to address the lint failures. You will need the dev optional
install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

mergify Bot removed the `quality-failed` label Apr 30, 2026
dsikka (Collaborator, Author) commented Apr 30, 2026

The links in the autoround readme are failing - I can just remove them?

brian-dellabetta (Collaborator) left a comment

🥇

dsikka enabled auto-merge (squash) April 30, 2026 19:55
dsikka disabled auto-merge April 30, 2026 19:56
dsikka merged commit 494430e into main Apr 30, 2026 (12 of 14 checks passed)
dsikka deleted the mxfp8_examples branch April 30, 2026 19:56
Labels

  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • fp8: For any issue / PR related to FP8 support
  • model_free_ptq: For any PR/issue related to the `model_free_ptq` pathway
  • nvfp4: For any PR / issue related to NVFP4 support
  • ready: When a PR is ready for review

2 participants