Revise supported formats to reflect new quantization types #2678
Updated the supported formats section to include new precisions and types for quantization.
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
Walkthrough
Updates the README documentation to reflect expanded quantization capabilities by renaming the "Supported Formats" section to "Supported Precisions and Types," introducing new precision variants (W4AFP8, NVFP4, MXFP4, MXFP8), adding mixed-precision combinations, explicitly documenting attention and KV cache quantization, and extending the "Supported Algorithms" list with rotation-based methods.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 5 passed
Code Review
This pull request updates the README.md to reflect expanded support for various quantization precisions, types, and algorithms, including microscale formats and rotation-based methods. Review feedback suggests correcting the naming convention for 4-bit weight/8-bit activation quantization and specifying the supported formats for attention and KV cache quantization to improve clarity and consistency.
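The weight/activation naming convention mentioned in the review can be illustrated with a small parser. This is only a sketch: `parse_precision` and `Precision` are hypothetical names made up for illustration and are not part of llm-compressor. It assumes labels of the form `W<weight>A<activation>`, where each part is a bit width optionally prefixed by `FP` or `INT`. Of the labels used below, only `W4AFP8` and `NVFP4` come from this PR; `W4A16` and `W8A8` are added as illustrative inputs.

```python
import re
from typing import NamedTuple, Optional


class Precision(NamedTuple):
    """Hypothetical container for a parsed precision label."""
    weight: str                # e.g. "4" or "FP8"
    activation: Optional[str]  # e.g. "16" or "FP8"; None if the label has no A-part


# Assumed convention: W<weight>[A<activation>], each part an optional
# FP/INT prefix followed by a bit width.
_PATTERN = re.compile(r"^W(?P<w>(?:FP|INT)?\d+)(?:A(?P<a>(?:FP|INT)?\d+))?$")


def parse_precision(label: str) -> Precision:
    """Split a W/A-style label into its weight and activation parts."""
    match = _PATTERN.match(label.upper())
    if match is None:
        # Named formats such as NVFP4 or MXFP4 do not follow the W/A pattern.
        raise ValueError(f"not a W/A-style precision label: {label!r}")
    return Precision(weight=match.group("w"), activation=match.group("a"))
```

Under these assumptions, `parse_precision("W4AFP8")` yields a weight part of `"4"` and an activation part of `"FP8"`, while a named format such as `NVFP4` raises `ValueError` because it does not follow the W/A pattern.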
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Dipika Sikka <ds3822@columbia.edu>