fix: relax padding for fp4 in weight loading #12497
Conversation
Summary of Changes

Hello @b8zhong, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a model loading issue related to NVFP4 quantization by modifying the validation logic for weight scales. It changes a rigid assertion about dimension divisibility into a more flexible warning, acknowledging that certain padding schemes might make the stricter check unnecessary. This change aims to prevent crashes during model loading for specific quantized models, improving robustness without compromising the core functionality.
Code Review
This pull request addresses an issue where an overly strict assertion on weight scale padding for FP4 quantization was causing model loading to fail. The change correctly relaxes this assertion to a warning, which is appropriate as the subsequent code handles the necessary padding. This is a good fix that improves model compatibility. I've added one suggestion to improve the clarity of the new warning message.
logger.warning(
    "NVFP4 %s_weight_scale K' not multiple of 4: shape=%s, group_size=%s",
    name,
    tuple(weight_scale.shape),
    getattr(self.quant_config, "group_size", None),
)
The warning message is a bit cryptic with K'. It would be more helpful for developers if the message explicitly stated which dimension is being referred to and what the consequence is (i.e., that padding will be applied).
logger.warning(
"NVFP4 %s_weight_scale last dim not a multiple of 4 (shape=%s, "
"group_size=%s). Padding will be applied.",
name,
tuple(weight_scale.shape),
getattr(self.quant_config, "group_size", None),
)
Pull Request Overview
This PR relaxes the validation constraint for NVFP4 weight scale dimensions in MoE (Mixture of Experts) processing. The change replaces a hard assertion requiring the K' dimension (shape[2]) to be divisible by 16 with a warning when it's not divisible by 4, allowing models with different group sizes to load successfully.
- Replaces assertion with conditional warning for weight scale shape validation
- Changes divisibility requirement from 16 to 4 for the K' dimension
- Adds diagnostic information (shape and group_size) to the warning message
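Since the relaxed check relies on padding being applied afterwards, here is a minimal numpy sketch of pad-to-multiple logic for the last dimension (a hypothetical helper using numpy as a stand-in for the torch tensors in the actual code):

```python
import numpy as np


def pad_last_dim(scale: np.ndarray, multiple: int = 4) -> np.ndarray:
    """Zero-pad the last dimension of `scale` up to the next multiple."""
    k = scale.shape[-1]
    target = -(-k // multiple) * multiple  # ceiling division, then scale up
    if target == k:
        return scale
    # Pad only the trailing dimension; leave all other dims untouched.
    pad_widths = [(0, 0)] * (scale.ndim - 1) + [(0, target - k)]
    return np.pad(scale, pad_widths)
```

For example, a scale tensor with last dimension 10 would be padded out to 12, while one already at a multiple of 4 is returned unchanged.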
Fix #12208
I didn't run into the original error, but acc
Since we do swizzle padding, it may be unnecessary.
I'm not sure what the baseline GSM8K result was before. @Azure-Tang, could you let me know if it looks reasonable?