Conversation

@xin3he (Contributor) commented Oct 16, 2025

User description

Type of Change

bug fix


PR Type

Enhancement, Bug fix


Description

  • Added mem_per_param_scale and enable_torch_compile arguments (see the sketch after this list)

  • Updated dtype handling for uNVFP4 and NVFP4+

  • Fixed regression issues in dtype mapping and layer configuration

  • Updated README to include enable_torch_compile in example command
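
A minimal invocation sketch for the new options, assuming they are exposed as command-line flags of the example script; the flag names come from the description above, while the model path and the mem_per_param_scale value are illustrative placeholders, not values taken from this PR:

# Hypothetical single-node run; adjust --mem_per_param_scale to your memory budget.
python quantize.py \
    --model_name_or_path meta-llama/Llama-3.3-70B-Instruct/ \
    --mem_per_param_scale 13 \
    --enable_torch_compile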


Diagram Walkthrough

flowchart LR
  A["Add mem_per_param_scale"] -- "New argument" --> B["Update dtype handling"]
  B -- "Fix regression" --> C["Add enable_torch_compile"]
  C -- "Update README" --> D["Enhance quantization"]

File Walkthrough

Relevant files

Enhancement
  quantize.py - Add mem_per_param_scale and enable_torch_compile
  examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/quantize.py
  • Added mem_per_param_scale and enable_torch_compile arguments
  • Updated dtype handling for uNVFP4 and NVFP4+
  • Fixed regression issues in dtype mapping and layer configuration
  +27/-27

Documentation
  README.md - Update README with enable_torch_compile
  examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/README.md
  • Updated example command to include enable_torch_compile (see the sketch below)
  +3/-3
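
A hedged sketch of the updated README example command: only --enable_torch_compile is confirmed by this PR, while the surrounding deepspeed flags are copied from the command excerpt quoted in the reviewer guide below and may not match the README verbatim.

# --enable_torch_compile is the flag this PR adds to the documented example.
deepspeed --include="localhost:0,1,2,3" --master_port=29500 quantize.py \
    --model_name_or_path meta-llama/Llama-3.3-70B-Instruct/ \
    --enable_torch_compile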

Signed-off-by: He, Xin3 <[email protected]>
@PRAgent4INC (Collaborator)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Command Consistency

The command for running the script with deepspeed has different node specifications (localhost:0,1,2,3 vs localhost:4,5,6,7). Ensure these are consistent or clarify the reason for the difference.

deepspeed --include="localhost:0,1,2,3" --master_port=29500 quantize.py  \
    --model_name_or_path meta-llama/Llama-3.3-70B-Instruct/ \

@PRAgent4INC (Collaborator)

PR Code Suggestions ✨

@chensuyue added this to the 3.6 milestone Oct 17, 2025
@xin3he requested review from XuehaoSun and thuang6 October 20, 2025 03:43
@thuang6 (Contributor) left a comment


Better to change the AR dependency to the pip-released v0.8 version after AR v0.8 is released.

@xin3he requested a review from thuang6 October 20, 2025 07:32
@thuang6 (Contributor) left a comment


Merge can wait until the binary is published.

@XuehaoSun merged commit 4151ffe into master Oct 24, 2025
11 checks passed
@XuehaoSun deleted the xinhe/nvfp4 branch October 24, 2025 01:22
XuehaoSun pushed a commit that referenced this pull request Oct 24, 2025