update doc for online fp8 quantization #37851

Merged
Isotr0py merged 1 commit into vllm-project:main from yma11:fp8-doc on Mar 23, 2026

@yma11 (Contributor) commented Mar 23, 2026

Purpose

Since PR #31914 switched weight loading to the meta device, online FP8 quantization no longer needs extra memory to hold the original model weights. Update the doc to reflect this.

Signed-off-by: Yan Ma <yan.ma@intel.com>

@mergify bot (Contributor) commented Mar 23, 2026

Documentation preview: https://vllm--37851.org.readthedocs.build/en/37851/

@mergify bot added the "documentation" label (Improvements or additions to documentation) on Mar 23, 2026

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates the documentation for online FP8 quantization by removing a warning about memory requirements. The warning stated that the entire model needs to be loaded in its original precision, which is no longer true due to the implementation of loading weights onto a meta device and quantizing them on the fly. This change accurately reflects the current state of the feature.
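
To make the memory argument concrete, here is a back-of-the-envelope sketch in Python. It is not vLLM code: the function names, the 16-bit source precision, and the "one layer staged at a time" model are all assumptions made for illustration, and it ignores quantization scales, activations, and the KV cache.

```python
# Rough peak-weight-memory estimate in GiB. Illustrative assumptions only:
# 2 bytes per parameter in the original (16-bit) checkpoint, 1 byte in FP8.
BYTES_ORIGINAL = 2
BYTES_FP8 = 1
GIB = 2**30


def peak_full_precision_load(num_params: int) -> float:
    """Hypothetical 'old' behaviour described by the removed warning:
    the whole model is held in original precision before quantizing."""
    return num_params * BYTES_ORIGINAL / GIB


def peak_streaming_quantization(num_params: int, largest_layer_params: int) -> float:
    """Hypothetical upper bound for quantizing on the fly: every weight ends
    up in FP8, plus one layer staged in original precision at a time."""
    fp8_total = num_params * BYTES_FP8
    staging = largest_layer_params * BYTES_ORIGINAL
    return (fp8_total + staging) / GIB


if __name__ == "__main__":
    params = 8 * 10**9          # assumed 8B-parameter model
    largest = 500 * 10**6       # assumed size of the largest single layer
    print(f"full-precision load: {peak_full_precision_load(params):.1f} GiB")
    print(f"streaming quantize:  {peak_streaming_quantization(params, largest):.1f} GiB")
```

Under these assumptions the streaming estimate stays close to the FP8 footprint, which is consistent with the removed warning no longer applying.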

@yma11 (Contributor, Author) commented Mar 23, 2026

@mgoin @Isotr0py please take a look.

@Isotr0py Isotr0py enabled auto-merge (squash) March 23, 2026 05:17
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Mar 23, 2026
@Isotr0py Isotr0py merged commit d3fe857 into vllm-project:main Mar 23, 2026
11 checks passed
RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026
Signed-off-by: Yan Ma <yan.ma@intel.com>
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: Yan Ma <yan.ma@intel.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: Yan Ma <yan.ma@intel.com>
nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: Yan Ma <yan.ma@intel.com>

Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
Signed-off-by: Yan Ma <yan.ma@intel.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
Signed-off-by: Yan Ma <yan.ma@intel.com>