update doc for online fp8 quantization by yma11 · Pull Request #37851 · vllm-project/vllm

yma11 · 2026-03-23T04:47:06Z

Purpose

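Online FP8 quantization no longer needs memory to hold the original model weights, now that weights are loaded through the meta device (PR #31914). This PR updates the documentation accordingly.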

Signed-off-by: Yan Ma <yan.ma@intel.com>

mergify · 2026-03-23T04:47:42Z

Documentation preview: https://vllm--37851.org.readthedocs.build/en/37851/

gemini-code-assist

Code Review

This pull request updates the documentation for online FP8 quantization by removing a warning about memory requirements. The warning stated that the entire model needs to be loaded in its original precision, which is no longer true due to the implementation of loading weights onto a meta device and quantizing them on the fly. This change accurately reflects the current state of the feature.
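To make the memory argument concrete, here is a back-of-the-envelope sketch of peak weight memory under the two loading strategies described above. It is only an accounting model, not vLLM code: the function names, the BF16 source precision, and the layer sizes are all assumptions chosen for illustration.

```python
# Hypothetical accounting model; it does not call or mirror vLLM internals.
BF16_BYTES = 2  # bytes per parameter in the original precision (assumed BF16)
FP8_BYTES = 1   # bytes per parameter after FP8 quantization


def peak_load_then_quantize(layer_params):
    """Peak bytes when the whole model is materialized in BF16 first,
    then quantized layer by layer, freeing each BF16 layer afterwards."""
    resident = sum(n * BF16_BYTES for n in layer_params)
    peak = resident
    for n in layer_params:
        resident += n * FP8_BYTES    # FP8 copy of this layer is created
        peak = max(peak, resident)
        resident -= n * BF16_BYTES   # BF16 original of this layer is freed
    return peak


def peak_quantize_on_the_fly(layer_params):
    """Peak bytes when each layer is materialized in BF16 only while it is
    being quantized, so at most one BF16 layer is resident at a time."""
    resident = 0
    peak = 0
    for n in layer_params:
        resident += n * BF16_BYTES   # this layer's weights arrive in BF16
        resident += n * FP8_BYTES    # FP8 copy of this layer is created
        peak = max(peak, resident)
        resident -= n * BF16_BYTES   # BF16 copy is dropped immediately
    return peak


if __name__ == "__main__":
    layers = [1_000_000] * 4  # made-up layer sizes, in parameters
    print(peak_load_then_quantize(layers))
    print(peak_quantize_on_the_fly(layers))
```

In this model the on-the-fly peak is the final FP8 footprint plus one layer in original precision, so the gap between the two strategies widens as the number of layers grows.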

yma11 · 2026-03-23T04:48:52Z

@mgoin @Isotr0py please take a look.

update doc for online fp8 quantization

0ce4d44

Signed-off-by: Yan Ma <yan.ma@intel.com>

mergify bot added the documentation label Mar 23, 2026

gemini-code-assist bot reviewed Mar 23, 2026

Isotr0py approved these changes Mar 23, 2026

Isotr0py enabled auto-merge (squash) March 23, 2026 05:17

github-actions bot added the ready label Mar 23, 2026

Isotr0py merged commit d3fe857 into vllm-project:main Mar 23, 2026
11 checks passed

RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026

update doc for online fp8 quantization (vllm-project#37851)

adc09e1

Signed-off-by: Yan Ma <yan.ma@intel.com>

SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026

update doc for online fp8 quantization (vllm-project#37851)

5c283ba

Signed-off-by: Yan Ma <yan.ma@intel.com>

khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

update doc for online fp8 quantization (vllm-project#37851)

bff9926

Signed-off-by: Yan Ma <yan.ma@intel.com>

nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026

update doc for online fp8 quantization (vllm-project#37851)

d716f57

Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>

JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026

update doc for online fp8 quantization (vllm-project#37851)

ee19955

Signed-off-by: Yan Ma <yan.ma@intel.com>

mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

update doc for online fp8 quantization (vllm-project#37851)

28e561e

Signed-off-by: Yan Ma <yan.ma@intel.com>

Isotr0py merged 1 commit into vllm-project:main from yma11:fp8-doc

2 participants
