Support RL online quantization with torchao by jerryzh168 · Pull Request #23014 · vllm-project/vllm

jerryzh168 · 2025-08-15T23:53:32Z

Summary:
This PR enables online quantization for verl. It adds support for
initializing a TorchAOConfig object in vLLM from a JSON-serialized
torchao config that specifies the type of quantization to apply. The
config can be passed either as a JSON string or as a path to a JSON file.

Code for serializing the config to json:

import json

from torchao.core.config import config_to_dict
from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, PerRow
from vllm import LLM

# Build the torchao config and serialize it to a JSON string.
config = Float8DynamicActivationFloat8WeightConfig(granularity=PerRow())
json_str = json.dumps(config_to_dict(config))

# The leading ... stands for the model and any other LLM arguments.
LLM(..., quantization="torchao", hf_overrides={"quantization_config_dict_json": json_str})

Code for serializing the config to a file:

import json

from torchao.core.config import config_to_dict
from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, PerRow
from vllm import LLM

config = Float8DynamicActivationFloat8WeightConfig(granularity=PerRow())

# Write the serialized config to disk and pass its path to vLLM.
with open("torchao_config.json", "w") as f:
    json.dump(config_to_dict(config), f)

# The leading ... stands for the model and any other LLM arguments.
LLM(..., quantization="torchao", hf_overrides={"quantization_config_file": "torchao_config.json"})
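
The two snippets above differ only in how the serialized config reaches vLLM: inline as a JSON string, or by file path. As a minimal sketch of that choice, the helper below builds the `hf_overrides` dict for either case using only the standard library. The helper name `build_torchao_hf_overrides` and the placeholder dict are hypothetical and not part of this PR; only the two override keys come from the description above. In real use, the dict would be the output of `config_to_dict(config)`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def build_torchao_hf_overrides(
    config_dict: dict[str, Any], config_path: Optional[str] = None
) -> dict[str, str]:
    """Hypothetical helper: build hf_overrides for a serialized torchao config.

    If config_path is given, the JSON is written there and referenced by path;
    otherwise the JSON string is passed inline.
    """
    json_str = json.dumps(config_dict)
    if config_path is None:
        return {"quantization_config_dict_json": json_str}
    Path(config_path).write_text(json_str)
    return {"quantization_config_file": config_path}


# Placeholder standing in for config_to_dict(config); the real layout is
# defined by torchao and is NOT reproduced here.
placeholder = {"_type": "ExampleConfig", "_data": {"granularity": "per_row"}}

# Inline variant: the serialized config travels as a JSON string.
inline = build_torchao_hf_overrides(placeholder)

# File variant: the serialized config is written to disk and passed by path.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "torchao_config.json")
    from_file = build_torchao_hf_overrides(placeholder, path)
    round_tripped = json.loads(Path(path).read_text())
```

Either returned dict could then be passed as `hf_overrides=` alongside `quantization="torchao"`, as in the snippets above.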

Module-level configs are also supported through ModuleFqnToConfig:
https://huggingface.co/docs/transformers/main/en/quantization/torchao#per-module-quantization
This path has not been tested yet.

More configs: https://docs.pytorch.org/ao/main/api_ref_quantization.html#inference-apis-for-quantize

Note: this PR incorporates changes from @LiyuanLucasLiu's PR #23901. The vLLM fp8 quant method is not supported yet; that can be added in a separate PR.

Test Plan:
pytest tests/quantization/test_torchao.py -k test_on_the_fly_quant
pytest tests/quantization/test_torchao.py -k test_reload_weights

Regression tests:
pytest tests/quantization/test_torchao.py


vllm-bot merged 1 commit into vllm-project:main from jerryzh168:torchao-on-the-fly-quant
