
Support RL online quantization with torchao #23014

Merged
vllm-bot merged 1 commit into vllm-project:main from jerryzh168:torchao-on-the-fly-quant
Oct 1, 2025


@jerryzh168 commented Aug 15, 2025

Summary:
This enables online quantization for verl. The PR adds support for
initializing a TorchAOConfig object in vLLM from either a JSON file
that specifies the desired type of quantization, or a JSON string
holding the serialized torchao config. Both are passed through hf_overrides.

Code for serializing the config to json:

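from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, PerRow
from torchao.core.config import config_to_dict
from vllm import LLM
import json

# Float8 dynamic activation + float8 weight quantization with per-row granularity
config = Float8DynamicActivationFloat8WeightConfig(granularity=PerRow())

# Serialize the torchao config to a JSON string
json_str = json.dumps(config_to_dict(config))

# "..." stands for the remaining LLM arguments (model name, etc.)
LLM(..., quantization="torchao", hf_overrides={"quantization_config_dict_json": json_str})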

Code for serializing the config to a file:

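from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, PerRow
from torchao.core.config import config_to_dict
from vllm import LLM
import json

# Float8 dynamic activation + float8 weight quantization with per-row granularity
config = Float8DynamicActivationFloat8WeightConfig(granularity=PerRow())

# Write the serialized torchao config to a JSON file
with open("torchao_config.json", "w") as f:
    json.dump(config_to_dict(config), f)

# "..." stands for the remaining LLM arguments (model name, etc.)
LLM(..., quantization="torchao", hf_overrides={"quantization_config_file": "torchao_config.json"})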
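
For illustration only, the two override keys above could be resolved into one config dict along the following lines. This is a minimal sketch using only the standard library, not the code added in this PR: the helper name load_quant_config_dict, the "only one key at a time" rule, and the placeholder payload are assumptions. Only the two key names come from the description above.

```python
import json
import os
import tempfile


def load_quant_config_dict(hf_overrides: dict) -> dict:
    """Hypothetical helper: return the serialized config as a plain dict.

    Not vLLM's actual implementation; only the two key names are taken
    from the PR description.
    """
    json_str = hf_overrides.get("quantization_config_dict_json")
    path = hf_overrides.get("quantization_config_file")
    if json_str is not None and path is not None:
        # Assumption: the two overrides are treated as mutually exclusive.
        raise ValueError("pass only one of the two quantization overrides")
    if json_str is not None:
        return json.loads(json_str)
    if path is not None:
        with open(path) as f:
            return json.load(f)
    raise ValueError("no torchao quantization override found")


# Placeholder payload standing in for config_to_dict(config).
payload = {"example_key": "example_value"}

# Path 1: the config travels inline as a JSON string.
inline = load_quant_config_dict(
    {"quantization_config_dict_json": json.dumps(payload)}
)

# Path 2: the config is written to a file and referenced by path.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "torchao_config.json")
    with open(config_path, "w") as f:
        json.dump(payload, f)
    from_file = load_quant_config_dict({"quantization_config_file": config_path})

# Both paths yield the same dict.
assert inline == from_file == payload
```

In a real setup, the resulting dict would presumably be turned back into a torchao config object before quantization is applied; that step is omitted here.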

Module-level config is also supported through ModuleFqnToConfig
(https://huggingface.co/docs/transformers/main/en/quantization/torchao#per-module-quantization),
although this has not been tested yet.

More configs: https://docs.pytorch.org/ao/main/api_ref_quantization.html#inference-apis-for-quantize

Note: this incorporates changes from @LiyuanLucasLiu's PR #23901. The vLLM fp8 quant method is not supported yet; that can be added in a separate PR.

Test Plan:
pytest tests/quantization/test_torchao.py -k test_on_the_fly_quant
pytest tests/quantization/test_torchao.py -k test_reload_weights

Regression tests:
pytest tests/quantization/test_torchao.py


