feat(quantization): add ActivationRestrictedAsymmetric option by Rishi-Dave · Pull Request #28237 · microsoft/onnxruntime

Rishi-Dave · 2026-04-27T11:28:08Z

Description

Adds a new ActivationRestrictedAsymmetric extra-option to the Python
quantization tools. When enabled, uint8 activation zero-points are snapped
to either 0 (when rmin >= 0, e.g. post-ReLU/Sigmoid tensors) or 128
(when rmin < 0). The scale is recomputed so the dequantized range still
covers [rmin, rmax] without clipping.

This restricted asymmetric mode is required by some hardware accelerators
that only support these two zero-point values for uint8 quantization,
without requiring the full restriction to symmetric (zero-point = 128 for
all tensors).

Motivation and Context

Fixes #21398.

Existing options cover only fully symmetric (ActivationSymmetric →
zero-point fixed at 128) or unrestricted asymmetric. There was no mode
that picks the closer of {0, 128} per tensor based on its observed range.

Changes

quant_utils.py: new snap_zero_point_to_uint8(rmin, rmax) helper.
base_quantizer.py: parse new ActivationRestrictedAsymmetric extra-option.
onnx_quantizer.py and qdq_quantizer.py: apply snap after
compute_scale_zp in the activation path. Guarded on
quant_type == UINT8 and not symmetric. Weight and int8 paths are
untouched.
quantize.py: document the new option in the four extra_options
docstrings.
test_symmetric_flag.py: new TestRestrictedAsymmetricFlag covering
three cases (positive range → zp=0, signed range → zp=128, and
option-disabled regression).

Testing

```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```
All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.

…t8 zero-point snapping When extra_options={"ActivationRestrictedAsymmetric": True} is passed to quantize_static (or a QDQ config), uint8 activation zero-points are snapped to 0 when rmin >= 0 (e.g. post-ReLU tensors) or 128 when rmin < 0. Scale is recomputed so the dequantized range still covers [rmin, rmax] without clipping. - quant_utils: add snap_zero_point_to_uint8() helper (~28 LOC) - base_quantizer: parse ActivationRestrictedAsymmetric extra-option flag - onnx_quantizer: apply snap after compute_scale_zp in calc_quant_params (uint8, non-symmetric activations only) - qdq_quantizer: same snap in QDQ calc_quant_params path - quantize: document new option in all four extra_options docstrings - test_symmetric_flag: add TestRestrictedAsymmetricFlag (3 test methods) Refs microsoft#21398

tianleiwu

Thanks for the focused change. The new option is consistently wired through the QOperator and QDQ paths, and the basic snap behavior is covered. I found one correctness issue that should be fixed before merge: the restricted asymmetric path recomputes scale after the existing quant-param helper and drops the MinimumRealRange guarantee. I also left a smaller test-discovery note.

Copilot

Pull request overview

Adds a new Python quantization extra_options mode (ActivationRestrictedAsymmetric) to support uint8 activation zero-points restricted to {0, 128}, as required by some accelerators.

Changes:

Add snap_zero_point_to_uint8(rmin, rmax) helper to recompute (zp, scale) with zp snapped to 0 or 128.
Parse/propagate the new ActivationRestrictedAsymmetric option and apply it in both QOperator and QDQ quantization activation parameter calculation.
Document the option in quantize.py and add unit tests covering the expected snapping behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
onnxruntime/python/tools/quantization/quant_utils.py	Adds snapping helper for restricted-asymmetric uint8 activations.
onnxruntime/python/tools/quantization/base_quantizer.py	Parses new `ActivationRestrictedAsymmetric` extra option.
onnxruntime/python/tools/quantization/onnx_quantizer.py	Applies snapping in QOperator activation quant-param calculation.
onnxruntime/python/tools/quantization/qdq_quantizer.py	Applies snapping in QDQ activation quant-param calculation.
onnxruntime/python/tools/quantization/quantize.py	Documents the new extra-option in public docstrings.
onnxruntime/test/python/quantization/test_symmetric_flag.py	Adds tests validating zp snapping behavior for positive/signed activation ranges.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…c snap Address review feedback on PR microsoft#28237: - snap_zero_point_to_uint8 now accepts qmin/qmax and min_real_range, so the helper preserves the MinimumRealRange floor (matching compute_scale_zp behavior) and handles reduce_range=True correctly. The midpoint and scale formulas are derived from qmin/qmax instead of hardcoded UINT8 constants. - Both call sites in onnx_quantizer.py and qdq_quantizer.py now pass qmin, qmax, and self.min_real_range into the helper. - Move the unittest.main() guard to the end of test_symmetric_flag.py so TestRestrictedAsymmetricFlag is discovered when the file is run directly with python test_symmetric_flag.py.

…int_to_uint8 snap_zero_point_to_uint8 hardcoded uint8-asymmetric bounds (0/255/128/127) and returned scale=1.0 on degenerate ranges, which discarded any reduce_range or MinimumRealRange settings already applied by the caller. - Parameterize the helper on qmin, qmax, min_real_range. Default arg values reproduce the prior 0/255 math exactly. - Compute the snap pivot as mid = (qmin + qmax + 1) // 2 instead of hardcoding 128, so reduce_range (qmax=127) yields a valid in-range zp. - In the degenerate (rmax <= rmin) branch, derive scale from max(|rmin|, |rmax|) instead of returning 1.0; honor the min_real_range floor when provided. - Forward qmin, qmax, and self.min_real_range from both call sites in onnx_quantizer.py and qdq_quantizer.py to keep ActivationRestrictedAsymmetric consistent with compute_scale_zp. - Add tests for the reduce_range and min_real_range paths.

Rishi-Dave · 2026-05-03T12:43:19Z

Thanks for the careful review. Pushed d184733 addressing the points:

snap_zero_point_to_uint8 now takes qmin, qmax, and min_real_range. The snap pivot is computed as mid = (qmin + qmax + 1) // 2 (still 128 for the default 0/255 path), and the negative-rmin / non-negative-rmin branches use mid - qmin and qmax - mid as denominators. Defaults preserve the prior numerics exactly.
Degenerate rmax <= rmin no longer returns scale=1.0. It derives a meaningful scale from max(|rmin|, |rmax|) and applies the min_real_range floor when provided.
Both call sites in onnx_quantizer.py and qdq_quantizer.py now forward qmin, qmax, and self.min_real_range, so per-tensor reduce_range and MinimumRealRange flow through correctly when ActivationRestrictedAsymmetric is enabled.
Added two tests covering the reduce_range (qmax=127) and min_real_range floor paths.

__main__ block was already at the end of the file; left as-is.

tianleiwu

Follow-up review (round 2)

All high-priority concerns from the previous round (3969e7c) have been addressed:

snap_zero_point_to_uint8 now accepts qmin, qmax, and min_real_range parameters, properly honoring reduce_range and MinimumRealRange settings.
Both QOperator and QDQ call sites forward these parameters.
The test class is correctly placed before the if __name__ guard.
Degenerate range handling is improved.

All 7 previous threads have been resolved. Two minor suggestions remain (see inline comments), neither blocking.

tianleiwu · 2026-05-03T15:48:53Z

        # the symmetry (i.e., signed integer types will use symmetric quantization). See `def is_weight_symmetric()`
        self._is_weight_symmetric: bool | None = self.extra_options.get("WeightSymmetric", None)
        self.is_activation_symmetric = self.extra_options.get("ActivationSymmetric", False)
+        self.is_activation_restricted_asymmetric = self.extra_options.get("ActivationRestrictedAsymmetric", False)


Suggestion (non-blocking): if a user sets both ActivationSymmetric=True and ActivationRestrictedAsymmetric=True, the restricted path silently does nothing because symmetric=True fails the not symmetric guard in both quantizers. This is almost certainly a misconfiguration. Consider logging a warning here when both flags are enabled:

if self.is_activation_symmetric and self.is_activation_restricted_asymmetric: logger.warning("ActivationSymmetric and ActivationRestrictedAsymmetric are mutually exclusive; " "ActivationRestrictedAsymmetric will be ignored.")

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…metric Three fixes addressing copilot-pull-request-reviewer feedback on PR microsoft#28237: 1. snap_zero_point_to_uint8 (quant_utils.py): the degenerate branch (rmax <= rmin) unconditionally returned mid as the zero-point, violating the restricted-asymmetric contract for constant all-zero or all-positive tensors. Apply the same rmin>=0 guard as the normal path: use qmin when rmin>=0, mid otherwise. 2. Docstrings (quantize.py): all four ActivationRestrictedAsymmetric docstring entries said "snap zero-point to 0 (rmin>=0) or 128 (rmin<0)", which is only correct for full-range uint8. Corrected to "snap to qmin (when rmin>=0) or the midpoint of the quantized range [qmin, qmax] (when rmin<0)" so the wording holds for reduce_range=True as well. 3. Regression test (test_symmetric_flag.py): add test_all_zero_activations_zp_is_qmin which feeds an all-zero calibration tensor through the ActivationRestrictedAsymmetric quantize_static path and asserts zp==0 (qmin), not 128.

Rishi-Dave · 2026-05-04T11:23:32Z

Thanks for the review. Pushed b58a58a addressing all three points:

quant_utils.py: in snap_zero_point_to_uint8, the degenerate rmax <= rmin branch now snaps zp to qmin when rmin >= 0 (matching the normal path) instead of unconditionally returning the midpoint. This restores the contract for all-zero / all-positive constant tensors so they get zp=0, not 128.
quantize.py: updated all four ActivationRestrictedAsymmetric docstrings to say "midpoint of the quantized range [qmin, qmax]" instead of the hardcoded "128", since under reduce_range=True (qmax=127) the midpoint is 64.
test_symmetric_flag.py: added test_all_zero_activations_zp_is_qmin covering the rmin==rmax==0 case and asserting zp==0. Existing 9 tests still pass; lintrunner clean.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Expand calibrated range to include zero before applying min_real_range, matching compute_scale_zp ordering so MinimumRealRange behaves consistently across quantization modes. - In the degenerate-range branch, select the scale denominator based on the snapped zero point: use the full quantized range when zp == qmin (all-positive path) and the half range for the midpoint snap, keeping precision consistent with the non-degenerate path. - Refactor restricted-asymmetric tests to write models inside a tempfile.TemporaryDirectory() to avoid CWD collisions under parallel test execution.

Rishi-Dave · 2026-05-05T11:39:09Z

Thanks for the catch on the latest review — addressed in 53039cb:

snap_zero_point_to_uint8 now expands [rmin, rmax] to include zero before applying the min_real_range floor, matching the ordering in compute_scale_zp so the floor behaves identically for all-positive tensors.
In the degenerate-range branch, the scale denominator is now conditional: full (qmax - qmin) when degenerate_zp == qmin (all-positive path, mirrors the non-degenerate rmin >= 0 scale), and (qmax - qmin) // 2 only when degenerate_zp == mid (symmetric snap).
test_symmetric_flag.py writes the intermediate ONNX models inside a tempfile.TemporaryDirectory() instead of CWD, in line with the other tests in onnxruntime/test/python/quantization/.

lintrunner -a is clean. PTAL.

tianleiwu

Review round 3 (53039cb)

All high-priority concerns from previous rounds have been addressed:

snap_zero_point_to_uint8 now correctly accepts qmin, qmax, and min_real_range parameters, matching compute_scale_zp semantics (zero-inclusion before min_real_range expansion).
The degenerate-range branch correctly uses full range for positive-snap (zp=qmin) and half range for mid-snap, consistent with the non-degenerate paths.
Tests use tempfile.TemporaryDirectory() and the class is placed before the if __name__ guard.
Docstrings correctly describe "midpoint of the quantized range" rather than hardcoding 128.

The implementation is well-structured: the snap logic is cleanly factored into a single utility function, both QOperator and QDQ paths are consistently wired, and the guard conditions (quant_type == UINT8 and not symmetric) provide correct defense against accidental application to weights or symmetric activations.

One non-blocking suggestion from round 2 remains open (mutual exclusivity warning when both ActivationSymmetric and ActivationRestrictedAsymmetric are True). It is recommended but not a blocker for merge.

LGTM.

tianleiwu

Approving — all concerns from previous rounds have been resolved. See the COMMENTED review above for full details.

tianleiwu requested a review from Copilot May 3, 2026 00:25

Copilot started reviewing on behalf of tianleiwu May 3, 2026 00:27 View session

tianleiwu requested changes May 3, 2026

View reviewed changes

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated

Comment thread onnxruntime/test/python/quantization/test_symmetric_flag.py

Copilot AI reviewed May 3, 2026

View reviewed changes

Rishi-Dave added 2 commits May 3, 2026 11:20

tianleiwu reviewed May 3, 2026

View reviewed changes

tianleiwu requested a review from Copilot May 3, 2026 17:22

Copilot started reviewing on behalf of tianleiwu May 3, 2026 17:23 View session

Copilot AI reviewed May 3, 2026

View reviewed changes

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated

Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated

Comment thread onnxruntime/test/python/quantization/test_symmetric_flag.py

tianleiwu requested a review from Copilot May 5, 2026 01:06

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated

Comment thread onnxruntime/test/python/quantization/test_symmetric_flag.py Outdated

tianleiwu reviewed May 7, 2026

View reviewed changes

Comment thread onnxruntime/contrib_ops/cpu/bert/rotary_embedding_helper.h Outdated

Comment thread onnxruntime/core/providers/cpu/llm/rotary_embedding.cc Outdated

tianleiwu approved these changes May 7, 2026

View reviewed changes

tianleiwu enabled auto-merge (squash) May 10, 2026 17:27

tianleiwu merged commit 335a217 into microsoft:main May 11, 2026
103 of 111 checks passed

BrewTestBot mentioned this pull request Jun 19, 2026

onnxruntime 1.27.0 Homebrew/homebrew-core#288892

Merged

This was referenced Jun 29, 2026

Bump the onnx-runtime group with 1 update HeliosophLLC/DatumV#44

Open

deps(nuget): Bump the nuget-all group with 12 updates msbrettorg/maenifold#88

Open

Uh oh!

Conversation

Rishi-Dave commented Apr 27, 2026

Description

Motivation and Context

Changes

Testing

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Rishi-Dave commented May 3, 2026

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tianleiwu May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Rishi-Dave commented May 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Rishi-Dave commented May 5, 2026

Uh oh!

tianleiwu left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tianleiwu left a comment •

edited

Loading