Modify scale & offset of WhereDummyDq #27109
Hi @edgchen1, this fix is for a critical model. Could you help trigger the CI and review the PR?
Pull request overview
Updates the WhereDummyDq QDQ transformer to be more selective about when it inserts a dummy DequantizeLinear on Where scalar inputs, and adjusts the dummy quantization parameters to avoid non-positive scales (improving compatibility with EPs/backends).
Changes:
- Tighten `WhereDummyDq::SatisfyCondition` to only apply when `Where` is immediately followed by a single `QuantizeLinear`.
- Update dummy (xq, zp, scale) construction so computed `scale` is positive by mapping the float scalar to integer-domain boundaries.
- Fix the unit test to use the correct zero-point type for the downstream `QuantizeLinear`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/qdq_transformer_test.cc | Adjusts WhereDummyDq test model to use ZpType for QuantizeLinear zero-point so types match the test’s quantized dtype. |
| onnxruntime/core/optimizer/qdq_transformer/where_dummy_dq.h | Expands class documentation to describe the dummy quantization parameter selection and scale computation. |
| onnxruntime/core/optimizer/qdq_transformer/where_dummy_dq.cc | Implements stricter pattern checks and new dummy scale/zp/xq selection to keep dummy scale positive. |
0f4e785 to d8dc6a4
Hi @qti-hungjuiw, I think there are some merge conflicts with main. You may have to sync your branch with the latest main. Also, there are still a couple of Copilot review comments that are unresolved. Could you please take a look at those? It looks like some of them may be valid. Thank you for fixing this!
d8dc6a4 to 5f9e4ee
- Update WhereDummyDq QDQ transformer to be more selective before
inserting a dummy DequantizeLinear around Where.
- SatisfyCondition now requires the Where output to have exactly one
consumer and that consumer must be QuantizeLinear (Q). Otherwise, the
transform is skipped.
- InsertDummyDQ additionally checks element type consistency between the
upstream DQ input tensor type and the downstream Q output tensor type;
if they differ, the transform returns without modifying the graph.
- Update the implementation of WhereDummyDq to avoid negative or zero
scale value. The change maps the float value to the boundary of integer
domain to ensure the scale value is positive.
- e.g. DQ uses uint8 but Q uses uint16
- Use HasTensorOrScalarShape to check the shape in SatisfyCondition
- Check the return value of GetInitializedTensor in InsertDummyDQ
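The two guards described in the commit message (a single `QuantizeLinear` consumer, and matching quantized element types) can be sketched as follows. This is a minimal illustration only: `ToyNode`, `ElemType`, `HasSingleQConsumer`, and `QuantTypesMatch` are hypothetical names invented for this sketch, not onnxruntime types or functions, and the real checks operate on the ORT graph.

```cpp
#include <string>
#include <vector>

// Hypothetical element-type tag; not the ONNX/ORT enum.
enum class ElemType { kUInt8, kUInt16 };

// Hypothetical stand-in for a graph node; not an onnxruntime type.
struct ToyNode {
  std::string op_type;
  ElemType elem_type;                     // quantized tensor type on this node
  std::vector<const ToyNode*> consumers;  // nodes that read this node's output
};

// True only when `where` is a Where node whose output has exactly one
// consumer and that consumer is QuantizeLinear.
bool HasSingleQConsumer(const ToyNode& where) {
  return where.op_type == "Where" && where.consumers.size() == 1 &&
         where.consumers[0] != nullptr &&
         where.consumers[0]->op_type == "QuantizeLinear";
}

// True when the upstream DQ input type equals the downstream Q output type
// (e.g. false when DQ uses uint8 but Q uses uint16).
bool QuantTypesMatch(const ToyNode& upstream_dq, const ToyNode& downstream_q) {
  return upstream_dq.elem_type == downstream_q.elem_type;
}
```

In this sketch a transform would proceed only when both predicates hold, and otherwise leave the graph unmodified.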
5f9e4ee to 2e5279b
Hi @adrianlizarraga, sure, I've resolved the conflict and addressed Copilot's comments. Please let me know if there are any issues.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Description
- Update the `WhereDummyDq` QDQ transformer to be more selective before inserting a dummy `DequantizeLinear` around `Where`.
- `SatisfyCondition` now requires the `Where` output to have exactly one consumer, and that consumer must be `QuantizeLinear` (Q). Otherwise, the transform is skipped.
- `InsertDummyDQ` additionally checks element type consistency between the upstream DQ input tensor type and the downstream Q output tensor type; if they differ, the transform returns without modifying the graph.
- Update the implementation of `WhereDummyDq` to avoid a negative or zero `scale` value. The change maps the float value to the boundary of the integer domain to ensure the `scale` value is positive.

If `WhereOp` gets a float scalar `xf` and a `DequantizeLinear` as its two inputs, `WhereDummyDq` inserts a DQ to ensure `xf = DQ(xq, scale, zp)`.

The `xq`, `scale`, and `zp` are determined with the following table.

`scale = xf / (xq - zp)` if `xq != zp` else `1`

Motivation and Context
- When the `Where` output is not followed by a single `QuantizeLinear` (e.g., multiple consumers or a non-Q consumer), adding a dummy DQ cannot create the intended pattern and may lead to non-fusible/undesired graph structures.