Add QDQ scale propagation pass #713
3d0ca12 to 4cb9374
```cpp
} else if (session_context_.device_type.find("GPU") != std::string::npos) {
  // Create a copy of the model
  std::unique_ptr<onnxruntime::Model> model;
  Status status = qdq_scales_fix::Transform(subgraph, logger, model);
```
Is this pass applied even to non-quantized models?
@preetha-intel, this pass runs only when the `enable_qdq_optimizer` flag is set.
Inside the pass, it specifically looks for quantized blocks with (u)int16 precision and ignores everything else. So regular models are not affected by it, even if the flag is passed by accident.
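To make the filtering described above concrete, here is a minimal C++ sketch of a predicate that selects only 16-bit Q/DQ nodes. It is an illustration under stated assumptions, not the code in `qdq_scales_fix.cpp`: `ElemType`, `QdqNode`, and `IsSixteenBitQdq` are hypothetical stand-ins for the onnxruntime graph types, and the sketch assumes the quantized precision is read from the zero-point element type.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ONNX tensor element types.
enum class ElemType { kFloat, kUInt8, kInt8, kUInt16, kInt16 };

// Hypothetical, simplified view of a graph node. For Q/DQ nodes we assume the
// zero-point element type tells us the quantized precision.
struct QdqNode {
  std::string op_type;  // e.g. "QuantizeLinear", "DequantizeLinear", "Conv"
  ElemType zero_point_type;
};

// True only for QuantizeLinear/DequantizeLinear nodes quantized to int16 or
// uint16; 8-bit QDQ nodes and ordinary float ops are left untouched.
bool IsSixteenBitQdq(const QdqNode& node) {
  const bool is_qdq = node.op_type == "QuantizeLinear" ||
                      node.op_type == "DequantizeLinear";
  const bool is_16bit = node.zero_point_type == ElemType::kUInt16 ||
                        node.zero_point_type == ElemType::kInt16;
  return is_qdq && is_16bit;
}
```

Under this reading, a model with no 16-bit QDQ blocks produces no candidates, so enabling the flag by accident leaves it unchanged.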
Accuracy results from PSD model testing on GPU align with both the NPU outputs and Microsoft’s expectations. Please proceed with merging this PR.
…des and duplicate DQ nodes
0281273 to e0cc75c
Pull Request Overview
Adds a new pass to propagate and adjust quantization scales in QDQ (Quantize–Dequantize) pairs to avoid numerical overflow on large scale values. Key changes include:
- Introduction of the `qdq_scales_fix` transformation pass (header, implementation, and protobuf utilities)
- Invocation of the new scale-propagation pass in `backend_manager.cc` for GPU when the QDQ optimizer is enabled
- Build updates to link the ONNX protobuf definitions (`onnx_proto`) into the OpenVINO provider
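The overflow these changes target can be illustrated with plain arithmetic. ONNX defines the DequantizeLinear output as `(q - zero_point) * scale`, so the largest output magnitude is `max(|q_min - zp|, |q_max - zp|) * |scale|`. The sketch below assumes, for illustration only, that the GPU executes in FP16 (largest finite value 65504); `MaxDequantMagnitude` and `DequantOverflowsFp16` are hypothetical helpers, not functions from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest finite IEEE 754 half-precision (FP16) value.
constexpr double kFp16Max = 65504.0;

// Largest |(q - zero_point) * scale| over quantized values q in [q_min, q_max].
double MaxDequantMagnitude(std::int64_t q_min, std::int64_t q_max,
                           std::int64_t zero_point, double scale) {
  const double below = std::fabs(static_cast<double>(q_min - zero_point));
  const double above = std::fabs(static_cast<double>(q_max - zero_point));
  return std::max(below, above) * std::fabs(scale);
}

// Hypothetical check: would this DequantizeLinear overflow an FP16 tensor?
bool DequantOverflowsFp16(std::int64_t q_min, std::int64_t q_max,
                          std::int64_t zero_point, double scale) {
  return MaxDequantMagnitude(q_min, q_max, zero_point, scale) > kFp16Max;
}
```

For uint16 data (`[0, 65535]`, zero point 0), a scale of 2.0 already yields 131070, roughly twice the FP16 limit, while the same scale on uint8 data yields only 510. This is one way to see why 16-bit QDQ blocks are the ones at risk.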
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.h | Declares the new Transform pass interface |
| onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.cpp | Implements graph construction, scale propagation, and removal of QDQ pairs |
| onnxruntime/core/providers/openvino/ov_protobuf_utils.h | Adds helpers to get/set float data in protobuf tensors |
| onnxruntime/core/providers/openvino/ov_protobuf_utils.cpp | Defines get_float_initializer_data and set_float_initializer_data |
| onnxruntime/core/providers/openvino/backend_manager.cc | Calls the new scale‐fix pass for GPU under the QDQ optimizer |
| cmake/onnxruntime_providers_openvino.cmake | Links onnx_proto to the OpenVINO provider target |
| onnxruntime/core/optimizer/double_qdq_pairs_remover.cc | Fixes missing dimension in newly created initializer |
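The table mentions helpers that get and set float data in protobuf tensors. In ONNX, a `TensorProto` can hold float values either in the typed `float_data` field or as little-endian bytes in `raw_data`, so such a helper generally has to handle both. The sketch below is hypothetical: `FakeTensorProto`, `GetFloatData`, and `SetFloatData` stand in for `onnx::TensorProto` and for `get_float_initializer_data`/`set_float_initializer_data`, whose actual signatures are not shown in this PR, and it assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an onnx::TensorProto of type FLOAT: values live
// either in float_data or, packed as little-endian bytes, in raw_data.
struct FakeTensorProto {
  std::vector<float> float_data;
  std::string raw_data;
};

// Reads the float values from whichever field holds them.
// Assumes a little-endian host because raw bytes are copied directly.
std::vector<float> GetFloatData(const FakeTensorProto& t) {
  if (!t.raw_data.empty()) {
    const std::size_t n = t.raw_data.size() / sizeof(float);
    std::vector<float> out(n);
    if (n > 0) std::memcpy(out.data(), t.raw_data.data(), n * sizeof(float));
    return out;
  }
  return t.float_data;
}

// Writes the values back into the field that already held the data, so the
// tensor keeps a single storage representation.
void SetFloatData(FakeTensorProto& t, const std::vector<float>& values) {
  if (!t.raw_data.empty()) {
    t.raw_data.assign(reinterpret_cast<const char*>(values.data()),
                      values.size() * sizeof(float));
  } else {
    t.float_data = values;
  }
}
```

Writing back into the field the data came from is a design choice of this sketch; an implementation could equally normalize everything to `raw_data`.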
Comments suppressed due to low confidence (4)
onnxruntime/core/providers/openvino/backend_manager.cc:433
- The modified branching drops the `so_share_ep_contexts` condition for GPU in the first branch and only checks `enable_ovep_qdq_optimizer` in the new GPU branch. Please verify that GPU behavior when `so_share_ep_contexts` is true but the optimizer flag is false is still correct.

```cpp
if ((session_context_.device_type.find("NPU") != std::string::npos) &&
```
onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.h:14
- [nitpick] Add unit tests for the new `Transform` pass to cover scenarios with varying thresholds, different network topologies, and multiple QDQ patterns to guard against regressions.

```cpp
Status Transform(const GraphViewer& src_graph,
```
onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.cpp:14
- The code uses `std::format` in `ToString()`, but `<format>` is not included. Add `#include <format>` to ensure compilation succeeds.

```cpp
#include <algorithm>
```
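`std::format` is a C++20 facility and needs a standard library that ships `<format>` (for example, libstdc++ from GCC 13 or later). The minimal fix is simply to include `<format>` in the file that calls it; the variant sketched below additionally guards on the `__cpp_lib_format` feature-test macro and falls back to `std::ostringstream`, which is an assumption about what a portable build might want rather than what the PR does. `NodeSummary` and this `ToString` are hypothetical.

```cpp
#include <cassert>
#include <string>
#if __has_include(<version>)
#include <version>  // defines __cpp_lib_format when std::format is usable
#endif

#if defined(__cpp_lib_format)
#include <format>
#else
#include <sstream>
#endif

// Hypothetical node summary used only for this example.
struct NodeSummary {
  std::string op_type;
  std::string name;
  int num_inputs;
};

// Produces e.g. "DequantizeLinear[dq_0] inputs=3" with either code path.
std::string ToString(const NodeSummary& n) {
#if defined(__cpp_lib_format)
  return std::format("{}[{}] inputs={}", n.op_type, n.name, n.num_inputs);
#else
  std::ostringstream os;
  os << n.op_type << '[' << n.name << "] inputs=" << n.num_inputs;
  return os.str();
#endif
}
```

Both branches produce the same string, so behavior does not depend on which standard library the CI toolchain provides.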
onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.cpp:67
- [nitpick] Remove these commented-out placeholder lines (`//** node_input_name = []`, `//** node_output_name = []`) to clean up dead code and improve readability.

```cpp
//** node_input_name = []
```
sfatimar left a comment
This PR is being merged only because of urgency. It has not been properly code-reviewed, so it may have some issues.
* Add pass to perform QDQ stripping and propagate scales
* Fix disconnected output node
* Fixes to support session.disable_quant_qdq output, remove dangling nodes and duplicate DQ nodes
* Fix lack of scales updates and remove stray QDQ nodes in certain models
* Address issues with Linux CI
* Fix for double QDQ issue
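One of the commits mentions removing dangling nodes. A common way to do this is dead-node elimination: walk backwards from the graph outputs through each tensor's producer and drop every node that is never reached. The sketch below assumes a hypothetical `Node` struct with tensor-name inputs and outputs, not the onnxruntime `Graph` API, and is not necessarily how this PR implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical minimal node: a name plus the tensors it reads and writes.
struct Node {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Keeps only nodes whose outputs (transitively) feed a graph output.
std::vector<Node> RemoveDanglingNodes(const std::vector<Node>& nodes,
                                      const std::vector<std::string>& graph_outputs) {
  // Map each tensor name to the index of the node that produces it.
  std::unordered_map<std::string, std::size_t> producer;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    for (const auto& out : nodes[i].outputs) producer[out] = i;

  // Backward reachability from the graph outputs.
  std::unordered_set<std::size_t> live;
  std::vector<std::string> worklist(graph_outputs.begin(), graph_outputs.end());
  while (!worklist.empty()) {
    const std::string tensor = worklist.back();
    worklist.pop_back();
    const auto it = producer.find(tensor);
    if (it == producer.end()) continue;             // graph input or initializer
    if (!live.insert(it->second).second) continue;  // node already visited
    for (const auto& in : nodes[it->second].inputs) worklist.push_back(in);
  }

  // Preserve the original order of the surviving nodes.
  std::vector<Node> kept;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    if (live.count(i) != 0) kept.push_back(nodes[i]);
  return kept;
}
```

Tensors with no producer (graph inputs and initializers such as scales and zero points) simply end the walk, and the surviving nodes keep their original order.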
Description
Adds a pass that propagates scale values whose magnitude exceeds a certain threshold, to avoid numerical overflows.
https://jira.devtools.intel.com/browse/CVS-170179
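The description does not state how the propagation itself works. One plausible scheme, shown purely as an assumption, relies on scale-homogeneous ops such as Relu or a MatMul against constant weights, where `f(x / k) = f(x) / k` for `k > 0`: when the largest scale in such a chain exceeds the threshold, every scale in the chain is divided by the same factor `k`, and `k` is returned so it can be compensated further downstream. Choosing `k` as a power of two keeps the division exact in binary floating point. `PropagateScaleDown` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch: divide every scale in a chain of scale-homogeneous ops
// by the smallest power of two k that brings the largest |scale| down to at
// most `threshold`. Returns k (1.0 means nothing changed) so the caller can
// compensate downstream.
double PropagateScaleDown(std::vector<float>& chain_scales, float threshold) {
  if (!(threshold > 0.0f)) return 1.0;  // invalid threshold: leave scales alone

  float max_scale = 0.0f;
  for (float s : chain_scales) max_scale = std::fmax(max_scale, std::fabs(s));
  if (!std::isfinite(max_scale)) return 1.0;  // avoid looping forever on inf

  double k = 1.0;
  while (max_scale / k > threshold) k *= 2.0;  // power of two: exact barring underflow

  for (float& s : chain_scales) s = static_cast<float>(s / k);
  return k;
}
```

With scales `{4.0, 2.0}` and a threshold of 1.0, this yields `k = 4` and scales `{1.0, 0.5}`. Whether the real pass uses powers of two, and where it reapplies the factor, is not stated in this PR.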
Motivation and Context
Improve precision on certain networks