@javier-intel javier-intel commented Jun 16, 2025

Description

Adds a pass that propagates quantization scale values whose magnitude exceeds a threshold, to avoid numerical overflow.
https://jira.devtools.intel.com/browse/CVS-170179

Motivation and Context

Improve precision on certain networks

@javier-intel javier-intel force-pushed the jemartin/scale_propagation branch from 3d0ca12 to 4cb9374 Compare June 17, 2025 15:59
} else if (session_context_.device_type.find("GPU") != std::string::npos) {
// Create a copy of the model
std::unique_ptr<onnxruntime::Model> model;
Status status = qdq_scales_fix::Transform(subgraph, logger, model);

Is this pass happening even for non-quantized models?

@mklimenk mklimenk Jul 3, 2025

@preetha-intel, this pass runs only when the enable_qdq_optimizer flag is set.
Inside the pass, it looks specifically for quantized blocks with (u)int16 precision and ignores everything else, so regular models are not affected even if the flag is passed by accident.
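
To illustrate the gating described in this reply, here is a minimal sketch. ElemType, QuantBlock, IsSixteenBitQuant, and ShouldProcess are hypothetical names made up for illustration; they are not the code in this PR, which works on ONNX Runtime graph nodes rather than on a plain struct.

```cpp
#include <cstdint>

// Hypothetical stand-in for the element type of a quantized tensor.
enum class ElemType { kFloat, kUInt8, kInt8, kUInt16, kInt16 };

// Hypothetical stand-in for a quantized block (a Q/DQ pair and its parameters).
struct QuantBlock {
  ElemType quantized_type;  // element type of the quantized values
};

// True only for 16-bit quantized blocks, the ones the pass is said to look for.
bool IsSixteenBitQuant(const QuantBlock& block) {
  return block.quantized_type == ElemType::kUInt16 ||
         block.quantized_type == ElemType::kInt16;
}

// A block is processed only if the optimizer flag is set AND it is 16-bit.
// Everything else (float, 8-bit) is left untouched.
bool ShouldProcess(bool enable_qdq_optimizer, const QuantBlock& block) {
  return enable_qdq_optimizer && IsSixteenBitQuant(block);
}
```

Under these assumptions, a float or 8-bit block is skipped even when the flag is set, which matches the claim that regular models are unaffected.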

@javier-intel javier-intel requested a review from MayureshV1 June 24, 2025 16:37
@harihs1729

Accuracy results from PSD model testing on GPU align with both NPU outputs and Microsoft’s expectations. Please proceed with merging this PR.

@javier-intel javier-intel force-pushed the jemartin/scale_propagation branch from 0281273 to e0cc75c Compare July 3, 2025 04:08
@ankitm3k ankitm3k requested a review from Copilot July 3, 2025 09:45

Copilot AI left a comment

Pull Request Overview

Adds a new pass to propagate and adjust quantization scales in QDQ (Quantize–Dequantize) pairs to avoid numerical overflow on large scale values. Key changes include:

  • Introduction of the qdq_scales_fix transformation pass (header, implementation, and protobuf utilities)
  • Invocation of the new scale‐propagation pass in backend_manager.cc for GPU when the QDQ optimizer is enabled
  • Build updates to link the ONNX protobuf definitions (onnx_proto) into the OpenVINO provider
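
A rough sketch of why a large scale can overflow, and of one possible way to detect and bound it. This is an assumption-laden illustration, not the algorithm implemented in qdq_scales_fix: the function names, the threshold semantics, and the "divide the scale down to the threshold" rule are all hypothetical. Only the dequantization formula follows the ONNX DequantizeLinear definition.

```cpp
#include <cmath>
#include <cstdint>

// Dequantization as defined by ONNX DequantizeLinear: (q - zero_point) * scale.
float Dequantize(std::uint16_t q, std::uint16_t zero_point, float scale) {
  return (static_cast<float>(q) - static_cast<float>(zero_point)) * scale;
}

// Hypothetical check: does the scale magnitude exceed the chosen threshold?
bool ScaleNeedsFix(float scale, float threshold) {
  return std::fabs(scale) > threshold;
}

// Hypothetical fix: the factor by which the scale would be divided so that its
// magnitude equals the threshold. Returns 1 when no fix is needed. A real pass
// would also have to propagate this factor to downstream scales.
float RescaleFactor(float scale, float threshold) {
  return ScaleNeedsFix(scale, threshold) ? std::fabs(scale) / threshold : 1.0f;
}
```

For example, a uint16 value of 65535 with zero point 0 and scale 2.0 dequantizes to 131070, which is above 65504, the largest finite half-precision value. Whether the GPU path actually computes in half precision is an assumption here.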

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.h | Declares the new Transform pass interface |
| onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.cpp | Implements graph construction, scale propagation, and removal of QDQ pairs |
| onnxruntime/core/providers/openvino/ov_protobuf_utils.h | Adds helpers to get/set float data in protobuf tensors |
| onnxruntime/core/providers/openvino/ov_protobuf_utils.cpp | Defines get_float_initializer_data and set_float_initializer_data |
| onnxruntime/core/providers/openvino/backend_manager.cc | Calls the new scale‐fix pass for GPU under the QDQ optimizer |
| cmake/onnxruntime_providers_openvino.cmake | Links onnx_proto to the OpenVINO provider target |
| onnxruntime/core/optimizer/double_qdq_pairs_remover.cc | Fixes missing dimension in newly created initializer |
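
As background for the get/set float helpers listed above: an ONNX TensorProto can hold float values either in its typed float_data field or as packed bytes in raw_data, so a helper has to handle both. The sketch below shows that idea on a hypothetical FakeTensor struct. It is not the implementation of get_float_initializer_data or set_float_initializer_data, whose signatures are not shown in this conversation, and it assumes raw_data uses the host byte order.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the two float storage fields of onnx::TensorProto.
struct FakeTensor {
  std::vector<float> float_data;  // typed storage
  std::string raw_data;           // packed byte storage
};

// Read floats from whichever field is populated (raw_data takes priority).
std::vector<float> GetFloatData(const FakeTensor& t) {
  if (t.raw_data.empty()) return t.float_data;
  std::vector<float> out(t.raw_data.size() / sizeof(float));
  if (!out.empty()) {
    std::memcpy(out.data(), t.raw_data.data(), out.size() * sizeof(float));
  }
  return out;
}

// Write floats back into the field that was already in use.
void SetFloatData(FakeTensor& t, const std::vector<float>& values) {
  if (t.raw_data.empty()) {
    t.float_data = values;
  } else {
    t.raw_data.assign(reinterpret_cast<const char*>(values.data()),
                      values.size() * sizeof(float));
  }
}
```

Writing back into the field that was already populated keeps the tensor in a single storage form, so a later reader does not see stale data in the other field.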
Comments suppressed due to low confidence (4)

onnxruntime/core/providers/openvino/backend_manager.cc:433

  • The modified branching drops the so_share_ep_contexts condition for GPU in the first branch and only checks enable_ovep_qdq_optimizer in the new GPU branch. Please verify that GPU behavior when so_share_ep_contexts is true but the optimizer flag is false is still correct.
  if ((session_context_.device_type.find("NPU") != std::string::npos) &&

onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.h:14

  • [nitpick] Add unit tests for the new Transform pass to cover scenarios with varying thresholds, different network topologies, and multiple QDQ patterns to guard against regressions.
Status Transform(const GraphViewer& src_graph,

onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.cpp:14

  • The code uses std::format in ToString(), but <format> is not included. Add #include <format> to ensure compilation succeeds.
#include <algorithm>

onnxruntime/core/providers/openvino/qdq_transformations/qdq_scales_fix.cpp:67

  • [nitpick] Remove these commented-out placeholder lines (//** node_input_name = [], //** node_output_name = []) to clean up dead code and improve readability.
      //** node_input_name = []

@sfatimar sfatimar left a comment

This PR is being merged only because of urgency. It has not been properly code reviewed, so it may have some issues.

@sfatimar sfatimar merged commit e2ec2b3 into ovep-develop Jul 3, 2025
6 of 8 checks passed
ankitm3k added a commit that referenced this pull request Jul 9, 2025
ankitm3k pushed a commit that referenced this pull request Jul 17, 2025
* Add pass to perform QDQ stripping and propagate scales

* Fix disconnected output node

* Fixes to support session.disable_quant_qdq output, remove dangling nodes and duplicate DQ nodes

* Fix lack of scales updates and remove stray QDQ nodes in certain models

* Address issues with Linux CI

* Fix for double QDQ issue