@quic-muchhsu
Contributor

Description

Lower Gemm with 2d bias to FC + ElementwiseAdd when targeting HTP.
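As a rough illustration of why this lowering preserves results, the sketch below compares a direct Gemm computation against a bias-free matrix product followed by an elementwise add. It is not the QNN EP code: the helper names are hypothetical, and it assumes `alpha = beta = 1`, no transposes, and a bias whose shape already matches the output.

```python
def fully_connected(a, b):
    """Hypothetical stand-in for a bias-free FC op: a (M x K) times b (K x N)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def elementwise_add(x, y):
    """Add two matrices of identical shape, element by element."""
    return [[p + q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def gemm_reference(a, b, c):
    """Direct Gemm with alpha = beta = 1 and no transposes: Y = A * B + C."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) + c[i][j] for j in range(cols)]
        for i in range(rows)
    ]


def gemm_lowered(a, b, c):
    """Same computation expressed as FC followed by ElementwiseAdd."""
    return elementwise_add(fully_connected(a, b), c)


A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3 input
B = [[1, 0], [0, 1], [1, 1]]  # 3 x 2 weights
C = [[10, 20], [30, 40]]      # 2D bias, same shape as the 2 x 2 output

assert gemm_lowered(A, B, C) == gemm_reference(A, B, C)
```

Integer inputs are used so the two paths can be compared exactly; with floating-point data a tolerance-based comparison would be more appropriate.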

Motivation and Context

This change allows Gemm with 2d bias to stay on HTP instead of falling back to CPU.

Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
@quic-muchhsu
Contributor Author

@microsoft-github-policy-service agree company=Qualcomm

@quic-muchhsu changed the title from "Lower Gemm with 2d bias to FC + ElementwiseAdd when targeting HTP." to "[QNN EP] Lower Gemm with 2d bias to FC + ElementwiseAdd when targeting HTP." on Jul 31, 2025
@jywu-msft added the ep:QNN (issues related to QNN execution provider) label on Jul 31, 2025
@HectorSVC
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@HectorSVC requested a review from Copilot on July 31, 2025 22:08
Copilot AI left a comment

Pull Request Overview

This PR modifies the QNN execution provider to support Gemm operations with 2D bias matrices when targeting HTP (Hexagon Tensor Processor). Previously, these operations would fall back to CPU execution.

  • Updates Gemm operator support to handle 2D bias matrices by decomposing them into FullyConnected + ElementwiseAdd operations
  • Modifies validation logic to allow 2D bias with shape [N, M] for non-quantized inputs
  • Updates test cases to verify that Gemm with 2D bias is now assigned to QNN EP instead of falling back to CPU
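The decision described in the list above can be pictured with the following sketch. It is purely illustrative and does not reproduce the checks in gemm_op_builder.cc: the function name, the returned labels, and the handling of the 1D and no-bias cases are all assumptions made for the example.

```python
def classify_gemm_bias(bias_shape, output_shape, is_quantized):
    """Return a hypothetical lowering label for a Gemm bias.

    bias_shape:   tuple of ints, or None when the Gemm has no bias
    output_shape: (rows, cols) of the Gemm output
    is_quantized: True when the inputs are quantized
    """
    if bias_shape is None:
        return "fc"  # nothing to add
    if len(bias_shape) == 1 and bias_shape[0] == output_shape[1]:
        return "fc_with_bias"  # assumed: 1D bias is passed to FC directly
    if len(bias_shape) == 2 and tuple(bias_shape) == tuple(output_shape):
        if not is_quantized:
            # The case this PR adds: FC followed by ElementwiseAdd.
            return "fc_plus_elementwise_add"
    return "unsupported"  # assumed: anything else is rejected


print(classify_gemm_bias((2, 4), (2, 4), is_quantized=False))
print(classify_gemm_bias((2, 4), (2, 4), is_quantized=True))
```

Under these assumptions, a 2D bias on quantized inputs is still rejected, which mirrors the "non-quantized inputs" restriction stated in the overview.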

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| gemm_op_test.cc | Updates test cases to verify 2D bias support and changes expected EP assignment from None to All |
| gemm_op_builder.cc | Implements logic to decompose Gemm with 2D bias into FullyConnected + ElementwiseAdd operations |

@HectorSVC merged commit e57dc2a into microsoft:main on Aug 1, 2025
86 checks passed
@qti-yuduo deleted the dev/muchhsu/allow_fp_gemm_2d_bias_on_qnn branch on August 1, 2025 17:10
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
…g HTP. (microsoft#25605)
