Add SkipLayerNorm fusion with bias Add #27765
Conversation
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
…dering bug Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Pull request overview
This PR adds a new Level2 graph optimization pass that fuses an upstream bias Add into an existing com.microsoft::SkipLayerNormalization node by converting it from 4 inputs to 5 inputs (adding the bias input), and wires it into the standard optimizer pipeline for multiple execution providers.
Changes:
- Introduces the `BiasSkipLayerNormFusion` transformer to rewrite `Add(MatMul[, Cast], bias_1D) -> SkipLayerNormalization` into a single 5-input `SkipLayerNormalization`.
- Registers the new fusion (and broadens `SkipLayerNormFusion` EP compatibility) in `GenerateTransformers`.
- Adds unit tests covering positive and negative fusion cases for the new transformer.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/graph_transform_test_layernorm.cc | Adds unit tests validating the new bias + SkipLayerNorm fusion behavior. |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Registers BiasSkipLayerNormFusion in the Level2 transformer pipeline and extends compatible EP sets. |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.h | Declares the new BiasSkipLayerNormFusion transformer. |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc | Implements pattern matching and graph rewrite to absorb a constant 1D bias Add into SkipLayerNormalization. |
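The eligibility conditions described in this overview can be summarized as a small predicate. The following is an illustrative sketch, not the PR's actual code: the function name `BiasIsFusable` and its parameters are hypothetical, and it assumes the pass requires a 1-D bias whose length matches the hidden size and an `Add` whose output feeds only the `SkipLayerNormalization` node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the eligibility check for fusing
// Add(matmul_out, bias_1D) -> SkipLayerNormalization(4 inputs)
// into a single 5-input SkipLayerNormalization.
bool BiasIsFusable(int64_t bias_rank,
                   int64_t bias_dim0,       // length of the 1-D bias, -1 if unknown
                   int64_t hidden_size,     // derived from gamma/beta, -1 if unknown
                   int add_consumer_count)  // consumers of the Add output
{
  if (bias_rank != 1) return false;           // bias must be 1-D
  if (add_consumer_count != 1) return false;  // Add must feed only the SLN node
  if (hidden_size == -1 || bias_dim0 == -1) return false;  // need static shapes
  return bias_dim0 == hidden_size;            // bias length must equal hidden size
}
```

With these checks in place, a broadcast-only bias (e.g. length 1) or a multi-consumer `Add` output is left unfused.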
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR adds a new graph optimization pass that fuses an upstream bias Add into an existing com.microsoft.SkipLayerNormalization node (moving from 4 inputs to 5 inputs), and wires it into the Level2 transformer pipeline (including more execution providers), with accompanying unit tests.
Changes:
- Introduces the `BiasSkipLayerNormFusion` transformer to absorb `Add` (+1-D bias), optionally via `Cast`, into `SkipLayerNormalization`.
- Registers the new transformer in the Level2 optimization pipeline and expands SLN fusion EP compatibility to include JS/WebGPU.
- Adds targeted unit tests covering positive and negative fusion cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/graph_transform_test_layernorm.cc | Adds unit tests validating Bias+SLN fusion behavior (and non-fusion conditions). |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Registers the new transformer and updates compatible EP sets for SLN-related fusions. |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.h | Declares the BiasSkipLayerNormFusion graph transformer. |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc | Implements the Add(+bias) → SLN(5 inputs) fusion logic (with optional Cast handling). |
Pull request overview
Adds a new optimizer pass to fuse Add(+1D bias) + com.microsoft::SkipLayerNormalization (4 inputs) into a single SkipLayerNormalization node with a 5th bias input, and wires it into the Level2 transformer pipeline for additional EPs.
Changes:
- Introduces the `BiasSkipLayerNormFusion` graph transformer to absorb an upstream bias `Add` (optionally via `Cast`) into `SkipLayerNormalization`.
- Registers the new fusion pass (and broadens the EP allowlist for SkipLayerNorm fusion) in the transformer generation pipeline.
- Adds unit tests covering positive and negative fusion cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/graph_transform_test_layernorm.cc | Adds unit tests validating BiasSkipLayerNormFusion behavior (fuse + no-fuse scenarios). |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Registers the new fusion pass and expands compatible EP list for SkipLayerNorm-related fusions. |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.h | Declares the new BiasSkipLayerNormFusion transformer and documents the before/after pattern. |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc | Implements pattern matching and node rewrite to create 5-input SkipLayerNormalization and remove obsolete nodes. |
Pull request overview
Adds a new graph optimization to fuse an upstream bias Add (optionally via Cast from MatMul) into an existing 4-input com.microsoft.SkipLayerNormalization, producing a single 5-input SkipLayerNormalization with bias.
Changes:
- Introduces the `BiasSkipLayerNormFusion` transformer to absorb 1-D constant bias adds into `SkipLayerNormalization`.
- Registers the new fusion in the Level2 transformer pipeline and expands the compatible EP list for SkipLayerNorm fusions (CPU/ACL/CUDA/DML/JS/WebGPU).
- Adds targeted optimizer tests covering positive and negative fusion scenarios (including Cast and mismatch cases).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/graph_transform_test_layernorm.cc | Adds unit tests validating bias+SLN fusion behavior and non-fusion guards |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Registers the new transformer and updates compatible EP sets |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.h | Declares the new BiasSkipLayerNormFusion transformer |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc | Implements pattern matching and graph rewrite to add bias as 5th SLN input |
Pull request overview
Adds a new optimizer pass to fuse an upstream bias Add into an existing 4-input com.microsoft.SkipLayerNormalization, producing a 5-input SLN that directly consumes the bias (including a MatMul→Cast→Add variant), and wires it into the Level2 transformer pipeline with accompanying unit tests.
Changes:
- Introduced the `BiasSkipLayerNormFusion` graph transformer to absorb 1-D constant bias adds into SkipLayerNormalization.
- Registered the new fusion (and expanded the SLN fusion EP allowlist) in the transformer generation pipeline.
- Added multiple structural tests covering positive and negative fusion scenarios, including Cast and shape-mismatch cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/graph_transform_test_layernorm.cc | Adds unit tests validating Bias+SLN fusion behavior and non-fusion guardrails |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Registers the new fusion pass and updates the EP compatibility set used for SLN fusions |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.h | Declares the new BiasSkipLayerNormFusion transformer |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc | Implements the fusion logic for Add(+optional Cast)+MatMul feeding SLN |
…etween Path 1 and Path 2 Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Pull request overview
This PR adds a new optimizer pass to fuse Add(+1D bias) into an existing 4-input com.microsoft::SkipLayerNormalization, producing a single 5-input SkipLayerNormalization node (with bias as the 5th input), and wires the pass into the Level2 transformer pipeline for additional EPs.
Changes:
- Added the `BiasSkipLayerNormFusion` graph transformer implementation (supports MatMul→Add or MatMul→Cast→Add patterns).
- Registered the new fusion (and expanded the SkipLayerNormFusion EP set) in `GenerateTransformers`.
- Added unit tests covering positive/negative fusion scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/graph_transform_test_layernorm.cc | Adds unit tests for BiasSkipLayerNormFusion patterns and non-fusion cases |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Registers BiasSkipLayerNormFusion in Level2 pipeline and expands compatible EP set |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.h | Declares new fusion transformer |
| onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc | Implements the Add(+bias) → SkipLayerNormalization(5 inputs) rewrite |
…s, add downstream-consumer test Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
…ction error Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
…27: fix test comment Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
… bias{1} instead of invalid bias{3}
Review
1. Hidden-Size Validation (onnxruntime/core/optimizer/bias_skip_layer_norm_fusion.cc)
Positive:
- The fusion keeps `input[0]`/`input[1]` stable and snapshots both outputs and outgoing edges before removal, which is exactly the right shape-preserving pattern for `SkipLayerNormalization`.
Concern:
⚠️ Can fuse an invalid bias when `gamma`/`beta` shapes are unknown: the new guard only compares `bias_hidden_size` against the hidden size derived from `gamma`/`beta`. If those inputs are graph inputs without static 1-D shape info, `get_sln_hidden_size()` returns `-1`, so a broadcast-valid `bias{1}` will still be absorbed. That is a behavior change: `Add(x, bias{1})` is legal, but the fused 5-input `SkipLayerNormalization` requires `bias.shape[0] == hidden_size` and will later fail `CheckBias()` at runtime. In other words, the optimizer can turn a valid unfused graph into an invalid fused graph.

```cpp
int64_t hidden_size = get_sln_hidden_size(sln_node);
if (hidden_size == -1) {
  hidden_size = GetLastDimValue(*candidate_add->MutableInputDefs()[add_matmul_input_idx]);
}
// If we still can't prove the bias matches the hidden size, bail out.
if (hidden_size == -1 || bias_hidden_size == -1 || hidden_size != bias_hidden_size) {
  return false;
}
```
2. Regression Coverage (onnxruntime/test/optimizer/graph_transform_test_layernorm.cc, onnxruntime/core/optimizer/graph_transformer_utils.cc)
Positive:
- The new tests cover the main wiring variants well: bias on either `Add` input, `SkipLayerNormalization` input 0 vs 1, the fp16 `Cast` path, multi-consumer rejection, and downstream edge rewiring.
Concern:
⚠️ The missing negative case is exactly the one that breaks today: every new test uses statically-shaped initializer `gamma`/`beta`, so the hidden-size mismatch guard is always informed by compile-time metadata. There is no negative test where `gamma`/`beta` are dynamic inputs or otherwise lack shape info, and there is also no provider-tagged coverage for the new JS/WebGPU allowlist. A focused negative test for `bias{1}` with unknown `gamma`/`beta` shapes would have caught the issue above immediately.
Summary of Concerns
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | High | bias_skip_layer_norm_fusion.cc | Fusion can absorb a broadcast bias that 5-input SkipLayerNormalization cannot legally accept when gamma/beta shapes are unknown. |
| 2 | Suggestion | Tests / transformer registration | Missing dynamic-shape and provider-tagged coverage leaves the new guard and JS/WebGPU rollout under-tested. |
Verdict
REQUEST CHANGES — the hidden-size validation needs one more safety check before this fusion is safe on partially-shaped graphs.
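The extra safety check proposed in the review above can be exercised in isolation. This is a minimal sketch with hypothetical names, assuming `-1` encodes an unknown dimension as in the review's snippet: fall back to the MatMul input's last static dimension when `gamma`/`beta` shapes are unknown, and refuse to fuse when nothing proves the bias length equals the hidden size.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the proposed hidden-size guard (names hypothetical).
// Returns true only when the bias length provably matches the hidden size.
bool PassesHiddenSizeGuard(int64_t sln_hidden_size,   // from gamma/beta, -1 if unknown
                           int64_t matmul_last_dim,   // fallback source, -1 if unknown
                           int64_t bias_hidden_size)  // from the 1-D bias, -1 if unknown
{
  int64_t hidden_size = sln_hidden_size;
  if (hidden_size == -1) {
    hidden_size = matmul_last_dim;  // second chance via the Add's other input
  }
  // If we still can't prove the bias matches the hidden size, bail out.
  if (hidden_size == -1 || bias_hidden_size == -1 || hidden_size != bias_hidden_size) {
    return false;
  }
  return true;
}
```

Under this guard, the problematic case from the review (a broadcast `bias{1}` with no static `gamma`/`beta` shapes) is rejected instead of fused.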
…rNormFusion Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
This pull request introduces a new graph optimization pass to fuse Add + SkipLayerNormalization subgraphs into a single SkipLayerNormalization node that incorporates a bias input. This helps simplify the computation graph, especially for models using bias after MatMul, and extends support for more execution providers. The main changes include the implementation of the new fusion, its integration into the optimizer pipeline, and updates to provider compatibility.
New Bias + SkipLayerNormalization Fusion:
Adds the `BiasSkipLayerNormFusion` class and implementation to detect and fuse subgraphs where a 1-D bias is added to a MatMul output (optionally through a Cast) before SkipLayerNormalization, replacing them with a single node that absorbs the bias as a fifth input.
Integration into Optimization Pipeline:
Registers `BiasSkipLayerNormFusion` in the graph transformer utility, ensuring it runs after the standard SkipLayerNorm fusion and covers more execution providers (CPU, ACL, CUDA, DML, JS, WebGPU).
Test and Include Updates:
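The rewrite step described above amounts to redirecting the SkipLayerNormalization's data input past the Add node and appending the bias as a fifth input. A simplified sketch with inputs modeled as plain strings (the function and argument names are hypothetical, not the PR's API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of the rewrite: a 4-input SkipLayerNormalization
// {input, skip, gamma, beta} has its data input redirected to the Add's
// non-bias operand (e.g. the MatMul output) and gains the absorbed bias
// as input 5, so the standalone Add node can then be removed.
std::vector<std::string> FuseBiasIntoSln(std::vector<std::string> sln_inputs,
                                         const std::string& matmul_output,
                                         const std::string& bias_name) {
  sln_inputs[0] = matmul_output;    // bypass the Add node
  sln_inputs.push_back(bias_name);  // bias becomes the 5th input
  return sln_inputs;
}
```

This is the node-count reduction the Description refers to: one Add node disappears, and the bias addition happens inside the fused kernel.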
Motivation and Context
These changes collectively improve model optimization by reducing node count and improving runtime efficiency for supported providers.
This PR also helps perform this fusion on many models inside the Foundry Local catalog without needing to re-deploy models.