ORT 1.25.1 release: version bump and cherry-pick #27907 #28148

Closed
vraspar wants to merge 2 commits into microsoft:rel-1.25.1 from vraspar:vraspar/bump-version-1.25.1

vraspar commented Apr 21, 2026

Version bump to 1.25.1 and cherry-pick of #27907 (Add LinearAttention and CausalConvState ops for Qwen3.5) for the 1.25.1 patch release.

Cherry-pick merge commit: 0fedb26

vraspar and others added 2 commits April 21, 2026 02:24

Adds custom CUDA and CPU kernels for linear attention and causal 1D
convolution with state, enabling efficient inference of Qwen3.5 hybrid
decoder models in ONNX Runtime.

### New Operators

**`LinearAttention`** — Implements the GatedDeltaNet recurrent linear
attention mechanism:
- Fused kernel computing gated delta-rule update of a recurrent state
matrix
- Supports both prefill (multi-token) and decode (single-token) paths
- Inputs: Q, K, V, decay (alpha), beta gating, optional initial
recurrent state
- Outputs: attention output, updated recurrent state
- CUDA implementation with per-head parallelism; CPU implementation with
Eigen
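
To illustrate the recurrence the bullets above describe, here is a minimal pure-Python sketch of a gated delta-rule update for a single head. It is not the ORT kernel: the function names, the `d_k x d_v` state layout, the use of a plain multiplicative `alpha` (rather than a log-space decay), and the absence of query scaling are all assumptions chosen for readability.

```python
def gated_delta_step(state, q, k, v, alpha, beta):
    """One decode step for a single head (hypothetical helper, not the ORT API).

    state: d_k x d_v matrix as a list of rows; q, k: length d_k; v: length d_v.
    alpha: scalar decay (assumed multiplicative); beta: scalar write strength.
    Returns (output of length d_v, updated state).
    """
    d_k, d_v = len(k), len(v)
    # 1. Decay the previous recurrent state.
    decayed = [[alpha * state[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2. Read what the decayed state currently stores for key k.
    pred = [sum(decayed[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # 3. Delta rule: write the beta-scaled prediction error back along k.
    delta = [beta * (v[j] - pred[j]) for j in range(d_v)]
    new_state = [[decayed[i][j] + k[i] * delta[j] for j in range(d_v)]
                 for i in range(d_k)]
    # 4. Query the updated state to produce the attention output.
    out = [sum(new_state[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]
    return out, new_state


def gated_delta_prefill(qs, ks, vs, alphas, betas, state=None):
    """Prefill path: run the same step over every token, carrying the state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    if state is None:
        # No initial recurrent state supplied: start from zeros.
        state = [[0.0] * d_v for _ in range(d_k)]
    outs = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        o, state = gated_delta_step(state, q, k, v, a, b)
        outs.append(o)
    return outs, state
```

In this sketch, prefill and decode share one update, so running prefill over a prompt and then calling `gated_delta_step` with the returned state gives the same result as a single prefill over the full sequence. A fused kernel would typically process prefill in a more parallel way, but the sequential loop defines the expected result.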

**`CausalConvWithState`** — Implements causal 1D convolution with
persistent state for autoregressive decoding:
- Supports prefill (full convolution) and decode (state-based sliding
window)
- Inputs: input tensor, conv weights, optional bias, optional initial
conv state
- Outputs: convolution output, updated conv state
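
As a rough sketch of how the prefill and decode paths can share one code path, the following pure-Python function prepends the cached state to the new inputs and keeps the trailing inputs as the next state. It assumes a depthwise (per-channel) convolution with a state of `kernel - 1` past inputs and no activation; the function name and tensor layouts are hypothetical and are not taken from the ORT op definition.

```python
def causal_conv_with_state(x, weight, bias=None, state=None):
    """Depthwise causal 1D convolution with carried state (illustrative only).

    x:      [channels][seq_len] new inputs (seq_len == 1 for decode)
    weight: [channels][kernel]  per-channel filter taps, oldest tap first
    bias:   [channels] or None
    state:  [channels][kernel - 1] trailing inputs from the previous call
    Returns (output [channels][seq_len], new state [channels][kernel - 1]).
    """
    channels, kernel = len(weight), len(weight[0])
    if bias is None:
        bias = [0.0] * channels
    if state is None:
        # No initial conv state: equivalent to zero left-padding.
        state = [[0.0] * (kernel - 1) for _ in range(channels)]
    out, new_state = [], []
    for c in range(channels):
        # Prepend cached history so each output sees only past and current inputs.
        padded = list(state[c]) + list(x[c])
        out.append([
            bias[c] + sum(weight[c][j] * padded[t + j] for j in range(kernel))
            for t in range(len(x[c]))
        ])
        # Keep the last kernel - 1 inputs as the sliding window for the next call.
        new_state.append(padded[len(padded) - (kernel - 1):])
    return out, new_state
```

Under these assumptions, calling the function once on a full sequence and calling it token by token with the returned state produce identical outputs, which is the property a parity test between the two paths would check.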

### Op Definitions
- Registered in `com.microsoft` domain (opset 1)
- Full shape inference and type constraints in `bert_defs.cc`

### Testing
- Parity test (`test_parity_linear_attention_causal_conv.py`) validates
CUDA and CPU kernels against PyTorch reference implementations from the
FLA (Flash Linear Attention) library

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

vraspar commented Apr 21, 2026

Closing - recreating with branch from microsoft/onnxruntime instead of fork

vraspar closed this Apr 21, 2026