ORT 1.25.1 release: version bump and cherry-pick #27907 by vraspar · Pull Request #28148 · microsoft/onnxruntime

vraspar · 2026-04-21T02:33:30Z

Version bump to 1.25.1 and cherry-pick of #27907 (Add LinearAttention and CausalConvState ops for Qwen3.5) for the 1.25.1 patch release.

Cherry-pick merge commit: 0fedb26

Adds custom CUDA and CPU kernels for linear attention and causal 1D convolution with state, enabling efficient inference of Qwen3.5 hybrid decoder models in ONNX Runtime.

### New Operators

**`LinearAttention`** implements the GatedDeltaNet recurrent linear attention mechanism:

- Fused kernel computing gated delta-rule update of a recurrent state matrix
- Supports both prefill (multi-token) and decode (single-token) paths
- Inputs: Q, K, V, decay (alpha), beta gating, optional initial recurrent state
- Outputs: attention output, updated recurrent state
- CUDA implementation with per-head parallelism; CPU implementation with Eigen

**`CausalConvWithState`** implements causal 1D convolution with persistent state for autoregressive decoding:

- Supports prefill (full convolution) and decode (state-based sliding window)
- Inputs: input tensor, conv weights, optional bias, optional initial conv state
- Outputs: convolution output, updated conv state

### Op Definitions

- Registered in `com.microsoft` domain (opset 1)
- Full shape inference and type constraints in `bert_defs.cc`

### Testing

- Parity test (`test_parity_linear_attention_causal_conv.py`) validates CUDA and CPU kernels against PyTorch reference implementations from the FLA (Flash Linear Attention) library

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
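For readers unfamiliar with the two operators, the following is a minimal single-head NumPy sketch of the recurrences they are described as computing. It is not the ONNX Runtime kernel code. The function names, tensor layouts (`(T, d)` per head, depthwise `(C, K)` conv weights), the absence of any query scaling or activation, and the state shapes are all assumptions made for illustration; the real op signatures are defined in `bert_defs.cc`.

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta, state=None):
    """Hypothetical single-head reference of a gated delta-rule update.

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,); state: (d_v, d_k) or None.
    Assumed recurrence: S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
    and o_t = S_t q_t.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k)) if state is None else state.copy()
    out = np.empty((T, d_v))
    for t in range(T):
        S = alpha[t] * S                      # gated decay of the recurrent state
        pred = S @ k[t]                       # value currently stored under key k_t
        S = S + beta[t] * np.outer(v[t] - pred, k[t])  # delta-rule correction
        out[t] = S @ q[t]                     # read out with the query
    return out, S

def causal_conv_with_state(x, weight, bias=None, state=None):
    """Hypothetical depthwise causal 1D convolution carrying the last K-1 inputs.

    x: (T, C); weight: (C, K); bias: (C,) or None; state: (K-1, C) or None.
    """
    T, C = x.shape
    K = weight.shape[1]
    if state is None:
        state = np.zeros((K - 1, C))          # zero left-padding for a fresh sequence
    padded = np.concatenate([state, x], axis=0)  # (K-1+T, C)
    out = np.empty((T, C))
    for t in range(T):
        window = padded[t:t + K]              # the K most recent inputs up to step t
        out[t] = np.sum(window * weight.T, axis=0)
    if bias is not None:
        out = out + bias
    return out, padded[T:]                    # last K-1 rows become the new state
```

Under these assumptions, running a sequence in one prefill call and running it token by token while threading the returned state through each call produce the same outputs, which is the property that lets one operator serve both the prefill and decode paths.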

vraspar · 2026-04-21T02:37:18Z

Closing - recreating with branch from microsoft/onnxruntime instead of fork

vraspar and others added 2 commits April 21, 2026 02:24

Bump version to 1.25.1

8a4519a

vraspar closed this Apr 21, 2026

vraspar wants to merge 2 commits into microsoft:rel-1.25.1 from vraspar:vraspar/bump-version-1.25.1