add support for DFT with onesided=True and inverse=True (irfft)#27028

Merged
justinchuby merged 12 commits into microsoft:main from simonbyrne:sb/irfft
May 2, 2026

Conversation

@simonbyrne
Contributor

Description

Adds support for the DFT operator when onesided=True and inverse=True (corresponding to the irfft operation in numpy and pytorch).

Motivation and Context

This addresses issue onnx/onnx#5920 and implements runtime support for the spec changes in onnx/onnx#7574.

Signed-off-by: Simon Byrne <sbyrne@nvidia.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds CPU EP support for DFT with onesided=True and inverse=True (IRFFT semantics) and expands test coverage for IRFFT and rank-2 inputs.

Changes:

  • Implement IRFFT handling in radix-2 FFT path (conjugate symmetry reconstruction + real-valued output).
  • Implement IRFFT handling in Bluestein path (conjugate symmetry reconstruction + real-valued output).
  • Add new unit tests for IRFFT (radix-2, Bluestein, round-trip) and rank-2 real/complex DFT inputs.
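The conjugate-symmetry reconstruction that both FFT paths rely on can be sketched in isolation: the onesided spectrum of a length-n real signal has n/2 + 1 bins, the missing bins follow from X[n - k] = conj(X[k]), and the normalized inverse DFT of the completed spectrum is (numerically) real. A minimal naive sketch for illustration only, not the PR's radix-2/Bluestein code:

```cpp
#include <cmath>
#include <complex>
#include <vector>

using C = std::complex<double>;
const double PI = std::acos(-1.0);

// Naive onesided forward DFT of a real signal: returns n/2 + 1 bins.
std::vector<C> rfft(const std::vector<double>& x) {
  const size_t n = x.size();
  std::vector<C> X(n / 2 + 1);
  for (size_t k = 0; k < X.size(); ++k)
    for (size_t j = 0; j < n; ++j)
      X[k] += x[j] * std::polar(1.0, -2.0 * PI * double(k * j) / double(n));
  return X;
}

// Naive IRFFT: rebuild the full spectrum via X[n - k] = conj(X[k]),
// apply the normalized inverse DFT, and keep the real part.
std::vector<double> irfft(const std::vector<C>& X, size_t n) {
  std::vector<C> full(n);
  for (size_t k = 0; k < X.size(); ++k) full[k] = X[k];
  for (size_t k = X.size(); k < n; ++k) full[k] = std::conj(X[n - k]);

  std::vector<double> x(n);
  for (size_t j = 0; j < n; ++j) {
    C sum = 0;
    for (size_t k = 0; k < n; ++k)
      sum += full[k] * std::polar(1.0, 2.0 * PI * double(k * j) / double(n));
    x[j] = sum.real() / double(n);
  }
  return x;
}
```

Round-tripping a real signal through `rfft` then `irfft` recovers it to machine precision, which is also what the new round-trip unit test exercises against the optimized kernels.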

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Changed files:

  • onnxruntime/core/providers/cpu/signal/dft.cc: Adds IRFFT support and adjusts shape/output handling for inverse && onesided.
  • onnxruntime/test/providers/cpu/signal/signal_ops_test.cc: Adds IRFFT test cases and rank-2 DFT/RFFT coverage.


Signed-off-by: Simon Byrne <sbyrne@nvidia.com>
justinchuby pushed a commit to microsoft/onnxscript that referenced this pull request Jan 23, 2026
This fixes the onnxscript export for the irfft function. 

Fixes onnx/onnx#5920, and adds support to the
changes in onnx/onnx#7574 and
microsoft/onnxruntime#27028.

Most of the diff is due to the onnx_opset generated code changes from
onnx/onnx#5920. That can be removed if you
would prefer.

---------

Signed-off-by: Simon Byrne <sbyrne@nvidia.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.



simonbyrne and others added 2 commits January 23, 2026 16:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Simon Byrne <sbyrne@nvidia.com>
@istupakov

Hi @simonbyrne!
It's great that irfft is now supported by ONNX. Is there any hope for improving DFT/STFT performance? Currently, they run about 20 times slower than numpy on CPU and are not supported via CUDAExecutionProvider. When applied to ASR, this means that spectrogram calculations can take longer than the speech recognition itself...

@simonbyrne
Contributor Author

It's great that irfft is now supported by ONNX. Is there any hope for improving DFT/STFT performance? Currently, they run about 20 times slower than numpy on CPU and are not supported via CUDAExecutionProvider. When applied to ASR, this means that spectrogram calculations can take longer than the speech recognition itself...

Yes, you should use an existing optimized FFT library (e.g. FFTW or cuFFT) 😄

If you want a simpler option, you may just be able to call into PocketFFT (which is used by NumPy, and should already be included since it's a dependency)

@simonbyrne
Contributor Author

PocketFFT is actually header-only (https://github.com/mreineck/pocketfft); that would be the easiest option by far (FFTW is fast, but it is GPL-licensed, which might be a sticking point).

@simonbyrne
Contributor Author

Bumping this. Does this require a new release of ONNX to support?

@justinchuby
Contributor

Bumping this. Does this require a new release of ONNX to support?

Sorry I lost track of this. I think it is good to merge, thanks

@justinchuby
Contributor

It's great that irfft is now supported by ONNX. Is there any hope for improving DFT/STFT performance? Currently, they run about 20 times slower than numpy on CPU and are not supported via CUDAExecutionProvider. When applied to ASR, this means that spectrogram calculations can take longer than the speech recognition itself...

Yes, you should use an existing optimized FFT library (e.g. FFTW or cuFFT) 😄

If you want a simpler option, you may just be able to call into PocketFFT (which is used by NumPy, and should already included as its a dependency)

Contributions welcome! The current implementation is very unoptimized.

@justinchuby
Contributor

I think since onnxruntime uses onnx shape inferencer and it currently raises an error, we still do need a new onnx version (end of Feb) for the models to run.

@justinchuby justinchuby self-assigned this Feb 10, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

onnxruntime/core/providers/cpu/signal/dft.cc:447

  • number_of_samples is an int64_t, but when dft_length is provided it is assigned via static_cast<int>(...). This can truncate values on platforms where int is 32-bit and lead to incorrect output shapes/loop bounds for large dft_length. Assign as int64_t (no narrowing cast) and keep the existing > 0 validation on the full 64-bit value.
    const auto& dft_length_shape = dft_length->Shape();
    ORT_RETURN_IF(!dft_length_shape.IsScalar(), "dft_length must be a scalar value.");
    number_of_samples = static_cast<int>(signal::get_scalar_value_from_tensor<int64_t>(dft_length));
    ORT_RETURN_IF(number_of_samples <= 0, "dft_length must be greater than zero.");
  }
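The truncation flagged above is easy to see in isolation. A minimal sketch with hypothetical helper names (not the ORT code), contrasting the narrowing cast with the suggested 64-bit-clean version:

```cpp
#include <cstdint>

// Flagged pattern: narrowing to int truncates values above INT_MAX on
// platforms with 32-bit int (modular wrap, guaranteed since C++20),
// silently corrupting output shapes and loop bounds.
inline int narrow_dft_length(std::int64_t dft_length) {
  return static_cast<int>(dft_length);  // e.g. (1 << 32) + 8 becomes 8
}

// Suggested fix: keep int64_t end to end, so the > 0 validation sees
// the full 64-bit value and nothing is lost downstream.
inline std::int64_t keep_dft_length(std::int64_t dft_length) {
  return dft_length;
}
```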


@yan12125
Contributor

I think since onnxruntime uses onnx shape inferencer and it currently raises an error, we still do need a new onnx version (end of Feb) for the models to run.

onnx 1.21.0, which contains onnx/onnx#7574, was released two weeks ago. Can this be pushed forward?

Missing support for changed DFT nodes in onnxruntime forces us to use older onnxscript and torch, and thus fixes from newer libraries cannot be used.

@justinchuby
Contributor

Thanks for the reminder. @titaiwangms do you know if we have the new onnx dependency in?

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.



@justinchuby
Contributor

@simonbyrne could you check review comments?

auto-merge was automatically disabled April 14, 2026 21:53

Head branch was pushed to by a user without write access

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
justinchuby
justinchuby previously approved these changes Apr 15, 2026
@justinchuby justinchuby enabled auto-merge (squash) April 15, 2026 01:12
@justinchuby
Contributor

@simonbyrne
Contributor Author

I'm not sure why they're failing on that platform.

@simonbyrne
Contributor Author

Is there a way to debug? or skip?

@yan12125
Contributor

I'm not sure why they're failing on that platform.

Looks like the DirectML provider has its own shape inferrer and does not allow onesided=True && inverse=True yet:

throw new std::exception("onesided and inverse attributes cannot be enabled at the same time");

I assume the shape inferrer is related, as the file in the error message (onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2844)) points to usage of shape inferrer:

ORT_THROW_IF_FAILED(shapeInferrer->InferOutputShapes(inferenceContext.Get()));

Here's an excerpt of CI logs:

2026-04-15T03:02:44.0495511Z [ RUN      ] SignalOpsTest.DFT17_IRFFT_radix2
2026-04-15T03:02:44.0497330Z 2026-04-15 02:46:24.0831294 [E:onnxruntime:DFT, sequential_executor.cc:614 onnxruntime::ExecuteKernel] Non-zero status code returned while running DFT node. Name:'node1' Status Message: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2844)\onnxruntime_provider_test.exe!00007FF6B7129120: (caller: 00007FF6B7128E45) Exception(174) tid(16bc) 80004005 Unspecified error
2026-04-15T03:02:44.0497400Z
2026-04-15T03:02:44.0497840Z D:\a\_work\onnxruntime\onnxruntime\onnxruntime\test\unittest_util\base_tester.cc(353): error: Expected equality of these values:
2026-04-15T03:02:44.0497896Z   expect_result
2026-04-15T03:02:44.0497989Z     Which is: 4-byte object <00-00 00-00>
2026-04-15T03:02:44.0498074Z   ExpectResult::kExpectFailure
2026-04-15T03:02:44.0498156Z     Which is: 4-byte object <01-00 00-00>
2026-04-15T03:02:44.0499487Z Run failed but expected success: Non-zero status code returned while running DFT node. Name:'node1' Status Message: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2844)\onnxruntime_provider_test.exe!00007FF6B7129120: (caller: 00007FF6B7128E45) Exception(174) tid(16bc) 80004005 Unspecified error
2026-04-15T03:02:44.0499496Z 
2026-04-15T03:02:44.0499574Z Google Test trace:
2026-04-15T03:02:44.0500055Z D:\a\_work\onnxruntime\onnxruntime\onnxruntime\test\unittest_util\base_tester.cc(912): registered execution providers: DmlExecutionProvider
2026-04-15T03:02:44.0500063Z 
2026-04-15T03:02:44.0500221Z [  FAILED  ] SignalOpsTest.DFT17_IRFFT_radix2 (275 ms)

@justinchuby
Contributor

justinchuby commented Apr 22, 2026

CI Failure Diagnosis (Copilot)

1. Windows GPU DML CI Pipeline — 8 IRFFT test failures

Root cause: The DirectML (DML) execution provider has its own shape inferrer for DFT that explicitly rejects onesided=true && inverse=true:

if (isInverse && isOnesided)
{
    throw new std::exception("onesided and inverse attributes cannot be enabled at the same time");
}

When the DML provider is registered in the test runner, it intercepts the DFT op and its shape inferrer throws before the CPU kernel can run, causing all 8 IRFFT tests to fail.

2. wasm_Release / build-wasm — same 8 IRFFT test failures

The WASM build has the same 8 test failures. There is no JSEP/WebNN DFT kernel, so the tests fall through to the CPU provider compiled to WASM via Emscripten. Since all native CPU builds (Linux x64, Windows x64, ARM64, etc.) pass, the issue is WASM-specific. Without the detailed error messages (shape mismatch vs value mismatch vs kernel error), the exact root cause is uncertain, but likely candidates are:

  • Emscripten-specific C++ behavior: Subtle differences in how std::complex operations, reinterpret_cast, or integer-to-complex conversions work under Emscripten optimization. For instance, the ternary expressions like auto x = (bit_reversed_index < number_of_samples) ? *(X_data + bit_reversed_index * X_stride) : 0; mix std::complex<float> with int literals — while valid C++, different compilers may handle the implicit conversion differently under optimization.
  • WASM exception handling: WASM Release builds disable exceptions (--enable_wasm_api_exception_catching provides only limited exception support). If the IRFFT kernel encounters a validation error and throws, the WASM runtime may not catch it properly, leading to test failure.

Suggested Fixes

For DML (definitive fix): Exclude the DML execution provider from IRFFT tests since DML does not yet support onesided=true && inverse=true. Add test.SetExcludeExecutionProviders({kDmlExecutionProvider}); before test.Run() in each IRFFT test function (TestIRFFTRadix2Float, TestIRFFTNaiveFloat, TestRFFTIRFFTRoundTrip, TestDFT2DComplexOnesidedInverse). The DML provider can add IRFFT support separately.

For WASM: The detailed WASM error messages would help narrow down the exact cause. If the failures are similar to DML (an EP-level rejection), the same SetExcludeExecutionProviders pattern applies. If it is a numerical issue under Emscripten, the mixed-type ternary expressions in fft_radix2 (lines 138-140 in the PR) could be made explicit — e.g., replace ? *(X_data + ...) : 0 with ? *(X_data + ...) : U(0) to avoid any implicit conversion ambiguity.
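The suggested ternary change can be sketched as a standalone template; `load_or_zero` and its parameters are illustrative stand-ins, not the PR's actual helper:

```cpp
#include <complex>
#include <cstddef>

// With the element type U spelled out, the "else" branch is U(0) rather
// than the int literal 0, so the conditional expression has a single,
// unambiguous type with no implicit int -> std::complex conversion.
template <typename U>
U load_or_zero(const U* data, std::ptrdiff_t index, std::ptrdiff_t stride,
               std::ptrdiff_t number_of_samples) {
  return (index < number_of_samples) ? *(data + index * stride) : U(0);
}
```

For an in-bounds index this reads the strided element; out of bounds it yields a zero of the element type, matching the padding behavior the radix-2 path expects.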

(Credit to @yan12125 who also identified the DML shape inferrer issue in an earlier comment.)

@yan12125
Contributor

For wasm_Release build, somehow I can download logs on my phone but not on laptop. Anyway, here are excerpted logs:

2026-04-15T03:59:44.5256669Z FAILED TESTS:
2026-04-15T03:59:44.5264740Z   #OpTest# - bias-split-gelu.jsonc
2026-04-15T03:59:44.5265129Z     [webgpu]BiasSplitGelu - BiasSplitGelu
2026-04-15T03:59:44.5269742Z       ✖ bias split gelu [1,1,2560]x[2560]
2026-04-15T03:59:44.5270157Z         Chrome Headless 147.0.0.0 (Windows 10)
2026-04-15T03:59:44.5272466Z       Error: failed to call OrtRun(). ERROR_CODE: 1, ERROR_MESSAGE: Non-zero status code returned while running BiasSplitGelu node. Name:'BiasSplitGelu' Status Message: shader_helper.cc:250 ValidateVariableDependency Input dependency is not set for "Type", but type alias for element type or value type is used.
2026-04-15T03:59:44.5274223Z           at checkLastError (dist/ort.all.js:24875:17)
2026-04-15T03:59:44.5274658Z           at run (dist/ort.all.js:26827:13)
2026-04-15T03:59:44.5275446Z           at async OnnxruntimeWebAssemblySessionHandler.run (dist/ort.all.js:27371:27)
2026-04-15T03:59:44.5276467Z           at async _InferenceSession.run (dist/ort.all.js:1134:27)
2026-04-15T03:59:44.5277267Z           at async sessionRun (test/ort.test.js:42508:23)
2026-04-15T03:59:44.5277815Z           at async runProtoOpTestcase (test/ort.test.js:42612:27)
2026-04-15T03:59:44.5278346Z           at async runOpTest (test/ort.test.js:42650:7)
2026-04-15T03:59:44.5278844Z           at async Context.<anonymous> (test/ort.test.js:45731:15)
2026-04-15T03:59:44.5279164Z 
2026-04-15T03:59:44.5279329Z   #OpTest# - multihead-attention.jsonc
2026-04-15T03:59:44.5280067Z     [webgpu]MultiHeadAttention - MultiHeadAttention Basic, one head and head-size=4 with pastKey and pastValue
2026-04-15T03:59:44.5280886Z       ✖ T[0] (slow: 0.12 secs)
2026-04-15T03:59:44.5281230Z         Chrome Headless 147.0.0.0 (Windows 10)
2026-04-15T03:59:44.5281603Z       Error: tensor data should match
2026-04-15T03:59:44.5282283Z           at _TensorResultValidator.checkApiTensorResult (test/ort.test.js:42846:21)
2026-04-15T03:59:44.5282948Z           at runProtoOpTestcase (test/ort.test.js:42617:16)
2026-04-15T03:59:44.5283397Z           at async runOpTest (test/ort.test.js:42650:7)
2026-04-15T03:59:44.5283890Z           at async Context.<anonymous> (test/ort.test.js:45731:15)

Neither looks related to DFT. I guess merging with main again will fix it.

@simonbyrne
Contributor Author

i've merged in main, it needs an approval to run CI again

auto-merge was automatically disabled April 22, 2026 22:35

Head branch was pushed to by a user without write access

@yan12125
Contributor

Regarding "Linux TensorRT CI", this pipeline has been failing for a while until a few hours ago:
https://github.com/microsoft/onnxruntime/actions?query=workflow%3A%22Linux+TensorRT+CI%22+event%3Apush+branch%3Amain. If passing all CI is needed for merging this, maybe merging again with main is needed.

@simonbyrne
Contributor Author

bump?

@yan12125
Contributor

For "ONNX Runtime WebGPU Builds", apparently all failed tests are caused by DXGI_ERROR_DEVICE_REMOVED. For example,

2026-04-23T23:48:34.8335120Z 2: [ RUN      ] SignalOpsTest.DFT20_Float_naive
2026-04-23T23:48:34.8341021Z 2: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\webgpu\webgpu_context.cc:117 onnxruntime::webgpu::WebGpuContext::Initialize::<lambda_1>::()::<lambda_4>::<lambda_invoker_cdecl> status == wgpu::RequestDeviceStatus::Success was false. Failed to get a WebGPU device: D3D12 create command queue failed with DXGI_ERROR_DEVICE_REMOVED (0x887A0005)
2026-04-23T23:48:34.8341492Z 2:     at CheckHRESULTImpl (D:\a\_work\onnxruntime\onnxruntime\RelWithDebInfo\_deps\dawn-src\src\dawn\native\d3d\D3DError.cpp:119)
2026-04-23T23:48:34.8341544Z 2: 
2026-04-23T23:48:34.8341596Z 2: 
2026-04-23T23:48:34.8341716Z 2: Provider:WebGpuExecutionProvider
2026-04-23T23:48:34.8343689Z 2: unknown file: error: C++ exception with description "D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\webgpu\webgpu_context.cc:117 onnxruntime::webgpu::WebGpuContext::Initialize::<lambda_1>::()::<lambda_4>::<lambda_invoker_cdecl> status == wgpu::RequestDeviceStatus::Success was false. Failed to get a WebGPU device: D3D12 create command queue failed with DXGI_ERROR_DEVICE_REMOVED (0x887A0005)
2026-04-23T23:48:34.8344252Z 2:     at CheckHRESULTImpl (D:\a\_work\onnxruntime\onnxruntime\RelWithDebInfo\_deps\dawn-src\src\dawn\native\d3d\D3DError.cpp:119)
2026-04-23T23:48:34.8344313Z 2: 
2026-04-23T23:48:34.8344405Z 2: " thrown in the test body.
2026-04-23T23:48:34.8344452Z 2: 
2026-04-23T23:48:34.8344623Z 2: [  FAILED  ] SignalOpsTest.DFT20_Float_naive (0 ms)

This looks more like an issue in CI builders, not this pull request. I'm not sure if pull requests are allowed to be merged without getting green in all CI builders or not. If not, unstable CI can cause issues for contributors.

Maybe @justinchuby can comment on PR merging policy.

@justinchuby
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@simonbyrne simonbyrne requested a review from justinchuby April 27, 2026 17:57
@justinchuby justinchuby merged commit 1e4ee66 into microsoft:main May 2, 2026
90 of 94 checks passed
@justinchuby
Contributor

Thanks a lot!
