CUDA Resize: add optimized 3D nearest resize kernel for 5D up/down sa…#27578
|
@microsoft-github-policy-service agree |
|
@johannes-rehm-snkeos, could you share benchmark/profiling results that show that the new kernel is better? |
|
@tianleiwu I used your script from here: #14596 (comment) and got the following results (profiler screenshots not preserved):
- Profiling of Torch
- Profiling of onnxruntime-gpu==1.24.3
- Profiling of johannes-rehm-snkeos:cuda-resize-nearest-3d-kernel |
Pull request overview
This PR adds a CUDA optimized fast-path for nearest-neighbor 3D resize (mapping + execution) to improve performance on rank≥3 tensors where only the last three dimensions are resized and all outer-dimension scales are 1.0, and introduces CUDA-targeted regression tests to validate the new path.
Changes:
- Added CUDA nearest-neighbor 3D mapping and compute kernels and a dispatch fast-path in `ResizeNearestImpl`.
- Enabled the new 3D optimized path when `coordinate_transformation_mode != tf_crop_and_resize` and all outer scales (except last 3 dims) are exactly `1.0`.
- Added CUDA regression tests covering 5D nearest upsample and downsample scenarios intended to hit the optimized 3D mapping path.
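The dispatch conditions listed above can be sketched as a small host-side predicate. This is only an illustration under stated assumptions: `CanUseNearest3DFastPath` and its parameters are hypothetical names, not functions from the PR, and the coordinate transformation mode is passed as a plain string for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): returns true when a Resize call
// would qualify for the optimized 3D nearest path as described in the review.
// `scales` holds one scale per tensor dimension, outermost first.
bool CanUseNearest3DFastPath(const std::vector<float>& scales,
                             const std::string& coord_mode) {
  const std::size_t rank = scales.size();
  // The fast path resizes the last three dimensions, so rank must be >= 3.
  if (rank < 3) return false;
  // tf_crop_and_resize is excluded from the fast path.
  if (coord_mode == "tf_crop_and_resize") return false;
  // Every dimension outside the last three must have a scale of exactly 1.0.
  for (std::size_t i = 0; i + 3 < rank; ++i) {
    if (scales[i] != 1.0f) return false;
  }
  return true;
}
```

For example, a 5D tensor with scales `{1, 1, 2, 2, 2}` qualifies, while `{1, 2, 2, 2, 2}` does not because an outer dimension is resized.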
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cuda/tensor/resize_impl.cu | Introduces optimized nearest-neighbor 3D mapping/compute CUDA kernels and a conditional fast-path in ResizeNearestImpl. |
| onnxruntime/test/providers/cpu/tensor/resize_op_test.cc | Adds CUDA-targeted regression tests for 5D nearest resize upsample/downsample intended to exercise the optimized 3D path. |
|
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
Summary
This PR adds CUDA support for optimized nearest-neighbor 3D resize mapping/execution in the Resize operator path, and adds targeted regression coverage.
The implementation introduces a dedicated 3D fast path for nearest resize to handle the last three spatial dimensions (`D/H/W`) efficiently when outer dimensions are unchanged.
What Changed
CUDA Resize implementation
File: `onnxruntime/core/providers/cuda/tensor/resize_impl.cu`
- Added `_ResizeNearestMappingKernel3D`
- Added `_ResizeNearestKernel3D`
- Added a fast-path dispatch in `ResizeNearestImpl`, taken when:
  - `rank >= 3`
  - `coordinate_transformation_mode != tf_crop_and_resize`
  - all outer scales (except the last 3 dims) are exactly `1.0`
This keeps existing behavior unchanged for other cases while using the optimized path for true 3D nearest resize workloads.
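To make the mapping/compute split concrete, here is a CPU reference sketch of the general technique: first build a per-axis lookup of input indices, then gather through it for every output element. It is not the PR's CUDA code. The function names are hypothetical, and it assumes the `asymmetric` coordinate transform with floor rounding, which is only one of the modes the operator supports.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mapping step (hypothetical name): for each output index along one axis,
// precompute the input index to read. Assumes x_in = floor(x_out / scale).
std::vector<int64_t> BuildNearestMap(int64_t in_dim, int64_t out_dim) {
  const float scale = static_cast<float>(out_dim) / static_cast<float>(in_dim);
  std::vector<int64_t> map(static_cast<std::size_t>(out_dim));
  for (int64_t o = 0; o < out_dim; ++o) {
    const int64_t i =
        static_cast<int64_t>(std::floor(static_cast<float>(o) / scale));
    // Clamp so the index always stays inside the input extent.
    map[static_cast<std::size_t>(o)] =
        std::min(std::max<int64_t>(i, 0), in_dim - 1);
  }
  return map;
}

// Compute step (hypothetical name): all outer dimensions are flattened into
// `batch`, since their scales are 1.0 and they are copied through unchanged.
std::vector<float> ResizeNearest3DReference(
    const std::vector<float>& in, int64_t batch,
    int64_t in_d, int64_t in_h, int64_t in_w,
    int64_t out_d, int64_t out_h, int64_t out_w) {
  const std::vector<int64_t> md = BuildNearestMap(in_d, out_d);
  const std::vector<int64_t> mh = BuildNearestMap(in_h, out_h);
  const std::vector<int64_t> mw = BuildNearestMap(in_w, out_w);
  std::vector<float> out(static_cast<std::size_t>(batch * out_d * out_h * out_w));
  std::size_t idx = 0;
  for (int64_t n = 0; n < batch; ++n)
    for (int64_t d = 0; d < out_d; ++d)
      for (int64_t h = 0; h < out_h; ++h)
        for (int64_t w = 0; w < out_w; ++w) {
          // Gather from the input using the three precomputed axis maps.
          const int64_t src =
              ((n * in_d + md[static_cast<std::size_t>(d)]) * in_h +
               mh[static_cast<std::size_t>(h)]) * in_w +
              mw[static_cast<std::size_t>(w)];
          out[idx++] = in[static_cast<std::size_t>(src)];
        }
  return out;
}
```

On a GPU, the innermost gather would typically be one thread per output element, with the three small maps computed once up front. The sketch keeps that structure but runs it as plain loops.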
Regression tests
File: `onnxruntime/test/providers/cpu/tensor/resize_op_test.cc`
Added CUDA-targeted regression tests:
- `ResizeOpNearestUpSampleTest_5D_CudaRegression_Optimized3DMapping`
- `ResizeOpNearestDownSampleTest_5D_CudaRegression_Optimized3DMapping`
Why
The previous nearest implementation relied on the generic path for these 3D scenarios. This change introduces a dedicated CUDA 3D path to improve performance for 5D nearest resize workloads.
Fixes #14596