Add CUDA Plugin EP CI and fix Windows plugin build support #27959

Merged: tianleiwu merged 12 commits into main from tlwu/20260402/cuda_plugin_ci on Apr 7, 2026
@tianleiwu
Contributor

Summary

This PR improves CUDA Plugin EP development and validation in three areas:

  • Fixes the Windows CUDA Plugin EP build so the plugin compiles with MSVC.
  • Adds dedicated Windows and Linux GitHub Actions workflows for building and testing the CUDA Plugin EP.
  • Expands the quick start documentation with instructions for running the CUDA Plugin EP Python tests locally.

Changes

Windows build fixes

  • Update the CUDA plugin CMake configuration to use the correct forced-include flags on Windows/MSVC.
  • Keep the existing forced-include behavior for non-MSVC toolchains.
  • Add the missing GetEnvironmentVar(const std::string&) forward declaration needed by plugin builds on Windows.
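
To illustrate why a missing forward declaration can break a build, here is a minimal sketch of the pattern. It is not the code from this PR: the `sketch` namespace, the `IsEnvVarSet` helper, and the `std::string` return type are assumptions made for illustration. Header-style code that calls `GetEnvironmentVar` only needs the declaration to be visible; the definition can come later or live in another translation unit.

```cpp
#include <cstdlib>
#include <string>

namespace sketch {  // hypothetical namespace, not the ONNX Runtime one

// Forward declaration. The std::string return type is an assumption.
// Without this line, the helper below fails to compile because the
// name is not yet declared at the point of use.
std::string GetEnvironmentVar(const std::string& name);

// Header-style helper (hypothetical) that depends only on the declaration.
inline bool IsEnvVarSet(const std::string& name) {
  return !GetEnvironmentVar(name).empty();
}

// Definition. In a real build this could sit in a different source file.
// Returns an empty string when the variable is not set.
std::string GetEnvironmentVar(const std::string& name) {
  const char* value = std::getenv(name.c_str());
  return value != nullptr ? std::string(value) : std::string();
}

}  // namespace sketch
```

A caller could then write `sketch::IsEnvVarSet("ORT_CUDA_PLUGIN_PATH")` to check whether the variable used by the tests is set in the current process.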

CI coverage for CUDA Plugin EP

Add a Windows CUDA Plugin EP workflow that:

  • builds ONNX Runtime with onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON
  • uploads build artifacts
  • installs the built wheel
  • sets ORT_CUDA_PLUGIN_PATH
  • runs test_cuda_plugin_ep.py

Add a similar Linux CUDA Plugin EP workflow.

Documentation updates

  • Add a Running Tests section to the CUDA Plugin EP quick start.
  • Document test prerequisites, dependency installation, and ORT_CUDA_PLUGIN_PATH.
  • Clarify that CPU-only PyTorch is sufficient for test_cuda_plugin_ep.py because it is used for CPU-side reference computations.

Contributor

@github-actions (bot) left a comment

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h Outdated
Member

@yuslepukhin left a comment

Factory fallback CreateSyncStreamForDeviceImpl (cuda_ep_factory.cc:531-549) does not validate mem_type == OrtDeviceMemoryType_DEFAULT, unlike the CudaEp-level version. This inconsistency is pre-existing but could be confusing.

@yuslepukhin
Member

SyncImpl restores the device even when cudaSetDevice(ep->config_.device_id) fails. In that case cudaDeviceSynchronize is skipped (good), but the restore branch still runs cudaSetDevice(prev_device). This is harmless since the device may not have changed, and a failing restore cannot overwrite the original error: the current priority logic (if (status.IsOK()) status = restore_status) only propagates the restore error if the primary operation succeeded.
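
The error-priority behavior described in this comment can be sketched without a GPU. The following is not the actual SyncImpl code: `Status`, `Ok`, `Error`, `FakeDeviceCalls`, and `SyncWithRestore` are hypothetical stand-ins, and the results of the three device calls are injected so that only the control flow is shown.

```cpp
#include <string>
#include <utility>

// Minimal stand-in for a status type; not the ONNX Runtime Status class.
struct Status {
  bool ok = true;
  std::string message;
  bool IsOK() const { return ok; }
};

inline Status Ok() { return Status{}; }

inline Status Error(std::string message) {
  Status status;
  status.ok = false;
  status.message = std::move(message);
  return status;
}

// Hypothetical, pre-computed results of the three device calls.
struct FakeDeviceCalls {
  Status set_device;      // stands in for cudaSetDevice(device_id)
  Status synchronize;     // stands in for cudaDeviceSynchronize()
  Status restore_device;  // stands in for cudaSetDevice(prev_device)
};

Status SyncWithRestore(const FakeDeviceCalls& calls) {
  Status status = calls.set_device;
  if (status.IsOK()) {
    // Synchronize only when selecting the device succeeded.
    status = calls.synchronize;
  }
  // The restore step runs on every path, including after a failure.
  Status restore_status = calls.restore_device;
  // Priority rule: keep the primary error; report the restore error
  // only when everything before it succeeded.
  if (status.IsOK()) {
    status = restore_status;
  }
  return status;
}
```

With this rule, a failed device selection followed by a failed restore reports the selection error, while a failed restore after a successful sync reports the restore error.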

Comment thread .github/workflows/linux_cuda_plugin_ci.yml Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/provider_api_shims.cc Outdated
Comment thread onnxruntime/contrib_ops/cuda/llm/cutlass_extensions/gemm_configs.h
@tianleiwu
Contributor Author

@yuslepukhin, I can address the sync-stream issues in another PR. This PR is mainly for CI setup.

Comment thread onnxruntime/core/platform/env_var.h
@tianleiwu enabled auto-merge (squash) April 6, 2026 23:51
@tianleiwu merged commit 6830ff7 into main Apr 7, 2026
103 of 106 checks passed
@tianleiwu deleted the tlwu/20260402/cuda_plugin_ci branch April 7, 2026 00:38

3 participants