Add LabelEncoder CUDA execution provider for numeric types #28045
Implements LabelEncoder for the CUDA execution provider supporting numeric types (int64, float, double). Uses sorted arrays and binary search on GPU for efficient O(log n) per-element lookup. Supports: - Opset 2-3: int64↔float, int64↔int64, float↔float - Opset 4+: above plus double↔double, double↔int64, int64↔double String types remain CPU-only as they cannot run on GPU. Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Add tests for:
- Float NaN keys to int64 values (opset 4)
- Float NaN keys to float values (opset 4)
- Double NaN keys to int64 values (opset 4)
- Int64 to double conversion (opset 4)
- Double to double conversion (opset 4)

These tests exercise the CUDA binary search with NaN handling and double type support.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
… test intent

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
tianleiwu
left a comment
Review Summary
This PR adds the first-ever ML domain (ai.onnx.ml) kernel registration for the CUDA EP, implementing LabelEncoder for opset 2-3 and 4+ with numeric types only (int64, float, double). The implementation uses a GPU-appropriate sorted-array + binary search approach instead of hash maps, with proper NaN handling. The code is well-structured, closely mirrors CPU patterns, and all CI checks pass.
Overall: Implementation is correct and well-structured. The concerns below are minor (style/convention adherence and test coverage) and do not affect correctness or safety.
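The sorted-array + binary-search strategy the review describes can be sketched on the host as follows. This is an illustrative CPU-side analogue, not the PR's actual code; the function and parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Host-side sketch of the lookup strategy: keys are sorted once up front,
// so each element lookup is an O(log n) binary search and needs no hash
// map (which GPUs handle poorly). Identifiers are illustrative.
int64_t Lookup(const std::vector<float>& sorted_keys,
               const std::vector<int64_t>& values,
               float key, int64_t default_value) {
  auto it = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), key);
  if (it != sorted_keys.end() && *it == key) {
    return values[static_cast<size_t>(it - sorted_keys.begin())];
  }
  return default_value;  // key not in the map
}
```

On the GPU the same per-element logic runs once per thread, with the sorted key and value arrays resident in device memory.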
Findings Summary
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | Nitpick | label_encoder.cc | Unused `#include <filesystem>` |
| 2 | Suggestion | label_encoder.cc `CopyToGpu` | Size arithmetic should use SafeInt per project conventions |
| 3 | Suggestion | label_encoder_test.cc | No empty-input-tensor edge case test |
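Finding 2 concerns expressions like `num_keys * sizeof(T)` passed to `cudaMemcpy`: a plain multiply silently wraps on overflow, while ONNX Runtime's SafeInt-based convention throws instead. The hand-rolled check below illustrates the same idea; it is not the SafeInt API itself.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Overflow-checked byte-size computation. Mirrors what SafeInt<size_t>
// provides in ONNX Runtime: detect the wrap before it happens and fail
// loudly rather than allocating or copying a truncated size.
size_t CheckedByteSize(size_t num_elements, size_t element_size) {
  if (element_size != 0 &&
      num_elements > std::numeric_limits<size_t>::max() / element_size) {
    throw std::overflow_error("byte size overflows size_t");
  }
  return num_elements * element_size;
}
```

With the project's helper, the call site would read roughly `SafeInt<size_t>(num_keys_) * sizeof(T)`.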
…y-tensor tests
- Remove unused `#include <filesystem>`
- Use `SafeInt<size_t>` for cudaMemcpy size arithmetic in `CopyToGpu`
- Add empty input tensor tests for opset 2 and opset 4 LabelEncoder
Pull request overview
Adds a CUDA Execution Provider implementation of ai.onnx.ml.LabelEncoder for numeric key/value types, aiming to avoid CPU round-trips for large label maps.
Changes:
- Introduces CUDA kernel + host-side op implementations for LabelEncoder (opset 2–3 and 4+ numeric variants).
- Registers the new ML-domain CUDA kernels and adds the ML domain constant for shared-library/provider API usage.
- Expands existing LabelEncoder tests (NaN handling, double variants, empty-input) and updates CUDA operator kernel documentation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/core/providers/cuda/ml/label_encoder_impl.h | Declares CUDA kernel launcher for sorted-key binary-search LabelEncoder. |
| onnxruntime/core/providers/cuda/ml/label_encoder_impl.cu | Implements per-element binary search kernel with NaN short-circuit for float/double keys. |
| onnxruntime/core/providers/cuda/ml/label_encoder.h | Declares CUDA LabelEncoder kernel classes for opset 2–3 and opset 4+. |
| onnxruntime/core/providers/cuda/ml/label_encoder.cc | Implements attribute loading (list/tensor), key sorting, GPU copy, and kernel registrations for supported type pairs. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Adds kernel-class forward decls + registration routine for ML-domain LabelEncoder variants and wires registration into registry init. |
| onnxruntime/core/providers/shared_library/provider_api.h | Adds missing kMLDomain constant for shared-library/provider builds. |
| onnxruntime/test/providers/cpu/ml/label_encoder_test.cc | Adds numeric NaN-key tests, double type-combination tests, and empty-input tests (intended to exercise CUDA EP when available). |
| docs/OperatorKernels.md | Documents CUDA provider support for ai.onnx.ml::LabelEncoder. |
…tics SortKeysValues() now uses std::stable_sort and deduplicates sorted keys, keeping only the first occurrence of each key (including NaN). This matches the CPU LabelEncoder's map_.emplace() first-occurrence-wins behavior. num_keys_ is now set after SortKeysValues since dedup may shrink the arrays.
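The stable-sort-then-dedup semantics described in that commit can be sketched as below. This is an illustrative standalone version, not the PR's `SortKeysValues()` itself; the NaN-last comparator matches the description of NaN keys being placed at the end of the sorted array.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of first-occurrence-wins dedup: stable-sort key/value pairs by
// key (NaN ordered last), then keep only the first pair for each duplicate
// key. This mirrors the CPU LabelEncoder's map_.emplace() behavior, where
// a later duplicate key never overwrites an earlier value.
void SortAndDedup(std::vector<std::pair<float, int64_t>>& pairs) {
  std::stable_sort(pairs.begin(), pairs.end(),
                   [](const auto& a, const auto& b) {
                     if (std::isnan(a.first)) return false;  // NaN sorts last
                     if (std::isnan(b.first)) return true;
                     return a.first < b.first;
                   });
  auto last = std::unique(pairs.begin(), pairs.end(),
                          [](const auto& a, const auto& b) {
                            // Two NaN keys count as equal, so only the
                            // first NaN occurrence survives.
                            if (std::isnan(a.first) && std::isnan(b.first)) return true;
                            return a.first == b.first;
                          });
  pairs.erase(last, pairs.end());
}
```

Because `std::stable_sort` preserves the original order among equal keys and `std::unique` keeps the first element of each run, the surviving value for any duplicated key is always the one that appeared first in the attribute list.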
/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).
Add the CUDA LabelEncoder sources to the minimal provider build so Windows TRT minimal CI links the new kernel registrations. Also factor the non-plugin TensorProto construction into a helper so the shared-provider path only needs one conditional block.
Move the default_tensor read path in LabelEncoder behind a helper so the plugin and shared-provider TensorProto handling stays in one place. This keeps GetDefaultValue focused on fallback selection instead of attribute transport details.
Description

Implements `ai.onnx.ml.LabelEncoder` on the CUDA execution provider for numeric key/value types using sorted arrays + binary search (O(log n) per element).

New files (`onnxruntime/core/providers/cuda/ml/`):
- `label_encoder_impl.cu`/`.h`: CUDA kernel: per-thread binary search on sorted keys, NaN-aware for float/double
- `label_encoder.cc`/`.h`: Host-side op classes (`CudaLabelEncoder` for opset 2-3, `CudaLabelEncoder_4` for opset 4+). Constructor sorts keys, copies to GPU; `ComputeInternal` launches kernel.

Modified files:
- `cuda_execution_provider.cc`: Register 11 kernel variants (4 versioned opset 2-3, 7 opset 4+)
- `provider_api.h`: Add missing `kMLDomain` constant (first ML-domain op on CUDA EP)
- `docs/OperatorKernels.md`: Add `ai.onnx.ml` section to CUDA provider table

Supported type combinations:
- `int64↔float`, `int64↔int64`, `float↔float`
- `double↔double`, `double↔int64`, `int64↔double`

String types remain CPU-only. NaN keys are placed at the end of the sorted array and short-circuited before binary search.
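The per-element lookup with NaN short-circuiting can be mirrored on the host as follows. The real code is a CUDA `__global__` kernel executing this logic once per thread; the sketch below uses hypothetical names and is not the PR's actual kernel.

```cpp
#include <cmath>
#include <cstdint>

// Host-side mirror of the per-thread kernel body: a NaN input is resolved
// before the search (NaN != NaN, so it can never be found by comparison),
// then a lower-bound binary search over the finite keys resolves the value
// or falls back to the default. The NaN key, if present, is the last
// element of the sorted key array.
int64_t EncodeOne(const float* sorted_keys, const int64_t* values, int num_keys,
                  bool has_nan_key, int64_t nan_value,
                  float x, int64_t default_value) {
  if (std::isnan(x)) {
    return has_nan_key ? nan_value : default_value;  // short-circuit
  }
  int num_finite = has_nan_key ? num_keys - 1 : num_keys;
  int lo = 0, hi = num_finite;
  while (lo < hi) {  // lower_bound over the finite prefix
    int mid = lo + (hi - lo) / 2;
    if (sorted_keys[mid] < x) lo = mid + 1; else hi = mid;
  }
  if (lo < num_finite && sorted_keys[lo] == x) return values[lo];
  return default_value;
}
```

Keeping the NaN check before the search is what makes the binary search well-defined: every comparison in the loop then involves only totally ordered finite keys.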
Tests: 5 new test cases covering NaN-key-to-numeric-value mappings and double type combinations. Existing numeric tests (`FloatToInt64Opset2`, `Int64ToFloatOpset2`, etc.) will automatically run on CUDA via `OpTester::Run()`.

Motivation and Context
Models with large LabelEncoder nodes (>100k entries) force a CPU round-trip when all other nodes run on GPU. This adds the CUDA implementation to eliminate that data transfer bottleneck.