Add LabelEncoder CUDA execution provider for numeric types#28045

Merged
tianleiwu merged 12 commits into main from copilot/add-labelencoder-on-cuda-provider
Apr 25, 2026

Conversation

Contributor

Copilot AI commented Apr 13, 2026

Description

Implements ai.onnx.ml.LabelEncoder on the CUDA execution provider for numeric key/value types using sorted arrays + binary search (O(log n) per element).

New files (onnxruntime/core/providers/cuda/ml/):

  • label_encoder_impl.cu / .h — CUDA kernel: per-thread binary search on sorted keys, NaN-aware for float/double
  • label_encoder.cc / .h — Host-side op classes (CudaLabelEncoder for opset 2-3, CudaLabelEncoder_4 for opset 4+). Constructor sorts keys, copies to GPU; ComputeInternal launches kernel.

Modified files:

  • cuda_execution_provider.cc — Register 11 kernel variants (4 versioned opset 2-3, 7 opset 4+)
  • provider_api.h — Add missing kMLDomain constant (first ML-domain op on CUDA EP)
  • docs/OperatorKernels.md — Add ai.onnx.ml section to CUDA provider table

Supported type combinations:

| Opset | Types |
| --- | --- |
| 2-3 | int64↔float, int64↔int64, float↔float |
| 4+ | Above, plus double↔double, double↔int64, int64↔double |

String types remain CPU-only. NaN keys are placed at the end of the sorted array and short-circuited before the binary search.

Tests: 5 new test cases covering NaN-key-to-numeric-value mappings and double type combinations. Existing numeric tests (FloatToInt64Opset2, Int64ToFloatOpset2, etc.) will automatically run on CUDA via OpTester::Run().

Motivation and Context

Models with large LabelEncoder nodes (>100k entries) force a CPU round-trip when all other nodes run on GPU. This adds the CUDA implementation to eliminate that data transfer bottleneck.

Copilot AI and others added 3 commits April 13, 2026 02:54
Implements LabelEncoder for the CUDA execution provider supporting
numeric types (int64, float, double). Uses sorted arrays and binary
search on GPU for efficient O(log n) per-element lookup.

Supports:
- Opset 2-3: int64↔float, int64↔int64, float↔float
- Opset 4+: above plus double↔double, double↔int64, int64↔double

String types remain CPU-only as they cannot run on GPU.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Add tests for:
- Float NaN keys to int64 values (opset 4)
- Float NaN keys to float values (opset 4)
- Double NaN keys to int64 values (opset 4)
- Int64 to double conversion (opset 4)
- Double to double conversion (opset 4)

These tests exercise the CUDA binary search with NaN handling
and double type support.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
… test intent

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI changed the title [WIP] Add LabelEncoder support for CUDA provider Add LabelEncoder CUDA execution provider for numeric types Apr 13, 2026
Copilot AI requested a review from tianleiwu April 13, 2026 03:00
Contributor

@tianleiwu tianleiwu left a comment


Review Summary

This PR adds the first-ever ML domain (ai.onnx.ml) kernel registration for the CUDA EP, implementing LabelEncoder for opset 2-3 and 4+ with numeric types only (int64, float, double). The implementation uses a GPU-appropriate sorted-array + binary search approach instead of hash maps, with proper NaN handling. The code is well-structured, closely mirrors CPU patterns, and all CI checks pass.

Overall: Implementation is correct and well-structured. The concerns below are minor (style/convention adherence and test coverage) and do not affect correctness or safety.

Findings Summary

| # | Severity | Component | Issue |
| --- | --- | --- | --- |
| 1 | Nitpick | label_encoder.cc | Unused `#include <filesystem>` |
| 2 | Suggestion | label_encoder.cc | `CopyToGpu` size arithmetic should use `SafeInt` per project conventions |
| 3 | Suggestion | label_encoder_test.cc | No empty-input-tensor edge-case test |

…y-tensor tests

- Remove unused #include <filesystem>
- Use SafeInt<size_t> for cudaMemcpy size arithmetic in CopyToGpu
- Add empty input tensor tests for opset 2 and opset 4 LabelEncoder
@tianleiwu tianleiwu marked this pull request as ready for review April 19, 2026 21:58
@tianleiwu tianleiwu requested a review from Copilot April 19, 2026 21:59
Contributor

Copilot AI left a comment


Pull request overview

Adds a CUDA Execution Provider implementation of ai.onnx.ml.LabelEncoder for numeric key/value types, aiming to avoid CPU round-trips for large label maps.

Changes:

  • Introduces CUDA kernel + host-side op implementations for LabelEncoder (opset 2–3 and 4+ numeric variants).
  • Registers the new ML-domain CUDA kernels and adds the ML domain constant for shared-library/provider API usage.
  • Expands existing LabelEncoder tests (NaN handling, double variants, empty-input) and updates CUDA operator kernel documentation.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
| --- | --- |
| onnxruntime/core/providers/cuda/ml/label_encoder_impl.h | Declares the CUDA kernel launcher for the sorted-key binary-search LabelEncoder. |
| onnxruntime/core/providers/cuda/ml/label_encoder_impl.cu | Implements the per-element binary search kernel with a NaN short-circuit for float/double keys. |
| onnxruntime/core/providers/cuda/ml/label_encoder.h | Declares the CUDA LabelEncoder kernel classes for opset 2–3 and opset 4+. |
| onnxruntime/core/providers/cuda/ml/label_encoder.cc | Implements attribute loading (list/tensor), key sorting, GPU copy, and kernel registrations for the supported type pairs. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Adds kernel-class forward declarations plus a registration routine for the ML-domain LabelEncoder variants, and wires it into registry init. |
| onnxruntime/core/providers/shared_library/provider_api.h | Adds the missing kMLDomain constant for shared-library/provider builds. |
| onnxruntime/test/providers/cpu/ml/label_encoder_test.cc | Adds numeric NaN-key tests, double type-combination tests, and empty-input tests (intended to exercise the CUDA EP when available). |
| docs/OperatorKernels.md | Documents CUDA provider support for ai.onnx.ml::LabelEncoder. |


…tics

SortKeysValues() now uses std::stable_sort and deduplicates sorted keys,
keeping only the first occurrence of each key (including NaN). This matches
the CPU LabelEncoder's map_.emplace() first-occurrence-wins behavior.

num_keys_ is now set after SortKeysValues since dedup may shrink the arrays.
@microsoft microsoft deleted a comment from azure-pipelines Bot Apr 20, 2026
@tianleiwu
Contributor

/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@tianleiwu
Contributor

/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@tianleiwu tianleiwu enabled auto-merge (squash) April 21, 2026 19:10
Add the CUDA LabelEncoder sources to the minimal provider build so
Windows TRT minimal CI links the new kernel registrations.

Also factor the non-plugin TensorProto construction into a helper so the
shared-provider path only needs one conditional block.
Move the default_tensor read path in LabelEncoder behind a helper so the
plugin and shared-provider TensorProto handling stays in one place.

This keeps GetDefaultValue focused on fallback selection instead of
attribute transport details.
@tianleiwu tianleiwu merged commit 8b6b0b6 into main Apr 25, 2026
94 of 104 checks passed
@tianleiwu tianleiwu deleted the copilot/add-labelencoder-on-cuda-provider branch April 25, 2026 02:26


Development

Successfully merging this pull request may close these issues.

[Feature Request] LabelEncoder on Cuda provider

4 participants