Update CUDA TopK kernel registration to opset 24 with BFloat16 support #27735

Merged
tianleiwu merged 6 commits into main from copilot/update-topk-fill-opset-gap on Mar 24, 2026
Copilot AI commented Mar 18, 2026

  • Cap the existing CUDA TopK kernel registration to the versioned opset range [11, 23] and add a new registration for opset 24
  • Add BFloat16 support for CUDA TopK opset 24 (topk_impl_bf16.cu, helpers, NumericLimits)
  • Add BFloat16 test cases for TopK opset 24
  • Fix CUB build error: map BFloat16 → __nv_bfloat16 for BlockRadixSort and DeviceRadixSort
    • Add CubSortType trait in topk_impl.cuh
    • Update RadixTopK kernel to use CubSortType for BlockRadixSort
    • Update TopKImpl DeviceRadixSort calls to use CubSortType pointers
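
The CubSortType mapping described in the last bullets can be sketched as a small type trait. This is an illustration under stated assumptions, not the code from topk_impl.cuh: the structs BFloat16 and NvBFloat16Stub are stand-ins for onnxruntime's BFloat16 and CUDA's __nv_bfloat16 so the sketch compiles without CUDA headers, and the helper AsSortPointer is hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-ins (assumptions) so the sketch builds on the host without CUDA headers.
struct BFloat16 { std::uint16_t val; };      // stands in for onnxruntime::BFloat16
struct NvBFloat16Stub { std::uint16_t x; };  // stands in for __nv_bfloat16

// Primary template: most element types are handed to the sort unchanged.
template <typename T>
struct CubSortType {
  using type = T;
};

// Specialization: the framework type is swapped for the native 16-bit type
// that the sorting library knows how to turn into radix keys.
template <>
struct CubSortType<BFloat16> {
  using type = NvBFloat16Stub;
};

// Reinterpreting a buffer is only reasonable if both types share a layout.
static_assert(sizeof(BFloat16) == sizeof(NvBFloat16Stub), "size mismatch");
static_assert(alignof(BFloat16) == alignof(NvBFloat16Stub), "alignment mismatch");
static_assert(std::is_same<CubSortType<int>::type, int>::value,
              "unmapped types pass through unchanged");

// Hypothetical helper: view a buffer of T as a buffer of the sort type.
template <typename T>
typename CubSortType<T>::type* AsSortPointer(T* p) {
  return reinterpret_cast<typename CubSortType<T>::type*>(p);
}
```

With a trait like this, a kernel templated on T can declare its block-level or device-level sort in terms of `typename CubSortType<T>::type` and leave every other element type untouched.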

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@tianleiwu

/azp run Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI and others added 2 commits March 18, 2026 21:47
… + 24+)

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI requested a review from tianleiwu March 18, 2026 21:49
@tianleiwu tianleiwu marked this pull request as ready for review March 18, 2026 23:44
@tianleiwu

@copilot, TopK-24 supports bfloat16 according to https://github.com/onnx/onnx/blob/main/docs/Operators.md#TopK
Please help add bfloat16 support for opset 24.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/cd70afdc-35b3-4472-a3d3-262dbb68f2b5

Copilot AI commented Mar 20, 2026

@copilot, TopK-24 supports bfloat16 according to https://github.com/onnx/onnx/blob/main/docs/Operators.md#TopK
Please help add bfloat16 support for opset 24.

Added BFloat16 support for CUDA TopK opset 24 in commit 58775e4:

  • Created topk_impl_bf16.cu with BFloat16 template instantiation
  • Added BFloat16 overloads for SamePrefix, Radix, SetByte in topk_impl.cuh
  • Added NumericLimits<BFloat16> specialization in cuda_utils.h
  • Added BFloat16 to the opset 24 type constraint and ComputeInternal dispatch in topk.cc
  • Updated docs/OperatorKernels.md
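
To make the radix helpers and the NumericLimits specialization more concrete, here is a hedged sketch of how a bfloat16 value can be turned into an order-preserving 16-bit radix key. It is one common technique, not necessarily what the PR's SamePrefix, Radix, and SetByte overloads do. The names ToOrderedKey, ExtractByte, kBf16Lowest, and kBf16Max are hypothetical, and values are handled as raw bit patterns rather than as a BFloat16 type.

```cpp
#include <cstdint>

// bfloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.

// Map a raw bfloat16 bit pattern to an unsigned key whose integer order
// matches the numeric order of the values (NaN handling is out of scope).
constexpr std::uint16_t ToOrderedKey(std::uint16_t bits) {
  // Negative values: flip every bit so larger magnitudes sort lower.
  // Non-negative values: set the sign bit so they sort above all negatives.
  return (bits & 0x8000u) ? static_cast<std::uint16_t>(~bits)
                          : static_cast<std::uint16_t>(bits | 0x8000u);
}

// Pick one 8-bit digit of the key; byte_index 0 is the least significant.
constexpr std::uint32_t ExtractByte(std::uint16_t key, int byte_index) {
  return (key >> (8 * byte_index)) & 0xFFu;
}

// Bit patterns of the largest and lowest finite bfloat16 values, the kind of
// constants a NumericLimits-style specialization would need to expose.
constexpr std::uint16_t kBf16Max = 0x7F7F;
constexpr std::uint16_t kBf16Lowest = 0xFF7F;
```

A 16-bit key has only two 8-bit digits, so a radix selection over bfloat16 needs at most two passes when it works one byte at a time.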

Copilot AI changed the title from "Update CUDA TopK kernel registration to opset 24" to "Update CUDA TopK kernel registration to opset 24 with BFloat16 support" on Mar 20, 2026
Copilot AI requested a review from tianleiwu March 20, 2026 18:29
@tianleiwu

@copilot, could you add a test case for bfloat16 TopK-24?

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/9eab8ca4-7d1f-4b1e-bcb9-b65c411b4256

Copilot AI commented Mar 20, 2026

@copilot, could you add a test case for bfloat16 TopK-24?

Added three BFloat16 test cases in commit 3ea0233:

  • NthElementBFloat16 — basic TopK with positive values at opset 24
  • NthElementBFloat16_NegativeVals — TopK with mixed positive/negative values at opset 24
  • TopKBFloat16_2D — 2D input TopK at opset 24

All tests are guarded by CudaHasBF16Support(), which requires SM 8.0+. Also fixed RunTest so that it sets the largest/sorted attributes for any op_set >= 11 (previously it did so only when op_set == 11).
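
The opset condition and the expected outputs of such tests can be illustrated with a small host-side reference. This is a sketch under assumptions, not the test code from the PR: SetsLargestAndSorted and ReferenceTopK are hypothetical names, the reference works on float rather than BFloat16, and it handles a single 1-D row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// The largest/sorted attributes exist from TopK-11 onward, so the check
// has to be ">= 11"; an "== 11" check would skip them for opset 24.
inline bool SetsLargestAndSorted(int op_set) { return op_set >= 11; }

// Reference TopK over one row, returning (values, indices) in sorted order.
// Ties are broken by the lower index, as the ONNX TopK description states.
inline std::pair<std::vector<float>, std::vector<std::int64_t>> ReferenceTopK(
    const std::vector<float>& row, std::size_t k, bool largest) {
  assert(k <= row.size());
  std::vector<std::int64_t> idx(row.size());
  std::iota(idx.begin(), idx.end(), std::int64_t{0});

  auto better = [&](std::int64_t a, std::int64_t b) -> bool {
    const float va = row[static_cast<std::size_t>(a)];
    const float vb = row[static_cast<std::size_t>(b)];
    if (va != vb) return largest ? va > vb : va < vb;
    return a < b;
  };
  // Only the first k positions need to be ordered.
  std::partial_sort(idx.begin(),
                    idx.begin() + static_cast<std::ptrdiff_t>(k),
                    idx.end(), better);
  idx.resize(k);

  std::vector<float> values;
  values.reserve(k);
  for (std::int64_t i : idx) values.push_back(row[static_cast<std::size_t>(i)]);
  return std::make_pair(values, idx);
}
```

For a mixed-sign row such as {-1, 4, -2, 3} with k = 2, this reference selects 4 and 3 when largest is true and -2 and -1 when it is false, which is the kind of expectation a negative-values test would encode.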

… operations

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d0c34466-e21b-4457-a2cf-df7af450f1a1
@tianleiwu

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit a8424e6 into main Mar 24, 2026
89 checks passed
@tianleiwu tianleiwu deleted the copilot/update-topk-fill-opset-gap branch March 24, 2026 21:32