Add validation of position_ids in RotaryEmbedding operators #27597
Pull request overview
This PR hardens the ONNX-domain RotaryEmbedding operator against out-of-bounds reads when user-provided position_ids contain invalid indices relative to the cos/sin cache (max_sequence_length), addressing a potential correctness and security issue.
Changes:
- CPU: Validate `position_ids` values upfront (when explicitly provided) and return `INVALID_ARGUMENT` on out-of-range values.
- CUDA: Plumb `max_sequence_length` into the kernel and add a device-side bounds check (pass-through on OOB since kernels can't surface errors).
- Tests: Add CPU unit tests that assert invalid `position_ids` are rejected with an appropriate error substring.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/llm/rotary_embedding.cc | Adds upfront position_ids range validation to prevent OOB cache access on CPU. |
| onnxruntime/core/providers/cuda/llm/rotary_embedding_impl.cu | Passes max_sequence_length to the CUDA kernel and guards cache indexing for explicit position_ids. |
| onnxruntime/test/providers/cpu/llm/rotary_embedding_op_test.cc | Adds negative / exceeds-max / in-batch OOB test cases for CPU failure behavior. |
@tianleiwu I remember the ONNX RotaryEmbedding was copied from the Contrib op. Did you also fix it on that side?
4. Consolidated Findings
4.1 Must-Fix Issues
Both CPU and CUDA cast

    int pos = static_cast<int>(position_ids[i]);  // truncation happens here
    if (pos < 0 || pos >= max_sequence_length) { ... }  // check is on truncated value

A value like
One-line fix (CPU):

    int64_t pos64 = position_ids[i];
    if (pos64 < 0 || pos64 >= static_cast<int64_t>(max_sequence_length)) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                             "position_ids value ", pos64, " at index ", i,
                             " is out of range [0, ", max_sequence_length, ")");
    }

One-line fix (CUDA):

    int64_t raw_pos = position_ids[b_s_index];
    if (raw_pos < 0 || raw_pos >= static_cast<int64_t>(max_sequence_length)) {
      output_data[i] = input_data[i];
      return;
    }
    b_s_index = static_cast<int>(raw_pos);

4.2 Tracked Separately (Separate PR Recommended)
The contrib_ops implementations at
Recommendation: File a tracked security issue and address in a follow-up PR. The scope of this PR is well-defined for the ONNX domain.
4.3 Should-Fix (In This PR)
4.4 Nice-to-Have (Non-Blocking)
5. What's Correct and Well-Done
All four reviewers agreed these aspects are positive:
6. Verdict: Approve with Changes
Required for merge:
Strongly recommended for this PR:
Separate follow-up:
Nice-to-have (author's discretion):
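The ordering problem flagged in 4.1 can be illustrated with a small standalone sketch. The helper names below are hypothetical and do not appear in the ONNX Runtime sources, and the example value used in the note after the block is an assumption chosen for illustration. The sketch only contrasts checking after the narrowing cast with checking the full 64-bit value first.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of ONNX Runtime.

// Flawed ordering: narrow to int first, then range-check the narrowed value.
// On common platforms with a 32-bit int, the high 32 bits are discarded by
// the cast, so a large int64 value can land inside the valid range.
inline bool InRangeAfterTruncation(int64_t position_id, int max_sequence_length) {
  int pos = static_cast<int>(position_id);
  return pos >= 0 && pos < max_sequence_length;
}

// Safer ordering: range-check the full 64-bit value and narrow only after
// the check has passed.
inline bool InRangeBeforeTruncation(int64_t position_id, int max_sequence_length) {
  return position_id >= 0 &&
         position_id < static_cast<int64_t>(max_sequence_length);
}
```

With a 32-bit `int`, an input such as `(int64_t{1} << 32) + 5` against a cache of 16 rows is accepted by the first helper and rejected by the second, which is the behavior the suggested fixes aim for.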
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
onnxruntime/contrib_ops/cuda/bert/rotary_embedding_impl.cu:91
- The new bounds checks cover position_ids formats 0 and 1, but format 2 (`past_sequence_length + s`) can still produce negative or >= `max_sequence_length` `position_id` values, leading to out-of-bounds reads from `cos_cache`/`sin_cache`. Add the same range validation for format 2 (and handle negative `past_sequence_lengths[b]`), falling back to pass-through on OOB like the other formats.
position_id = static_cast<int>(pos);
} else if (position_ids_format == 2) {
// format 2: past_sequence_length + s
// used for Decoding (past_sequence_length = seqlens_k[b]) or First Prompt (past=0 if nullptr)
int past = (past_sequence_lengths == nullptr) ? 0 : past_sequence_lengths[b];
position_id = past + s;
}
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
onnxruntime/contrib_ops/cuda/bert/rotary_embedding_impl.cu:93
- In the CUDA contrib rotary embedding kernel, the new bounds checks cover formats 0 and 1, but the format-2 path (position_id = past_sequence_lengths[b] + s) still computes cache_offset without validating that the resulting position_id is within [0, max_sequence_length). If position_ids_format=2 is ever used with a large/negative past_sequence_lengths value, this can still read out of bounds from cos_cache/sin_cache. Please add an equivalent bounds check for format 2 (e.g., validate past in range and that past + sequence_length <= max_sequence_length) and apply the same pass-through behavior on failure.
} else if (position_ids_format == 2) {
// format 2: past_sequence_length + s
// used for Decoding (past_sequence_length = seqlens_k[b]) or First Prompt (past=0 if nullptr)
int past = (past_sequence_lengths == nullptr) ? 0 : past_sequence_lengths[b];
position_id = past + s;
}
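The format-2 check requested in the comment above could look roughly like the following host-style sketch. The function name and signature are hypothetical; the actual kernel works per thread on `past + s`, while this sketch validates the whole range `past .. past + sequence_length - 1` at once, as the comment suggests.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of ONNX Runtime.
// Returns true when every position produced by format 2
// (past + s for s in [0, sequence_length)) indexes a valid cache row,
// i.e. past >= 0 and past + sequence_length <= max_sequence_length.
inline bool Format2PositionsInRange(int64_t past,
                                    int64_t sequence_length,
                                    int64_t max_sequence_length) {
  if (past < 0 || sequence_length < 0) {
    return false;
  }
  // Written as a subtraction so that past + sequence_length cannot overflow.
  return sequence_length <= max_sequence_length &&
         past <= max_sequence_length - sequence_length;
}
```

A caller following the pass-through convention used for the other formats would copy the input to the output unchanged whenever this returns false.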
Add shader-side bounds checks to the WebGPU RotaryEmbedding and FusedQKRotaryEmbedding GPU shaders to prevent out-of-bounds reads from cos_cache/sin_cache when position_ids values exceed the cache dimensions.
For RotaryEmbeddingProgram:
- Check raw_pos < 0 to catch negative position_ids (i32 from truncated int64 avoids u32 wraparound bypass)
- Check position_id >= cos_cache_shape[0] after u32 conversion and sequence offset addition
- On OOB, pass through input unchanged (matches CUDA kernel behavior)
For FusedQKRotaryEmbeddingProgram:
- Check position_id >= cos_cache_shape[0] before accessing cos/sin cache
- On OOB, pass through both Q and K inputs unchanged
This complements the CPU and CUDA fixes from PR #27597 (commit 056bab3) which missed the WebGPU execution provider.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (4fe56e20) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add host-side validation of position_ids values before shader dispatch
in all three WebGPU RotaryEmbedding implementations. This prevents
out-of-bounds reads from cos_cache/sin_cache when position_ids values
exceed the cache dimensions.
Changes:
1. contrib_ops/webgpu/bert/rotary_embedding.cc:
- Add InputMemoryType(OrtMemTypeCPUInput, 1) to keep position_ids
on CPU for validation
- Add bounds checking in ComputeInternal() before shader dispatch:
format 0 (scalar): base_pos in [0, max_seq_len - seq_len]
format 1 (2D array): each value in [0, max_sequence_length)
- Returns INVALID_ARGUMENT error on violation
- Shader-side bounds checks remain as defense-in-depth
2. core/providers/webgpu/llm/rotary_embedding.cc:
- Add InputMemoryType(OrtMemTypeCPUInput, 3) for optional
position_ids input
- Add bounds checking in the position_ids != nullptr branch
- Returns INVALID_ARGUMENT error on violation
3. js/web/lib/wasm/jsep/webgpu/ops/rotary-embedding.ts:
- Add value validation in validateInputs() using getBigInt64Array()
- Validates both format 0 (scalar offset) and format 1 (2D array)
- Throws Error with descriptive message on violation
All three implementations follow the same validation pattern as the
CPU contrib fix (PR #27597), returning errors rather than silently
passing through.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (4fe56e20) [claude-opus-4.6]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
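The two host-side rules listed in the commit message (format 0: base_pos in [0, max_seq_len - seq_len]; format 1: each value in [0, max_sequence_length)) can be sketched as below. This is a minimal illustration, not the code from the commit: the function names are hypothetical, and a plain string (empty means valid) stands in for the INVALID_ARGUMENT status.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not part of ONNX Runtime.
// An empty string means the input is valid; otherwise it is the error text.

// Format 0: a single base offset. Rows base_pos .. base_pos + seq_len - 1
// are read, so base_pos must lie in [0, max_seq_len - seq_len].
inline std::string ValidateFormat0(int64_t base_pos, int64_t seq_len,
                                   int64_t max_seq_len) {
  if (base_pos < 0 || base_pos > max_seq_len - seq_len) {
    return "position_ids base offset " + std::to_string(base_pos) +
           " is out of range [0, " + std::to_string(max_seq_len - seq_len) + "]";
  }
  return std::string();
}

// Format 1: one position per (batch, sequence) element, each of which must
// lie in [0, max_seq_len).
inline std::string ValidateFormat1(const std::vector<int64_t>& position_ids,
                                   int64_t max_seq_len) {
  for (std::size_t i = 0; i < position_ids.size(); ++i) {
    const int64_t pos = position_ids[i];
    if (pos < 0 || pos >= max_seq_len) {
      return "position_ids value " + std::to_string(pos) + " at index " +
             std::to_string(i) + " is out of range [0, " +
             std::to_string(max_seq_len) + ")";
    }
  }
  return std::string();
}
```

Running such checks before dispatch is what lets the host return an error instead of relying on the shader's silent pass-through.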
…ls (#28214)
This PR adds position_ids bounds checking to WebGPU and JS RotaryEmbedding implementations, completing the security fix started in PR #27597 (commit 056bab3) which covered CPU and CUDA.
## Problem
The `com.microsoft::RotaryEmbedding` kernel uses position_ids as row indices into cos_cache/sin_cache without bounds validation. While PR #27597 fixed CPU and CUDA paths, WebGPU and JS implementations were still missing bounds checks, which could produce silently wrong results (WebGPU hardware clamps OOB reads).
## Changes
- **contrib_ops/webgpu/bert/rotary_embedding.cc**: Host-side validation (ORT_MAKE_STATUS) + shader-side defense-in-depth (pass-through on OOB)
- **core/providers/webgpu/llm/rotary_embedding.cc**: Host-side validation with format-0 awareness
- **js/web/lib/wasm/jsep/webgpu/ops/rotary-embedding.ts**: TypeScript validation using getBigInt64Array
- **7 new C++ OOB test cases** across contrib and ONNX domains targeting WebGPU EP
## Security
Addresses the same vulnerability as #27597 (OOB read via position_ids, CVSS 7.5-9.1) for WebGPU/JS execution providers.
## Testing
- 7 new unit tests (3 contrib + 4 ONNX domain) with GTEST_SKIP when WebGPU EP unavailable
- JS/TS error tests not feasible with current JSONC test format (documented)
- Build environment lacks C++20/emsdk for full compilation verification; validated structurally
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Fix out-of-bounds read in the RotaryEmbedding operator when user-provided `position_ids` values exceed the cos/sin cache bounds (`max_sequence_length`).
Problem
When `position_ids` contains values that are negative or >= `max_sequence_length`, the kernel computes `cache_offset = position_id * half_rotary_embedding_dim` and reads out-of-bounds from `cos_cache`/`sin_cache`. This can cause undefined behavior (incorrect results, crashes, or memory corruption).
Fix
CPU (`rotary_embedding.cc`):
- Validates `position_ids` values before the parallel computation loop. Returns an `INVALID_ARGUMENT` error if any value is out of range `[0, max_sequence_length)`.
- Validation applies only when `position_ids_format != 0` (i.e., when position_ids are explicitly provided). When `position_ids` is not provided (format 0), the cache is shaped `(B, S, H/2)` and the index `b * S + s` is always in-bounds by construction.
CUDA (`rotary_embedding_impl.cu`):
- Passes the `max_sequence_length` parameter through to the kernel.
- Adds a bounds check in the `position_ids_format != 0` branch. Out-of-bounds position IDs cause the kernel to pass through the input unchanged (errors cannot be propagated from GPU kernels).
- The check applies to the `position_ids_format != 0` branch only. When format is 0 (no position_ids), the cache is `(B*S, H/2)` and `b_s_index = b * S + s` is deterministically valid. Applying the check unconditionally would incorrectly reject all batches beyond the first, since `max_sequence_length == sequence_length` in that case.
Tests
Added three CPU test cases for the ONNX domain `RotaryEmbedding` op:
- `RotaryEmbedding_PositionIds_ExceedsMaxSeqLen`: position_id far exceeding cache size
- `RotaryEmbedding_PositionIds_Negative`: negative position_id
- `RotaryEmbedding_PositionIds_OOB_InBatch`: OOB position_id in a multi-batch, multi-sequence scenario
Motivation and Context
Security hardening: prevent out-of-bounds memory access from untrusted model inputs.
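The reason the check is limited to the explicit-position_ids branch can be shown with a small sketch of the row selection described above. The function name, parameters, and the `-1` sentinel are hypothetical and chosen only for illustration; the real kernels index the cache directly rather than returning a row.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of ONNX Runtime.
// Returns the cos/sin cache row to read, or -1 to signal that the input
// should be passed through unchanged.
inline int64_t ResolveCacheRow(bool has_position_ids, int64_t raw_position_id,
                               int64_t b, int64_t s, int64_t sequence_length,
                               int64_t max_sequence_length) {
  if (!has_position_ids) {
    // Format 0: the cache holds one row per (batch, sequence) element, so
    // b * S + s is valid by construction and is not compared against
    // max_sequence_length.
    return b * sequence_length + s;
  }
  // Explicit position_ids: check the full 64-bit value before using it.
  if (raw_position_id < 0 || raw_position_id >= max_sequence_length) {
    return -1;
  }
  return raw_position_id;
}
```

For example, with `sequence_length == max_sequence_length == 4`, batch 1 and step 2 resolve to row 6 in format 0. That row is valid for a `(B*S, H/2)` cache, but an unconditional `row < max_sequence_length` check would have rejected it.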