Validate token_id bounds in NGramRepeatBlock to prevent OOB write #28039
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR hardens the NGramRepeatBlock operator (CPU and CUDA EP) against out-of-bounds writes by validating token_id values derived from input_ids before using them as indices into the scores buffer, addressing a security issue caused by negative or oversized token IDs.
Changes:
- CPU: replace `ORT_ENFORCE(token_id < vocab_size)` with a full `[0, vocab_size)` check and return `INVALID_ARGUMENT` instead of aborting.
- CUDA: add `CUDA_KERNEL_ASSERT` plus a runtime bounds guard to avoid invalid indexing in release builds.
- Tests: add CPU-only regression tests covering negative and oversized `token_id` cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `onnxruntime/contrib_ops/cpu/bert/ngram_repeat_block.h` | Adds bounds validation for `token_id` and returns an `INVALID_ARGUMENT` Status on invalid input. |
| `onnxruntime/contrib_ops/cuda/bert/ngram_repeat_block_impl.cu` | Adds debug assertion + runtime guard to prevent OOB write in the CUDA kernel. |
| `onnxruntime/test/contrib_ops/ngram_repeat_block_op_test.cc` | Adds regression tests for invalid `token_id` values on CPU EP. |
tianleiwu left a comment
Clean, well-scoped security fix. The CPU path correctly replaces the crashing ORT_ENFORCE (missing lower-bound check, abort() under ORT_NO_EXCEPTIONS) with a proper INVALID_ARGUMENT Status return using thread-safe atomic error collection. The CUDA path uses the standard ORT dual pattern of CUDA_KERNEL_ASSERT + conditional guard. Regression tests cover both failure modes (negative token_id and token_id ≥ vocab_size).
One minor suggestion on the CUDA side (inline comment below) — non-blocking.
Description
In `NGramRepeatBlock` (CPU and CUDA EP), token values from the `input_ids` tensor are used directly as array indices into the `scores` output buffer without adequate bounds checking. The CPU path only checked `token_id < vocab_size` (missing the lower bound), and the CUDA kernel had no bounds checks at all. A crafted model with negative token IDs can write at attacker-controlled negative offsets, causing heap corruption or SIGSEGV.

Fixes https://portal.microsofticm.com/imp/v5/incidents/details/31000000558069/summary
Changes
- CPU: replace `ORT_ENFORCE(token_id < vocab_size)` with a full `[0, vocab_size)` bounds check returning an `INVALID_ARGUMENT` Status via an atomic error flag (avoids `abort()` under `ORT_NO_EXCEPTIONS`).
- CUDA: add `CUDA_KERNEL_ASSERT` plus a bounds guard that skips the write, for release-build safety.

Tests
Two regression tests: one with a negative `token_id` and one with `token_id >= vocab_size` (CPU only; the CUDA EP is excluded to avoid debug-assert context corruption).