Add bounds validation for character indices in WordConvEmbedding #27954
Closed
vraspar wants to merge 2 commits into microsoft:main
Conversation
Validate that character indices from the Sequence input are within the valid range [0, num_chars) before using them as offsets into the char embedding table in CharEmbeddingLookup. Previously, a crafted model could supply negative or out-of-range indices, causing a heap OOB read.

Also cap char_length_to_lookup at word_len to prevent reading past the sequence buffer when filter_width exceeds word length.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Can you please link the ICM as well (and assign yourself if not already done)?
Contributor
Pull request overview
This PR addresses a security issue in the WordConvEmbedding contrib CPU kernel where character indices from the Sequence input were used to index the character embedding table without bounds validation, potentially causing out-of-bounds reads.
Changes:
- Added `num_chars` parameter to `CharEmbeddingLookup` and introduced per-index bounds validation before embedding lookup.
- Capped `char_length_to_lookup` at `word_len` to avoid reading past the `Sequence` buffer when `filter_width > word_len`.
- Added new negative and out-of-range character index unit tests expecting failure.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/contrib_ops/cpu/word_conv_embedding.h | Extends CharEmbeddingLookup signature with num_chars for bounds checking. |
| onnxruntime/contrib_ops/cpu/word_conv_embedding.cc | Implements character index range validation and caps lookup length; passes num_chars from embedding shape. |
| onnxruntime/test/contrib_ops/word_conv_embedding_test.cc | Adds tests for negative and out-of-range Sequence character indices. |
- Change CharEmbeddingLookup from void to Status, replacing ORT_ENFORCE with ORT_RETURN_IF_NOT for user-controlled input validation. This avoids abort() in ORT_NO_EXCEPTIONS builds and returns a recoverable error.
- Add filter_width <= word_len validation in Compute() before calling ComputeConvMaxPoolWithActivation to prevent unsigned underflow in unfolded_width = word_len - filter_width + 1.
- Add test for filter_width > word_len edge case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
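The underflow mentioned in this commit message can be reproduced in isolation. The sketch below is not the ONNX Runtime code: `UncheckedUnfoldedWidth` and `CheckedUnfoldedWidth` are hypothetical helpers, and `std::optional` stands in for the `Status` return that the commit describes. It only illustrates why `word_len - filter_width + 1` needs a guard when both operands are unsigned.

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical helper: the unguarded arithmetic. With unsigned operands,
// word_len < filter_width wraps around instead of going negative.
std::size_t UncheckedUnfoldedWidth(std::size_t word_len, std::size_t filter_width) {
  return word_len - filter_width + 1;
}

// Hypothetical helper: reject the invalid configuration up front and report it
// to the caller as a recoverable error (std::nullopt here, a Status in the PR).
std::optional<std::size_t> CheckedUnfoldedWidth(std::size_t word_len,
                                                std::size_t filter_width) {
  if (filter_width > word_len) {
    return std::nullopt;  // would underflow; caller decides how to fail
  }
  return word_len - filter_width + 1;
}
```

For example, with `word_len = 3` and `filter_width = 5` the unchecked version yields the largest `std::size_t` value, which would then be used as a buffer dimension, while the checked version returns no value.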
Contributor
Author
#27957 already fixes the same vulnerability.
The `CharEmbeddingLookup` function in `WordConvEmbedding` uses character indices from the model's `Sequence` input as direct offsets into the char embedding table without any bounds validation. A crafted ONNX model can supply negative or out-of-range indices in the `Sequence` tensor, causing a heap out-of-bounds read via the `memcpy` at the embedding lookup.

Fix
- Added `num_chars` parameter to `CharEmbeddingLookup` (private helper) representing the row count of the char embedding table.
- Added `ORT_ENFORCE` to validate each character index is in `[0, num_chars)` before the `memcpy`.
- Capped `char_length_to_lookup` at `word_len` to prevent reading past the sequence buffer when `filter_width > word_len`.
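The two guards described above can be sketched in a simplified, single-word form. This is an illustration under assumptions, not the kernel source: `LookupWordChars`, its parameter layout, and the `bool` return are hypothetical, and the way the lookup length is derived (`max(chars_in_word, filter_width)` capped at `word_len`) is an assumed reading of the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical single-word lookup. word_chars is one row of the Sequence input
// (length word_len); table is a row-major [num_chars x embed_size] matrix.
// Returns false if any character index that would be read is out of range.
bool LookupWordChars(const std::vector<int>& word_chars, std::size_t chars_in_word,
                     std::size_t filter_width, const std::vector<float>& table,
                     std::size_t num_chars, std::size_t embed_size,
                     std::vector<float>& dst) {
  // Programmer invariant (not user input): the table has the declared shape.
  assert(table.size() == num_chars * embed_size);

  const std::size_t word_len = word_chars.size();
  // Cap the lookup length at word_len so we never read past the word's row,
  // even when filter_width is larger than the row.
  const std::size_t n = std::min(std::max(chars_in_word, filter_width), word_len);

  dst.assign(word_len * embed_size, 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    const int idx = word_chars[i];
    // User-controlled value: validate it is in [0, num_chars) before the copy.
    if (idx < 0 || static_cast<std::size_t>(idx) >= num_chars) {
      return false;
    }
    std::memcpy(dst.data() + i * embed_size,
                table.data() + static_cast<std::size_t>(idx) * embed_size,
                sizeof(float) * embed_size);
  }
  return true;
}
```

The sketch separates the two kinds of checks: the `assert` covers a condition the calling code controls, while the index check handles model-supplied data and fails without aborting.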