Skip to content

Add bounds validation for character indices in WordConvEmbedding#27954

Closed
vraspar wants to merge 2 commits intomicrosoft:mainfrom
vraspar:vraspar/fix-charembeddinglookup-oob-read
Closed

Add bounds validation for character indices in WordConvEmbedding#27954
vraspar wants to merge 2 commits intomicrosoft:mainfrom
vraspar:vraspar/fix-charembeddinglookup-oob-read

Conversation

@vraspar
Copy link
Copy Markdown
Contributor

@vraspar vraspar commented Apr 2, 2026

The CharEmbeddingLookup function in WordConvEmbedding uses character indices from the model's \Sequence\ input as direct offsets into the char embedding table without any bounds validation. A crafted ONNX model can supply negative or out-of-range indices in the \Sequence\ tensor, causing a heap out-of-bounds read via the memcpy at the embedding lookup.

Fix

  • Added um_chars parameter to CharEmbeddingLookup (private helper) representing the row count of the char embedding table.
  • Added ORT_ENFORCE to validate each character index is in [0, num_chars) before the memcpy.
  • Capped char_length_to_lookup at word_len to prevent reading past the sequence buffer when ilter_width > word_len.

Validate that character indices from the Sequence input are within the
valid range [0, num_chars) before using them as offsets into the char
embedding table in CharEmbeddingLookup. Previously, a crafted model
could supply negative or out-of-range indices causing a heap OOB read.

Also cap char_length_to_lookup at word_len to prevent reading past the
sequence buffer when filter_width exceeds word length.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hariharans29
Copy link
Copy Markdown
Member

Can you please link the ICM as well (and assign yourself if not already done so) ?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a security issue in the WordConvEmbedding contrib CPU kernel where character indices from the Sequence input were used to index the character embedding table without bounds validation, potentially causing out-of-bounds reads.

Changes:

  • Added num_chars parameter to CharEmbeddingLookup and introduced per-index bounds validation before embedding lookup.
  • Capped char_length_to_lookup at word_len to avoid reading past the Sequence buffer when filter_width > word_len.
  • Added new negative and out-of-range character index unit tests expecting failure.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
onnxruntime/contrib_ops/cpu/word_conv_embedding.h Extends CharEmbeddingLookup signature with num_chars for bounds checking.
onnxruntime/contrib_ops/cpu/word_conv_embedding.cc Implements character index range validation and caps lookup length; passes num_chars from embedding shape.
onnxruntime/test/contrib_ops/word_conv_embedding_test.cc Adds tests for negative and out-of-range Sequence character indices.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/contrib_ops/cpu/word_conv_embedding.cc
Comment thread onnxruntime/contrib_ops/cpu/word_conv_embedding.cc
- Change CharEmbeddingLookup from void to Status, replacing ORT_ENFORCE
  with ORT_RETURN_IF_NOT for user-controlled input validation. This avoids
  abort() in ORT_NO_EXCEPTIONS builds and returns a recoverable error.
- Add filter_width <= word_len validation in Compute() before calling
  ComputeConvMaxPoolWithActivation to prevent unsigned underflow in
  unfolded_width = word_len - filter_width + 1.
- Add test for filter_width > word_len edge case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vraspar
Copy link
Copy Markdown
Contributor Author

vraspar commented Apr 6, 2026

#27957 already fixes the same vulnerability.

@vraspar vraspar closed this Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants