
Fix out of bounds read issue in cross.entropy.cc#27568

Merged
vraspar merged 2 commits into main from
vraspar/cross-entropy
Mar 20, 2026
Conversation

@vraspar
Contributor

@vraspar vraspar commented Mar 5, 2026

Description

Add bounds checking for label tensor values in SparseSoftmaxCrossEntropy::Compute to prevent out-of-bounds memory reads.

The SparseSoftmaxCrossEntropy operator uses label_data[i] (int64_t) directly as an array index into the log-probability buffer without validating that the value falls within [0, D), where D is the number of classes. A malicious ONNX model can embed arbitrary label values in a model initializer, causing the operator to read heap memory beyond the log-probability buffer.

Affected expressions in cross_entropy.cc:

loss_sample[i] = -log_prob_data[i * d + label_data[i]] * weight_data[i];  // weighted path
loss_sample[i] = -log_prob_data[i * d + label_data[i]];                   // unweighted path

Existing shape validation confirms label and logit dimensions are compatible, but never validates label values against the class dimension.

Fix

Added a validation loop before the loss computation that returns an error status if any label value is outside [0, D):

for (ptrdiff_t i = 0; i < n; i++) {
  ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= d,
                "SparseSoftmaxCrossEntropy: label value ", label_data[i],
                " at index ", i, " is out of range [0, ", d, ")");
}

@vraspar vraspar requested a review from Copilot March 5, 2026 21:50
baijumeswani previously approved these changes Mar 5, 2026

Copilot AI left a comment


Pull request overview

This PR adds bounds checking for label tensor values in SparseSoftmaxCrossEntropy::Compute to prevent out-of-bounds memory reads. Without this fix, a malicious ONNX model could embed arbitrary int64 label values that would be used directly as array indices into the log-probability buffer (which has size N×D), allowing heap memory reads beyond that buffer.

Changes:

  • A validation loop over all n label values is added after the softmax computation and before the loss computation, returning an error if any label falls outside [0, D)
  • The check uses ORT_RETURN_IF with a descriptive message that includes the offending label value and its index


Comment thread orttraining/orttraining/training_ops/cpu/loss/cross_entropy.cc


3 participants