Fix GatherND zero-dimension index validation bug #28006

Merged: edgchen1 merged 2 commits into main from edgchen1/fix_gather_nd_index_validation on Apr 8, 2026

edgchen1 commented Apr 7, 2026

Description

Replace the `int64_t err_index = 0` sentinel in `PrepareForCompute` with `std::atomic<const Tind*> invalid_index{nullptr}`. The old sentinel failed to detect an out-of-bounds index of 0 when a dimension has size 0, because the final check `err_index == 0` treated that value as success.

This also fixes a pre-existing data race in which multiple threads in `TryParallelFor` could write to `err_index` without synchronization.

Add the `GatherND_zero_dim_error` regression test.
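The sentinel problem described above can be illustrated with a minimal, single-threaded sketch. It is not the actual `PrepareForCompute` code: the function names `ValidateWithSentinel` and `ValidateWithPointer` are hypothetical, the index type is fixed to `int64_t` in place of `Tind`, and the bounds check is assumed to accept indices in `[-dim_size, dim_size)`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified old approach: 0 means "no error", but 0 is also a possible
// out-of-bounds index value when dim_size == 0.
bool ValidateWithSentinel(const std::vector<int64_t>& indices, int64_t dim_size) {
  assert(dim_size >= 0);
  int64_t err_index = 0;
  for (int64_t idx : indices) {
    if (idx < -dim_size || idx >= dim_size) {
      err_index = idx;  // storing 0 here is indistinguishable from "no error"
    }
  }
  return err_index == 0;
}

// Simplified new approach: nullptr means "no error"; any non-null pointer
// identifies an invalid element, whatever its value.
bool ValidateWithPointer(const std::vector<int64_t>& indices, int64_t dim_size) {
  assert(dim_size >= 0);
  std::atomic<const int64_t*> invalid_index{nullptr};
  for (const int64_t& idx : indices) {
    if (idx < -dim_size || idx >= dim_size) {
      invalid_index.store(&idx, std::memory_order_relaxed);
    }
  }
  return invalid_index.load(std::memory_order_relaxed) == nullptr;
}
```

With `dim_size == 0` and a single index of 0, the sentinel version reports success while the pointer version reports failure, which is the case the regression test is meant to cover.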

Motivation and Context

Fix GatherND index validation issue.

Copilot AI left a comment

Pull request overview

Fixes GatherND index validation for zero-sized non-batch dimensions by replacing a sentinel-based error flag with an atomic pointer to the first invalid index, also removing a data race in the parallel validation loop.

Changes:

  • Replace err_index sentinel with std::atomic<const Tind*> invalid_index to correctly detect invalid index 0 and avoid concurrent writes.
  • Add a CPU-only regression test for zero-dimension index validation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `onnxruntime/core/providers/cpu/tensor/gather_nd.cc` | Fixes parallel index validation to be race-free and correctly report invalid index 0 for zero-sized dims. |
| `onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc` | Adds regression coverage for the zero-dimension invalid index case. |

Comment thread onnxruntime/core/providers/cpu/tensor/gather_nd.cc Outdated
Consistent with the other loads/stores in the parallel lambda. The TryParallelFor join provides the happens-before edge.
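The happens-before argument can be sketched outside of onnxruntime with plain `std::thread`. This is an illustration under stated assumptions, not the PR's code: `FindInvalidIndex` is a hypothetical helper, `std::thread::join` stands in for the join at the end of `TryParallelFor`, and relaxed ordering is assumed for every load and store of the shared pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Returns a pointer to some out-of-range element, or nullptr if all are valid.
const int64_t* FindInvalidIndex(const std::vector<int64_t>& indices,
                                int64_t dim_size, int num_threads) {
  assert(num_threads > 0);
  std::atomic<const int64_t*> invalid_index{nullptr};
  std::vector<std::thread> workers;
  const std::size_t n = indices.size();
  const std::size_t stride = static_cast<std::size_t>(num_threads);

  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Each worker checks a strided slice of the indices.
      for (std::size_t i = static_cast<std::size_t>(t); i < n; i += stride) {
        const int64_t idx = indices[i];
        if (idx < -dim_size || idx >= dim_size) {
          // Concurrent stores are race-free because the object is atomic;
          // which invalid element wins is unspecified.
          invalid_index.store(&indices[i], std::memory_order_relaxed);
        }
      }
    });
  }

  // Thread completion synchronizes with the return from join(), so every
  // store above happens-before the load below.
  for (auto& w : workers) w.join();
  return invalid_index.load(std::memory_order_relaxed);
}
```

Because the final load runs only after all workers have been joined, a relaxed load is enough to observe any stored pointer; stronger ordering on the atomic itself would add nothing in this arrangement.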

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


@edgchen1 edgchen1 enabled auto-merge (squash) April 8, 2026 16:39
@edgchen1 edgchen1 merged commit b75fca6 into main Apr 8, 2026
104 of 106 checks passed
@edgchen1 edgchen1 deleted the edgchen1/fix_gather_nd_index_validation branch April 8, 2026 17:56