Address Tile kernel dim overflow and generate tests #27566

Merged
yuslepukhin merged 7 commits into main from yuslepukhin/tile_repeat_overflow
Mar 10, 2026
@yuslepukhin
Member

This pull request strengthens the input validation and error handling for the Tile operator in both CPU and CUDA implementations. It introduces checks to ensure repeat values are non-negative and that output shape computations do not overflow, improving robustness and reliability. Comprehensive unit tests are added to verify these behaviors.

Input validation and error handling:

  • Added checks in both CPU (tile.cc) and CUDA (tile.cc) implementations to reject negative repeat values, returning a clear error message when encountered. [1] [2]
  • Updated output dimension calculation to use SafeInt<int64_t> multiplication, ensuring that integer overflows are detected and handled properly. [1] [2]
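
The two checks above can be sketched in isolation. This is a minimal, self-contained illustration and not the code from the PR: `TileShapeResult`, `CheckedMulNonNegative`, and `ComputeTiledDims` are hypothetical names, the error strings are invented, and the hand-written overflow test is a portable stand-in for the `SafeInt<int64_t>` multiplication the PR uses. It assumes input dimensions are already non-negative.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical result type for this sketch: output dims on success,
// or a non-empty error message on failure.
struct TileShapeResult {
  std::vector<int64_t> dims;
  std::string error;  // empty means success
};

// Stand-in for SafeInt<int64_t> multiplication of two non-negative values.
// Returns false when the product would not fit in int64_t.
inline bool CheckedMulNonNegative(int64_t a, int64_t b, int64_t& out) {
  if (a != 0 && b > std::numeric_limits<int64_t>::max() / a) {
    return false;
  }
  out = a * b;
  return true;
}

// Computes output_dim[i] = input_dim[i] * repeats[i], rejecting negative
// repeats and any per-axis product that overflows int64_t.
inline TileShapeResult ComputeTiledDims(const std::vector<int64_t>& input_dims,
                                        const std::vector<int64_t>& repeats) {
  TileShapeResult result;
  if (input_dims.size() != repeats.size()) {
    result.error = "repeats must have one entry per input dimension";
    return result;
  }
  result.dims.reserve(input_dims.size());
  for (std::size_t axis = 0; axis < input_dims.size(); ++axis) {
    if (repeats[axis] < 0) {
      result.dims.clear();
      result.error = "repeats[" + std::to_string(axis) + "] is negative";
      return result;
    }
    int64_t out_dim = 0;
    if (!CheckedMulNonNegative(input_dims[axis], repeats[axis], out_dim)) {
      result.dims.clear();
      result.error = "output dimension " + std::to_string(axis) + " overflows int64";
      return result;
    }
    result.dims.push_back(out_dim);
  }
  return result;
}
```

For example, `ComputeTiledDims({2, 3}, {2, 4})` yields dims `{4, 12}` with an empty error, while a repeat of `-1` or a repeat of `INT64_MAX` on a dimension of size 2 yields a non-empty error. Validating before any allocation keeps a bad `repeats` tensor from turning into a wrapped, undersized output shape.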

Code consistency and refactoring:

  • Renamed local variables from rank to input_rank for clarity and consistency across the CUDA implementation, updating all related usages. [1] [2] [3] [4]

Unit testing improvements:

  • Added a suite of unit tests to tile_op_test.cc to verify rejection of negative repeat values and to confirm that overflow in output dimension computation is properly detected for various data types and tensor shapes.

@yuslepukhin yuslepukhin requested a review from Copilot March 5, 2026 19:17
@yuslepukhin yuslepukhin changed the title Address overflow and generate tests Address Tile kernel dim overflow and generate tests Mar 5, 2026
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@yuslepukhin yuslepukhin requested a review from guschmue March 6, 2026 19:44
guschmue
guschmue previously approved these changes Mar 6, 2026
@yuslepukhin
Member Author

/azp run Test Linux CUDA x64 Release, Test Linux TensorRT x64 Release

@azure-pipelines

No pipelines are associated with this pull request.

@yuslepukhin yuslepukhin enabled auto-merge (squash) March 10, 2026 20:42
@yuslepukhin yuslepukhin merged commit b4ed175 into main Mar 10, 2026
103 of 111 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/tile_repeat_overflow branch March 10, 2026 20:42
GopalakrishnanN pushed 7 commits that referenced this pull request (three on Apr 16, 2026, two on Apr 17, 2026, one on Apr 24, 2026, and one on May 1, 2026), each with the same message:
The per-axis SafeInt multiplication added in #27566 detects overflow when computing an individual output dimension, but combinations of per-axis repeats can still request an int64-representable total that is unreasonably large. Add a 4 GiB upper bound on the total tiled byte count in both the CPU and CUDA Tile kernels, and extend the unit tests to cover this case.
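
The total-size bound described in the commit message above could look roughly like the following. This is a sketch under stated assumptions, not the follow-up commit's code: `kMaxTiledBytes` and `TiledSizeWithinLimit` are hypothetical names, and treating a tensor of exactly 4 GiB as allowed is an assumption, since the message does not say whether the bound is inclusive.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant for this sketch: 4 GiB expressed in bytes.
constexpr uint64_t kMaxTiledBytes = uint64_t{4} * 1024 * 1024 * 1024;

// Returns true when a tensor with the given output dims and element size
// occupies at most max_bytes. The running product is compared against
// max_bytes before each multiplication, so it never exceeds max_bytes
// and therefore cannot wrap around.
inline bool TiledSizeWithinLimit(const std::vector<int64_t>& output_dims,
                                 std::size_t element_size,
                                 uint64_t max_bytes = kMaxTiledBytes) {
  uint64_t total = element_size;
  for (int64_t dim : output_dims) {
    if (dim < 0) {
      return false;  // a negative dimension is never a valid output shape
    }
    if (dim == 0) {
      total = 0;  // empty tensor; keep scanning so later negative dims are caught
      continue;
    }
    const uint64_t d = static_cast<uint64_t>(dim);
    if (total > max_bytes / d) {
      return false;  // total * d would exceed max_bytes
    }
    total *= d;
  }
  return total <= max_bytes;
}
```

With 4-byte elements, dims `{1024, 1024, 1024}` come to exactly 4 GiB and pass under the inclusive assumption, while `{1024, 1024, 1025}` is rejected. Dims of `{1 << 40, 1 << 40}` are also rejected even though each axis fits comfortably in int64, which is the case the per-axis check alone does not cover.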