Fix code generation when UseBeta is false #6202

Merged
pdhirajkumarprasad merged 13 commits into develop from users/dhirajp/AIHPBLAS-1467
May 23, 2026

@pdhirajkumarprasad
Contributor

Motivation

https://amd-hub.atlassian.net/browse/AIHPBLAS-1467

Technical Details

Fixed multiple issues that prevented TensileLite from correctly generating and executing kernels when UseBeta=false (the beta parameter is not used in the GEMM operation). Bounds-checking validation now works correctly with this configuration as well.

Files Modified

  1. Tensile/KernelWriterAssembly.py

Issue: KeyError when accessing Beta SGPR register when UseBeta=false
Fix: Added conditional check before accessing Beta SGPR
if kernel["ProblemType"]["UseBeta"]:
    moduleExternalArgs.addComment("Read Beta")
    moduleExternalArgs.addModuleAsFlatItems(self.externalArgLoader.loadAllKernArg(
        self.sgprs["Beta"], "KernArgAddress", self.states.numSgprBeta))
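
A rough illustration of the guard above, not the actual KernelWriterAssembly code: when UseBeta is false, no Beta SGPR is allocated, so an unconditional lookup raises KeyError. The names `collect_kernarg_loads`, `kernel_no_beta`, and `sgprs_no_beta` are hypothetical stand-ins, and the assumption that Alpha is always loaded is for this sketch only.

```python
def collect_kernarg_loads(kernel, sgprs):
    """Return the names of kernel arguments to load, skipping Beta when unused."""
    loads = ["Alpha"]  # hypothetical: Alpha is always loaded in this sketch
    # Only touch sgprs["Beta"] when the problem type actually uses beta;
    # otherwise the key is absent and the lookup would raise KeyError.
    if kernel["ProblemType"]["UseBeta"]:
        _ = sgprs["Beta"]
        loads.append("Beta")
    return loads


kernel_no_beta = {"ProblemType": {"UseBeta": False}}
sgprs_no_beta = {"Alpha": 10}  # no "Beta" entry was allocated

kernel_beta = {"ProblemType": {"UseBeta": True}}
sgprs_beta = {"Alpha": 10, "Beta": 12}

print(collect_kernarg_loads(kernel_no_beta, sgprs_no_beta))
print(collect_kernarg_loads(kernel_beta, sgprs_beta))
```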

  2. Tensile/SolutionStructs/Problem.py

Issue: UseBeta serialized as integer (0/1) instead of boolean in YAML, causing C++ parser errors
Fix: Ensure UseBeta is always stored as boolean
self.state["UseBeta"] = bool(self.state["UseBeta"])
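
A small sketch of why the coercion matters. It uses the standard-library `json` module as a stand-in for the YAML writer (an assumption made only to keep the example dependency-free); serializers typically emit an integer and a boolean differently, which is what a strict parser on the C++ side would trip over. `normalize_use_beta` is a hypothetical helper, not a function in Problem.py.

```python
import json


def normalize_use_beta(state):
    """Return a copy of the state with UseBeta coerced to a real boolean."""
    fixed = dict(state)
    fixed["UseBeta"] = bool(fixed["UseBeta"])
    return fixed


raw_state = {"UseBeta": 0}  # integer flag, as it was stored before the fix
before = json.dumps(raw_state)
after = json.dumps(normalize_use_beta(raw_state))

print(before)  # integer form
print(after)   # boolean form
```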

  3. client/src/ReferenceValidator.cpp

Issue #1: Buffer allocation check didn't verify if buffer pointer was valid
Fix: Check both size and pointer validity
// Only skip reallocation if size matches AND buffer is valid
if(m_cpuResultBufferSize == bytes && m_cpuResultBuffer.get() != nullptr)
    return;

Issue #2: hipFree compiler warning about nodiscard attribute
Fix: Cast return value to void in lambda deleter
uint8_t* buffer;
HIP_CHECK_EXC(hipHostMalloc((void**)&buffer, bytes, 0));
m_cpuResultBuffer.reset(buffer, [](uint8_t* p) { (void)hipFree(p); });

Issue #3: Attempting to validate null/empty tensors
Fix: Skip validation for null pointers or zero-sized tensors
// Skip validation if pointers are null or maxElements is 0
if(resPtr == nullptr || refPtr == nullptr || result.maxElements[i] == 0)
{
    if(Debug::Instance().printTensorInfo())
        std::cout << "Skipping validation for tensor " << tensor.getName() << std::endl;
    continue;
}

Issue #4: Trying to copy padding bytes from output tensors that don't have padding
Fix: Only use maxElement for input tensors
// For output tensors, don't use maxElement with padding
if(boundsCheck == BoundsCheckMode::NaN && !tensor.isOutput())
    elementsToCopy = maxElement;

Issue #5: Bounds checking validation on output tensors without padding buffers
Fix: Skip bounds checking for output tensors
// Only check bounds for input tensors (output tensors don't have padding buffers)
if(boundsCheck == BoundsCheckMode::NaN && !tensor.isOutput())

  4. client/src/DataInitialization.cpp

Issue: hipMemcpy with null pointers causing runtime errors
Fix: Added null pointer check
void* copyInputBuffers(const TensorDescriptor& descriptor,
                       void*                   dst,
                       void*                   src,
                       size_t                  totalElements,
                       hipMemcpyKind           kind)
{
    // Skip copy if no elements to copy or if pointers are null
    if(totalElements > 0 && dst != nullptr && src != nullptr)
    {
        HIP_CHECK_EXC(hipMemcpy(dst, src, descriptor.elementBytes() * totalElements, kind));
    }
    return dst;
}
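
The same guard can be sketched on the host side in Python, with `bytearray` objects standing in for device buffers and `None` standing in for a null pointer. `copy_input_buffer` is a hypothetical analogue of `copyInputBuffers`, not part of the client.

```python
def copy_input_buffer(dst, src, total_elements, element_bytes):
    """Copy total_elements * element_bytes bytes from src to dst, or do nothing."""
    # Skip the copy when there is nothing to copy or either buffer is missing.
    if total_elements > 0 and dst is not None and src is not None:
        n = total_elements * element_bytes
        dst[:n] = src[:n]
    return dst


src = bytearray(b"\x01\x02\x03\x04")
dst = bytearray(4)
copied = copy_input_buffer(dst, src, 2, 2)            # copies 4 bytes
skipped_null = copy_input_buffer(None, src, 2, 2)     # no destination: no-op
untouched = copy_input_buffer(bytearray(4), src, 0, 2)  # zero elements: no-op
```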

0d6cd23

Test Plan

NA

Test Result

===== 105 passed, 83 skipped, 1 warning in 1040.87s (0:17:20) =====

Submission Checklist

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
@pdhirajkumarprasad
Contributor Author

This change also fixes the issue reported in https://amd-hub.atlassian.net/browse/AIHPBLAS-1466.

@codecov-commenter

codecov-commenter commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...blaslt/tensilelite/Tensile/KernelWriterAssembly.py | 0.00% | 15 Missing ⚠️ |

❌ Your project status has failed because the head coverage (69.00%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6202      +/-   ##
===========================================
- Coverage    66.37%   61.25%   -5.12%     
===========================================
  Files         1606     2077     +471     
  Lines       268162   355315   +87153     
  Branches     37430    53418   +15988     
===========================================
+ Hits        177989   217644   +39655     
- Misses       75232   118965   +43733     
- Partials     14941    18706    +3765     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| TensileLite | 26.06% <0.00%> (?) | |
| hipBLAS | 90.65% <ø> (ø) | Carriedforward from 1aad592 |
| hipBLASLt | 41.19% <ø> (ø) | |
| hipCUB | 82.21% <ø> (ø) | Carriedforward from 1aad592 |
| hipDNN | 79.96% <ø> (-5.53%) ⬇️ | Carriedforward from 1aad592 |
| hipFFT | 54.89% <ø> (+4.42%) ⬆️ | Carriedforward from 1aad592 |
| hipRAND | 76.12% <ø> (?) | Carriedforward from 1aad592 |
| hipSOLVER | 69.00% <ø> (-0.24%) ⬇️ | Carriedforward from 1aad592 |
| hipSPARSE | 84.70% <ø> (-0.67%) ⬇️ | Carriedforward from 1aad592 |
| rocBLAS | 48.11% <ø> (ø) | Carriedforward from 1aad592 |
| rocFFT | 52.59% <ø> (+2.79%) ⬆️ | Carriedforward from 1aad592 |
| rocRAND | 57.11% <ø> (+0.09%) ⬆️ | Carriedforward from 1aad592 |
| rocSOLVER | 77.83% <ø> (?) | Carriedforward from 1aad592 |
| rocSPARSE | 72.81% <ø> (+0.34%) ⬆️ | Carriedforward from 1aad592 |

*This pull request uses carry forward flags.

| Files with missing lines | Coverage Δ |
|---|---|
| ...blaslt/tensilelite/Tensile/KernelWriterAssembly.py | 7.50% <0.00%> (ø) |

... and 689 files with indirect coverage changes


@talumbau
Contributor

hmm.. this doesn't seem right. I will take a closer look soon, but I don't think we should require all of these checks in different places for this case. I will try to repro and update you

@talumbau talumbau requested review from Alex-Vasile and removed request for talumbau April 13, 2026 20:58
@Alex-Vasile
Contributor

> this change fixes the issue mentioned in https://amd-hub.atlassian.net/browse/AIHPBLAS-1466 as well

Do you mean that the fixes for 1467 also happen to fix 1466? Or that this PR also has separate fixes for 1466?

If it's the second option, then please split this out into two PRs, one for 1466 and one for 1467.

@pdhirajkumarprasad
Contributor Author

@Alex-Vasile this PR fixes both issues, i.e. 1467 and 1466.

Contributor

@Alex-Vasile Alex-Vasile left a comment


There are no tests added to verify this fix or catch regressions, please add tests.

> this PR fixed both the issue i.e 1467 and 1466

Please split into 2, keep this PR focused on 1467 and the tests for it.

Comment thread projects/hipblaslt/tensilelite/Tensile/SolutionStructs/Problem.py Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/DataInitialization.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Comment thread projects/hipblaslt/tensilelite/client/src/DataInitialization.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/Tests/common/gemm/use_beta_false.yaml Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Contributor

@Alex-Vasile Alex-Vasile left a comment


I think there's still a bug, the CI isn't passing, and there are several changes in here unrelated to the UseBeta fixes.

Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
@pdhirajkumarprasad
Contributor Author

> I think there's still a bug, the CI isn't passing, and there's several changes in here unrelated to UseBeta fixes.

Fixed, and all tests are passing:

----- generated xml file: /home/dhirajp/rocm-libraries/projects/hipblaslt/tensilelite/python_tests.xml -----
===== 25 passed, 234 skipped, 16 warnings in 605.56s (0:10:05) =====

Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Contributor

@nakajee nakajee left a comment


Thanks for your update.
I am OK with your change as long as all tests pass with both UseBeta=True and UseBeta=False.

Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/Components/GlobalWriteBatch.py Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/DataInitialization.cpp Outdated
Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
… issue

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
pdhirajkumarprasad added a commit that referenced this pull request May 19, 2026
…rorInvalidValue` when testing tensors with odd-element padding (#6922)

## Motivation

Fixes alignment issues in NaN bounds checking mode that caused
`hipErrorInvalidValue` when testing tensors with odd-element padding.
The root cause was misaligned pointer arithmetic when half of the padding
size in bytes is not a multiple of the element size.

## Technical Details

NaN bounds checking allocates extra buffer space filled with NaN/Inf
sentinels to detect out-of-bounds memory writes:
```
[NaN padding] [valid data] [NaN padding]
```
The pointer returned points to the middle (valid data section). When
validating results, we need to:
1. Calculate the offset to the buffer start
2. Copy the entire padded buffer for validation
3. Check that padding regions still contain NaN/Inf sentinels
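
The layout and the three validation steps above can be sketched on plain Python lists. This is only an illustration of the sentinel idea, assuming NaN-only sentinels (the text above also mentions Inf); `make_padded` and `padding_intact` are hypothetical helpers, not client functions.

```python
import math


def make_padded(data, pad):
    """Surround data with `pad` NaN sentinels on each side."""
    return [math.nan] * pad + list(data) + [math.nan] * pad


def padding_intact(buffer, pad):
    """True if every sentinel before and after the valid data is still NaN."""
    front = buffer[:pad]
    back = buffer[len(buffer) - pad:]
    return all(math.isnan(x) for x in front + back)


pad = 3
buf = make_padded([1.0, 2.0, 3.0, 4.0], pad)
ok_before = padding_intact(buf, pad)

# Simulate a kernel writing one element past the end of the valid region.
buf[pad + 4] = 99.0
ok_after = padding_intact(buf, pad)
print(ok_before, ok_after)
```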
**The bug occurred when:**
- Total padding elements was odd (e.g., 48887 elements)
- Converting to bytes: `48887 elements * 2 bytes = 97774 bytes`
- Dividing by 2: `97774 / 2 = 48887 bytes` (odd!)
- For Half (2-byte) data, this creates misalignment
- Result: `hipErrorInvalidValue` during hipMemcpy
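
The arithmetic in the list above can be checked directly. The variable names are illustrative only.

```python
element_bytes = 2          # Half precision
padding_elements = 48887   # odd total padding, as in the example above

padding_bytes = padding_elements * element_bytes   # total padding in bytes
half_offset = padding_bytes // 2                   # byte offset to the valid data
misaligned = half_offset % element_bytes != 0      # not a multiple of the element size

print(padding_bytes, half_offset, misaligned)
```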

### Changes
#### 1. DataInitialization.cpp
**Fix copyBadInputBuffers:**
- Changed to copy from `bad` buffer (NaN sentinels) instead of `src`
- Added alignment logic: round `paddingBytes` down to a multiple of
twice the element size before dividing by 2
```cpp
size_t doubleElement = 2 * elementBytes;
paddingBytes = (paddingBytes / doubleElement) * doubleElement;
```
**Fix output buffer initialization:**
- Output tensors now use `copyBadInputBuffers` when NaN bounds checking
is enabled
- Ensures output buffers have NaN sentinels for validation
#### 2. ReferenceValidator.cpp
**Fix pointer calculation in checkResultsTyped:**
- Match allocation logic exactly (multiply first, then divide)
- Add same alignment rounding before dividing by 2
- Ensures pointer arithmetic matches allocation arithmetic
**Fix memory leak:**
- Changed `hipFree` → `hipHostFree` to match `hipHostMalloc`
#### 3. Add Test Coverage
**nan_bounds_check_odd_padding.yaml:**
- Tests problem sizes with odd-element padding: (137, 129), (141, 131),
(17, 19)
- Verifies alignment fixes work correctly
- Both batched and non-batched GEMM variants
- With and without UseScaleCD
### Testing
- Tested on gfx950 with odd-sized tensor configurations
- All test cases pass without `hipErrorInvalidValue`
- Validates that NaN sentinels are properly checked before and after
data
### Technical Details
The key insight is that when doing pointer arithmetic with multi-byte
types:
```cpp
// WRONG - can create misalignment:
size_t paddingBytes = paddingElements * elementBytes;
auto* offset = static_cast<uint8_t*>(basePtr) + paddingBytes / 2;
// CORRECT - ensures alignment:
size_t paddingBytes = paddingElements * elementBytes;
// Round down to a multiple of 2 * elementBytes so that half of it stays element-aligned
paddingBytes = (paddingBytes / (2 * elementBytes)) * (2 * elementBytes);
auto* offset = static_cast<uint8_t*>(basePtr) + paddingBytes / 2;
```
This ensures `paddingBytes / 2` is always a multiple of `elementBytes`,
preventing misalignment.
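
A quick Python check of this claim, mirroring the rounding in the snippet above with integer arithmetic. `aligned_half_offset` is a hypothetical helper; the loop bounds and element sizes are arbitrary choices for the sketch.

```python
def aligned_half_offset(padding_elements, element_bytes):
    """Half of the padding size in bytes, after rounding down to 2 * element_bytes."""
    padding_bytes = padding_elements * element_bytes
    double_element = 2 * element_bytes
    padding_bytes = (padding_bytes // double_element) * double_element
    return padding_bytes // 2


# The odd-padding case from the description: 48887 two-byte elements.
offset = aligned_half_offset(48887, 2)

# The half offset stays a multiple of the element size for every tested combination.
all_aligned = all(
    aligned_half_offset(n, eb) % eb == 0
    for n in range(200)
    for eb in (1, 2, 4, 8)
)
print(offset, all_aligned)
```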

Once this PR is merged, we need to merge #6202 so that the
`UseBeta: False` test works.

commit: f684ab1

## Test Plan

Added a YAML test and also ran all existing tests. All are passing.

## Test Result

All tests are passing.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
aledudek pushed a commit that referenced this pull request May 20, 2026
…rorInvalidValue` when testing tensors with odd-element padding (#6922)

@pdhirajkumarprasad pdhirajkumarprasad merged commit 9bece58 into develop May 23, 2026
106 of 113 checks passed
@pdhirajkumarprasad pdhirajkumarprasad deleted the users/dhirajp/AIHPBLAS-1467 branch May 23, 2026 02:53
5 participants