Fix code generation when UseBeta is false by pdhirajkumarprasad · Pull Request #6202 · ROCm/rocm-libraries

pdhirajkumarprasad · 2026-04-07T08:44:59Z

Motivation

https://amd-hub.atlassian.net/browse/AIHPBLAS-1467

Technical Details

Fixed multiple issues preventing TensileLite from correctly generating and executing kernels when UseBeta=false (the beta parameter is not used in the GEMM operation). Also enabled bounds-checking validation to work correctly with this configuration.

Files Modified

Tensile/KernelWriterAssembly.py

Issue: KeyError when accessing Beta SGPR register when UseBeta=false
Fix: Added conditional check before accessing Beta SGPR
if kernel["ProblemType"]["UseBeta"]:
    moduleExternalArgs.addComment("Read Beta")
    moduleExternalArgs.addModuleAsFlatItems(self.externalArgLoader.loadAllKernArg(
        self.sgprs["Beta"], "KernArgAddress", self.states.numSgprBeta))
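
The failure mode can be reproduced outside Tensile with a plain dictionary. The sketch below is a hypothetical stand-in, not the actual KernelWriterAssembly code: `allocate_sgprs` and `load_external_args` are invented names, and it only assumes, as the issue above describes, that no `Beta` register entry exists when `UseBeta` is false.

```python
def allocate_sgprs(use_beta):
    """Hypothetical register table: 'Beta' only exists when UseBeta is true."""
    sgprs = {"Alpha": 0}
    if use_beta:
        sgprs["Beta"] = 1
    return sgprs


def load_external_args(kernel, sgprs):
    """Return the list of (name, register) loads, guarding the optional Beta entry."""
    loads = [("Alpha", sgprs["Alpha"])]
    # Without this guard, sgprs["Beta"] raises KeyError when UseBeta is false.
    if kernel["ProblemType"]["UseBeta"]:
        loads.append(("Beta", sgprs["Beta"]))
    return loads
```

With the guard in place, a kernel whose problem type has `UseBeta` set to false produces only the `Alpha` load and never looks up the missing key.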

Tensile/SolutionStructs/Problem.py

Issue: UseBeta serialized as integer (0/1) instead of boolean in YAML, causing C++ parser errors
Fix: Ensure UseBeta is always stored as boolean
self.state["UseBeta"] = bool(self.state["UseBeta"])
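
To see why the coercion matters, compare how an integer and a boolean serialize. The sketch below uses the standard-library `json` module purely as a stand-in for the YAML writer (an assumption made so the example has no third-party dependency); `normalize_use_beta` is a hypothetical helper, not a function from Problem.py.

```python
import json


def normalize_use_beta(state):
    """Return a copy of state with UseBeta coerced to a real boolean."""
    normalized = dict(state)
    normalized["UseBeta"] = bool(normalized["UseBeta"])
    return normalized


raw = {"UseBeta": 0}
print(json.dumps(raw))                      # {"UseBeta": 0}
print(json.dumps(normalize_use_beta(raw)))  # {"UseBeta": false}
```

A strictly typed reader that expects a boolean scalar can accept the second form and reject the first, which matches the parser error described above.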

client/src/ReferenceValidator.cpp

Issue #1: Buffer allocation check didn't verify if buffer pointer was valid
Fix: Check both size and pointer validity
// Only skip reallocation if size matches AND buffer is valid
if(m_cpuResultBufferSize == bytes && m_cpuResultBuffer.get() != nullptr)
    return;
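
The reasoning behind checking both conditions can be modelled in a few lines. This is a hypothetical Python model of the cache logic, not the C++ validator itself: `ResultBuffer`, `ensure`, and the `allocations` counter are invented for illustration, and `None` stands in for a null pointer.

```python
class ResultBuffer:
    """Hypothetical model of a cached host buffer that is reused when possible."""

    def __init__(self):
        self.size = 0
        self.data = None       # None plays the role of a null pointer
        self.allocations = 0   # counts how often we really allocate

    def ensure(self, nbytes):
        # Reuse only if the size matches AND a buffer actually exists.
        if self.size == nbytes and self.data is not None:
            return
        self.data = bytearray(nbytes)
        self.size = nbytes
        self.allocations += 1


buf = ResultBuffer()
buf.ensure(16)
buf.ensure(16)      # reused, no new allocation
buf.data = None     # buffer released while the recorded size is stale
buf.ensure(16)      # a size-only check would wrongly skip this allocation
```

A size-only comparison would return early in the last call and leave the caller with no buffer at all.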

Issue #2: hipFree compiler warning about nodiscard attribute
Fix: Cast return value to void in lambda deleter
uint8_t* buffer;
HIP_CHECK_EXC(hipHostMalloc((void**)&buffer, bytes, 0));
m_cpuResultBuffer.reset(buffer, [](uint8_t* p) { (void)hipFree(p); });

Issue #3: Attempting to validate null/empty tensors
Fix: Skip validation for null pointers or zero-sized tensors
// Skip validation if pointers are null or maxElements is 0
if(resPtr == nullptr || refPtr == nullptr || result.maxElements[i] == 0)
{
    if(Debug::Instance().printTensorInfo())
        std::cout << "Skipping validation for tensor " << tensor.getName() << std::endl;
    continue;
}
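
The same skip rule, written as a small Python filter. This is only a sketch under assumed data shapes: each tensor is a hypothetical dict with `name`, `result`, `reference`, and `max_elements` keys, and `None` stands in for a null pointer.

```python
def tensors_to_validate(tensors):
    """Yield the names of tensors that have both buffers and at least one element."""
    for t in tensors:
        if t["result"] is None or t["reference"] is None or t["max_elements"] == 0:
            continue  # nothing to compare for this tensor
        yield t["name"]


tensors = [
    {"name": "D", "result": b"\x00", "reference": b"\x00", "max_elements": 1},
    {"name": "C", "result": None, "reference": None, "max_elements": 0},
    {"name": "E", "result": b"", "reference": b"", "max_elements": 0},
]
checked = list(tensors_to_validate(tensors))
```

Here only `D` is validated; an unused tensor such as `C` when beta is not applied is skipped rather than dereferenced.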

Issue #4: Trying to copy padding bytes from output tensors that don't have padding
Fix: Only use maxElement for input tensors
// For output tensors, don't use maxElement with padding
if(boundsCheck == BoundsCheckMode::NaN && !tensor.isOutput())
    elementsToCopy = maxElement;

Issue #5: Bounds checking validation on output tensors without padding buffers
Fix: Skip bounds checking for output tensors
// Only check bounds for input tensors (output tensors don't have padding buffers)
if(boundsCheck == BoundsCheckMode::NaN && !tensor.isOutput())

client/src/DataInitialization.cpp

Issue: hipMemcpy with null pointers causing runtime errors
Fix: Added null pointer check
void* copyInputBuffers(const TensorDescriptor& descriptor,
                       void*                   dst,
                       void*                   src,
                       size_t                  totalElements,
                       hipMemcpyKind           kind)
{
    // Skip copy if no elements to copy or if pointers are null
    if(totalElements > 0 && dst != nullptr && src != nullptr)
    {
        HIP_CHECK_EXC(hipMemcpy(dst, src, descriptor.elementBytes() * totalElements, kind));
    }
    return dst;
}
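
For readers who want to exercise the guard without a GPU, here is a hypothetical Python analogue. `copy_input_buffer` is an invented name, `bytearray` objects stand in for device buffers, `None` stands in for a null pointer, and both buffers are assumed to be at least as large as the requested copy.

```python
def copy_input_buffer(dst, src, element_bytes, total_elements):
    """Copy total_elements elements from src into dst, or do nothing.

    Mirrors the guard above: the copy is skipped when there is nothing to
    copy or when either buffer is missing.
    """
    if total_elements > 0 and dst is not None and src is not None:
        n = element_bytes * total_elements
        dst[:n] = src[:n]
    return dst


dst = bytearray(4)
src = bytearray(b"\x01\x02\x03\x04")
copy_input_buffer(dst, src, element_bytes=2, total_elements=1)
```

After the call, only the first element (two bytes) of `dst` has been overwritten; passing `None` for either buffer returns `dst` unchanged.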

0d6cd23

Test Plan

NA

Test Result

=== 105 passed, 83 skipped, 1 warning in 1040.87s (0:17:20) ===

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad · 2026-04-07T08:58:01Z

this change fixes the issue mentioned in https://amd-hub.atlassian.net/browse/AIHPBLAS-1466 as well

codecov-commenter · 2026-04-07T19:02:40Z

Codecov Report

❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...blaslt/tensilelite/Tensile/KernelWriterAssembly.py | 0.00% | 15 Missing ⚠️ |

❌ Your project status has failed because the head coverage (69.00%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #6202      +/-   ##
===========================================
- Coverage    66.37%   61.25%   -5.12%     
===========================================
  Files         1606     2077     +471     
  Lines       268162   355315   +87153     
  Branches     37430    53418   +15988     
===========================================
+ Hits        177989   217644   +39655     
- Misses       75232   118965   +43733     
- Partials     14941    18706    +3765

| Flag | Coverage Δ | | *Carryforward flag |
|---|---|---|---|
| TensileLite | `26.06% <0.00%> (?)` | | |
| hipBLAS | `90.65% <ø> (ø)` | | Carriedforward from 1aad592 |
| hipBLASLt | `41.19% <ø> (ø)` | | |
| hipCUB | `82.21% <ø> (ø)` | | Carriedforward from 1aad592 |
| hipDNN | `79.96% <ø> (-5.53%)` | ⬇️ | Carriedforward from 1aad592 |
| hipFFT | `54.89% <ø> (+4.42%)` | ⬆️ | Carriedforward from 1aad592 |
| hipRAND | `76.12% <ø> (?)` | | Carriedforward from 1aad592 |
| hipSOLVER | `69.00% <ø> (-0.24%)` | ⬇️ | Carriedforward from 1aad592 |
| hipSPARSE | `84.70% <ø> (-0.67%)` | ⬇️ | Carriedforward from 1aad592 |
| rocBLAS | `48.11% <ø> (ø)` | | Carriedforward from 1aad592 |
| rocFFT | `52.59% <ø> (+2.79%)` | ⬆️ | Carriedforward from 1aad592 |
| rocRAND | `57.11% <ø> (+0.09%)` | ⬆️ | Carriedforward from 1aad592 |
| rocSOLVER | `77.83% <ø> (?)` | | Carriedforward from 1aad592 |
| rocSPARSE | `72.81% <ø> (+0.34%)` | ⬆️ | Carriedforward from 1aad592 |

*This pull request uses carry forward flags.

| Files with missing lines | Coverage Δ |
|---|---|
| ...blaslt/tensilelite/Tensile/KernelWriterAssembly.py | `7.50% <0.00%> (ø)` |

... and 689 files with indirect coverage changes


talumbau · 2026-04-13T20:57:29Z

hmm.. this doesn't seem right. I will take a closer look soon, but I don't think we should require all of these checks in different places for this case. I will try to repro and update you

Alex-Vasile · 2026-04-14T15:00:14Z

> this change fixes the issue mentioned in https://amd-hub.atlassian.net/browse/AIHPBLAS-1466 as well

Do you mean that the fixes for 1467 also happen to fix 1466? Or that this PR also has separate fixes for 1466?

If it's the second option, then please split this out into two PRs, one for 1466 and one for 1467.

pdhirajkumarprasad · 2026-04-14T15:03:08Z

@Alex-Vasile this PR fixes both issues, i.e. 1467 and 1466

Alex-Vasile

There are no tests added to verify this fix or catch regressions; please add tests.

> this PR fixes both issues, i.e. 1467 and 1466

Please split this into two PRs: keep this PR focused on 1467 and the tests for it.


Alex-Vasile

I think there's still a bug, the CI isn't passing, and there's several changes in here unrelated to UseBeta fixes.


pdhirajkumarprasad · 2026-04-28T05:20:29Z

> I think there's still a bug, the CI isn't passing, and there's several changes in here unrelated to UseBeta fixes.

fixed and all are passing

--- generated xml file: /home/dhirajp/rocm-libraries/projects/hipblaslt/tensilelite/python_tests.xml ---
=== 25 passed, 234 skipped, 16 warnings in 605.56s (0:10:05) ===


nakajee

Thanks for your update.
I am OK with your change as long as all tests pass with both UseBeta=True and UseBeta=False.


…rorInvalidValue` when testing tensors with odd-element padding (#6922)

## Motivation

Fixes alignment issues in NaN bounds checking mode that caused `hipErrorInvalidValue` when testing tensors with odd-element padding. The root cause was misaligned pointer arithmetic when padding sizes don't divide evenly by element sizes.

## Technical Details

NaN bounds checking allocates extra buffer space filled with NaN/Inf sentinels to detect out-of-bounds memory writes:

```
[NaN padding] [valid data] [NaN padding]
```

The pointer returned points to the middle (valid data section). When validating results, we need to:

1. Calculate the offset to the buffer start
2. Copy the entire padded buffer for validation
3. Check that padding regions still contain NaN/Inf sentinels

**The bug occurred when:**

- Total padding elements was odd (e.g., 48887 elements)
- Converting to bytes: `48887 elements * 2 bytes = 97774 bytes`
- Dividing by 2: `97774 / 2 = 48887 bytes` (odd!)
- For Half (2-byte) data, this creates misalignment
- Result: `hipErrorInvalidValue` during hipMemcpy

### Changes

#### 1. DataInitialization.cpp

**Fix copyBadInputBuffers:**

- Changed to copy from `bad` buffer (NaN sentinels) instead of `src`
- Added alignment logic: round `paddingBytes` down to an even multiple of the element size before dividing

```cpp
size_t doubleElement = 2 * elementBytes;
paddingBytes = (paddingBytes / doubleElement) * doubleElement;
```

**Fix output buffer initialization:**

- Output tensors now use `copyBadInputBuffers` when NaN bounds checking is enabled
- Ensures output buffers have NaN sentinels for validation

#### 2. ReferenceValidator.cpp

**Fix pointer calculation in checkResultsTyped:**

- Match allocation logic exactly (multiply first, then divide)
- Add same alignment rounding before dividing by 2
- Ensures pointer arithmetic matches allocation arithmetic

**Fix memory leak:**

- Changed `hipFree` → `hipHostFree` to match `hipHostMalloc`

#### 3. Add Test Coverage

**nan_bounds_check_odd_padding.yaml:**

- Tests problem sizes with odd-element padding: (137, 129), (141, 131), (17, 19)
- Verifies alignment fixes work correctly
- Both batched and non-batched GEMM variants
- With and without UseScaleCD

### Testing

- Tested on gfx950 with odd-sized tensor configurations
- All test cases pass without `hipErrorInvalidValue`
- Validates that NaN sentinels are properly checked before and after data

### Technical Details

The key insight is that when doing pointer arithmetic with multi-byte types:

```cpp
// WRONG - can create misalignment:
size_t paddingBytes = paddingElements * elementBytes;
void* offset = basePtr + paddingBytes / 2;

// CORRECT - ensures alignment:
size_t paddingBytes = paddingElements * elementBytes;
paddingBytes = (paddingBytes / (2 * elementBytes)) * (2 * elementBytes);
void* offset = basePtr + paddingBytes / 2;
```

This ensures `paddingBytes / 2` is always a multiple of `elementBytes`, preventing misalignment.

Once this PR is merged, we need to merge #6202 so that the `UseBeta: False` test works.

commit: f684ab1

## Test Plan

Added a yaml test and also checked all existing tests; all are working fine.

## Test Result

All tests are passing.

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
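
The arithmetic in the #6922 commit message above can be checked directly. The following is a minimal Python sketch of the rounding rule it quotes, not the C++ code from the PR; `half_padding_offset` is a hypothetical name, and the inputs are the 48887-element, 2-byte example from that message.

```python
def half_padding_offset(padding_elements, element_bytes):
    """Byte offset of the valid data, with padding rounded down so that
    half of it is a whole number of elements."""
    padding_bytes = padding_elements * element_bytes
    double_element = 2 * element_bytes
    padding_bytes = (padding_bytes // double_element) * double_element
    return padding_bytes // 2


# Unrounded: 48887 elements * 2 bytes = 97774 bytes, half of which is odd.
naive = (48887 * 2) // 2
# Rounded: 97774 is first rounded down to 97772, half of which is even.
aligned = half_padding_offset(48887, 2)

print(naive, naive % 2)      # 48887 1
print(aligned, aligned % 2)  # 48886 0
```

The rounded offset is always a multiple of the element size, at the cost of shifting the data start by at most one element relative to the exact midpoint.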


Fix code generation when UseBeta is false

0d6cd23

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad requested a review from a team as a code owner April 7, 2026 08:45

github-actions Bot added project: hipblaslt project: hipsparselt ci:hipsparselt-fast labels Apr 7, 2026

pdhirajkumarprasad requested review from nakajee and talumbau April 7, 2026 08:57

assistant-librarian Bot added the organization: ROCm label Apr 7, 2026

Merge branch 'develop' into users/dhirajp/AIHPBLAS-1467

42ee3a0

talumbau requested review from Alex-Vasile and removed request for talumbau April 13, 2026 20:58

Alex-Vasile requested changes Apr 14, 2026


updated based on change

3fbd640

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad requested a review from Alex-Vasile April 15, 2026 06:45

Merge branch 'develop' into users/dhirajp/AIHPBLAS-1467

687a265

Alex-Vasile requested changes Apr 16, 2026


addressing all the review

65b7a61

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad requested a review from Alex-Vasile April 17, 2026 05:41

addressed NaN and code gen but when UseBeta is false

59638b9

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

Alex-Vasile requested changes Apr 23, 2026


Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp

Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated

Comment thread projects/hipblaslt/tensilelite/client/src/ReferenceValidator.cpp Outdated

fix the failure on gfx125x branch and other feedback

e6827b7

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

Merge branch 'develop' into users/dhirajp/AIHPBLAS-1467

ae8fbf1

nakajee reviewed Apr 28, 2026


Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py

Updated to refer globalWriteElements instead of using compile-tile flag

0f4293d

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad requested review from Alex-Vasile and nakajee April 28, 2026 05:54

nakajee approved these changes Apr 28, 2026


Alex-Vasile requested changes Apr 28, 2026


pdhirajkumarprasad added 2 commits April 29, 2026 00:39

fixed based on comment

976d0d9

Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

remove the NaN related fix in this PR and created separate PR for NaN…

1aad592

… issue Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad mentioned this pull request Apr 29, 2026

Fixes alignment issues in NaN bounds checking mode that caused hipErrorInvalidValue when testing tensors with odd-element padding #6922

Merged

1 task

pdhirajkumarprasad and others added 2 commits May 19, 2026 07:53

Merge branch 'develop' into users/dhirajp/AIHPBLAS-1467

9e4ebd2

revert initialization to see what is failing and will handle that sep…

6b2baf9

…arately Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>

pdhirajkumarprasad requested a review from Alex-Vasile May 19, 2026 10:41

Alex-Vasile approved these changes May 22, 2026


pdhirajkumarprasad merged commit 9bece58 into develop May 23, 2026
106 of 113 checks passed

pdhirajkumarprasad deleted the users/dhirajp/AIHPBLAS-1467 branch May 23, 2026 02:53

KKyang mentioned this pull request May 25, 2026

Revert "Fix code generation when UseBeta is false" #7731

Open


5 participants
