Fix code generation when UseBeta is false#6202
Conversation
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
|
this change fixes the issue mentioned in https://amd-hub.atlassian.net/browse/AIHPBLAS-1466 as well |
Codecov Report❌ Patch coverage is
❌ Your project status has failed because the head coverage (69.00%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #6202 +/- ##
===========================================
- Coverage 66.37% 61.25% -5.12%
===========================================
Files 1606 2077 +471
Lines 268162 355315 +87153
Branches 37430 53418 +15988
===========================================
+ Hits 177989 217644 +39655
- Misses 75232 118965 +43733
- Partials 14941 18706 +3765
*This pull request uses carry forward flags. Click here to find out more.
🚀 New features to boost your workflow:
|
|
hmm.. this doesn't seem right. I will take a closer look soon, but I don't think we should require all of these checks in different places for this case. I will try to repro and update you |
Do you mean that the fixes for 1467 also happen to fix 1466? Or that this PR also has separate fixes for 1466? If it's the second option, then please split this out into two PRs, one for 1466 and one for 1467. |
|
@Alex-Vasile this PR fixed both the issue i.e 1467 and 1466 |
Alex-Vasile
left a comment
There was a problem hiding this comment.
There are no tests added to verify this fix or catch regressions, please add tests.
this PR fixed both the issue i.e 1467 and 1466
Please spilt into 2, keep this PR focused on 1467 and the tests for it.
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Alex-Vasile
left a comment
There was a problem hiding this comment.
I think there's still a bug, the CI isn't passing, and there's several changes in here unrelated to UseBeta fixes.
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
fixed and all are passing -------------------------------------------------------------------------------------- generated xml file: /home/dhirajp/rocm-libraries/projects/hipblaslt/tensilelite/python_tests.xml --------------------------------------------------------------------------------------- |
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
nakajee
left a comment
There was a problem hiding this comment.
Thanks for your update.
I am OK with your change as long as all tests with both UseBata=True and False pass.
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
… issue Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
…rorInvalidValue` when testing tensors with odd-element padding (#6922) ## Motivation Fixes alignment issues in NaN bounds checking mode that caused `hipErrorInvalidValue` when testing tensors with odd-element padding. The root cause was misaligned pointer arithmetic when padding sizes don't divide evenly by element sizes ## Technical Details NaN bounds checking allocates extra buffer space filled with NaN/Inf sentinels to detect out-of-bounds memory writes: ``` [NaN padding] [valid data] [NaN padding] ``` The pointer returned points to the middle (valid data section). When validating results, we need to: 1. Calculate the offset to the buffer start 2. Copy the entire padded buffer for validation 3. Check that padding regions still contain NaN/Inf sentinels **The bug occurred when:** - Total padding elements was odd (e.g., 48887 elements) - Converting to bytes: `48887 elements * 2 bytes = 97774 bytes` - Dividing by 2: `97774 / 2 = 48887 bytes` (odd!) - For Half (2-byte) data, this creates misalignment - Result: `hipErrorInvalidValue` during hipMemcpy ### Changes #### 1. DataInitialization.cpp **Fix copyBadInputBuffers:** - Changed to copy from `bad` buffer (NaN sentinels) instead of `src` - Added alignment logic: round `paddingBytes` to even multiple of element size before dividing ```cpp size_t doubleElement = 2 * elementBytes; paddingBytes = (paddingBytes / doubleElement) * doubleElement; ``` **Fix output buffer initialization:** - Output tensors now use `copyBadInputBuffers` when NaN bounds checking is enabled - Ensures output buffers have NaN sentinels for validation #### 2. ReferenceValidator.cpp **Fix pointer calculation in checkResultsTyped:** - Match allocation logic exactly (multiply first, then divide) - Add same alignment rounding before dividing by 2 - Ensures pointer arithmetic matches allocation arithmetic **Fix memory leak:** - Changed `hipFree` → `hipHostFree` to match `hipHostMalloc` #### 3. Add Test Coverage **nan_bounds_check_odd_padding.yaml:** - Tests problem sizes with odd-element padding: (137, 129), (141, 131), (17, 19) - Verifies alignment fixes work correctly - Both batched and non-batched GEMM variants - With and without UseScaleCD ### Testing - Tested on gfx950 with odd-sized tensor configurations - All test cases pass without `hipErrorInvalidValue` - Validates that NaN sentinels are properly checked before and after data ### Technical Details The key insight is that when doing pointer arithmetic with multi-byte types: ```cpp // WRONG - can create misalignment: size_t paddingBytes = paddingElements * elementBytes; void* offset = basePtr + paddingBytes / 2; // CORRECT - ensures alignment: size_t paddingBytes = paddingElements * elementBytes; paddingBytes = (paddingBytes / (2 * elementBytes)) * (2 * elementBytes); void* offset = basePtr + paddingBytes / 2; ``` This ensures `paddingBytes / 2` is always a multiple of `elementBytes`, preventing misalignment. Once this PR is merged, we need merge #6202 so that `UseBeta: False` test works fine. commit: f684ab1 ## Test Plan Added yaml and also checked all existing test. all are working finne ## Test Result All test are passing ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
…arately Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
…rorInvalidValue` when testing tensors with odd-element padding (#6922) ## Motivation Fixes alignment issues in NaN bounds checking mode that caused `hipErrorInvalidValue` when testing tensors with odd-element padding. The root cause was misaligned pointer arithmetic when padding sizes don't divide evenly by element sizes ## Technical Details NaN bounds checking allocates extra buffer space filled with NaN/Inf sentinels to detect out-of-bounds memory writes: ``` [NaN padding] [valid data] [NaN padding] ``` The pointer returned points to the middle (valid data section). When validating results, we need to: 1. Calculate the offset to the buffer start 2. Copy the entire padded buffer for validation 3. Check that padding regions still contain NaN/Inf sentinels **The bug occurred when:** - Total padding elements was odd (e.g., 48887 elements) - Converting to bytes: `48887 elements * 2 bytes = 97774 bytes` - Dividing by 2: `97774 / 2 = 48887 bytes` (odd!) - For Half (2-byte) data, this creates misalignment - Result: `hipErrorInvalidValue` during hipMemcpy ### Changes #### 1. DataInitialization.cpp **Fix copyBadInputBuffers:** - Changed to copy from `bad` buffer (NaN sentinels) instead of `src` - Added alignment logic: round `paddingBytes` to even multiple of element size before dividing ```cpp size_t doubleElement = 2 * elementBytes; paddingBytes = (paddingBytes / doubleElement) * doubleElement; ``` **Fix output buffer initialization:** - Output tensors now use `copyBadInputBuffers` when NaN bounds checking is enabled - Ensures output buffers have NaN sentinels for validation #### 2. ReferenceValidator.cpp **Fix pointer calculation in checkResultsTyped:** - Match allocation logic exactly (multiply first, then divide) - Add same alignment rounding before dividing by 2 - Ensures pointer arithmetic matches allocation arithmetic **Fix memory leak:** - Changed `hipFree` → `hipHostFree` to match `hipHostMalloc` #### 3. Add Test Coverage **nan_bounds_check_odd_padding.yaml:** - Tests problem sizes with odd-element padding: (137, 129), (141, 131), (17, 19) - Verifies alignment fixes work correctly - Both batched and non-batched GEMM variants - With and without UseScaleCD ### Testing - Tested on gfx950 with odd-sized tensor configurations - All test cases pass without `hipErrorInvalidValue` - Validates that NaN sentinels are properly checked before and after data ### Technical Details The key insight is that when doing pointer arithmetic with multi-byte types: ```cpp // WRONG - can create misalignment: size_t paddingBytes = paddingElements * elementBytes; void* offset = basePtr + paddingBytes / 2; // CORRECT - ensures alignment: size_t paddingBytes = paddingElements * elementBytes; paddingBytes = (paddingBytes / (2 * elementBytes)) * (2 * elementBytes); void* offset = basePtr + paddingBytes / 2; ``` This ensures `paddingBytes / 2` is always a multiple of `elementBytes`, preventing misalignment. Once this PR is merged, we need merge #6202 so that `UseBeta: False` test works fine. commit: f684ab1 ## Test Plan Added yaml and also checked all existing test. all are working finne ## Test Result All test are passing ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
Motivation
https://amd-hub.atlassian.net/browse/AIHPBLAS-1467
Technical Details
Fixed multiple issues preventing TensileLight from correctly generating and executing kernels when UseBeta=false (beta parameter not used in GEMM operations). Enabled bounds checking validation to work correctly with this configuration.
Files Modified
Issue: KeyError when accessing Beta SGPR register when UseBeta=false
Fix: Added conditional check before accessing Beta SGPR
if kernel["ProblemType"]["UseBeta"]:
moduleExternalArgs.addComment("Read Beta")
moduleExternalArgs.addModuleAsFlatItems(self.externalArgLoader.loadAllKernArg(
self.sgprs["Beta"], "KernArgAddress", self.states.numSgprBeta))
Issue: UseBeta serialized as integer (0/1) instead of boolean in YAML, causing C++ parser errors
Fix: Ensure UseBeta is always stored as boolean
self.state["UseBeta"] = bool(self.state["UseBeta"])
Issue #1: Buffer allocation check didn't verify if buffer pointer was valid
Fix: Check both size and pointer validity
// Only skip reallocation if size matches AND buffer is valid
if(m_cpuResultBufferSize == bytes && m_cpuResultBuffer.get() != nullptr)
return;
Issue #2: hipFree compiler warning about nodiscard attribute
Fix: Cast return value to void in lambda deleter
uint8_t* buffer;
HIP_CHECK_EXC(hipHostMalloc((void**)&buffer, bytes, 0));
m_cpuResultBuffer.reset(buffer, [](uint8_t* p) { (void)hipFree(p); });
Issue #3: Attempting to validate null/empty tensors
Fix: Skip validation for null pointers or zero-sized tensors
// Skip validation if pointers are null or maxElements is 0
if(resPtr == nullptr || refPtr == nullptr || result.maxElements[i] == 0)
{
if(Debug::Instance().printTensorInfo())
std::cout << "Skipping validation for tensor " << tensor.getName() << std::endl;
continue;
}
Issue #4: Trying to copy padding bytes from output tensors that don't have padding
Fix: Only use maxElement for input tensors
// For output tensors, don't use maxElement with padding
if(boundsCheck == BoundsCheckMode::NaN && !tensor.isOutput())
elementsToCopy = maxElement;
Issue #5: Bounds checking validation on output tensors without padding buffers
Fix: Skip bounds checking for output tensors
// Only check bounds for input tensors (output tensors don't have padding buffers)
if(boundsCheck == BoundsCheckMode::NaN && !tensor.isOutput())
Issue: hipMemcpy with null pointers causing runtime errors
Fix: Added null pointer check
void* copyInputBuffers(const TensorDescriptor& descriptor,
void* dst,
void* src,
size_t totalElements,
hipMemcpyKind kind)
{
// Skip copy if no elements to copy or if pointers are null
if(totalElements > 0 && dst != nullptr && src != nullptr)
{
HIP_CHECK_EXC(hipMemcpy(dst, src, descriptor.elementBytes() * totalElements, kind));
}
return dst;
}
0d6cd23
Test Plan
NA
Test Result
Submission Checklist