Skip to content

Revert "[origami] Major refactoring of codebase (#2718)"#3411

Closed
jayhawk-commits wants to merge 1 commit into
developfrom
users/jayhawk-commits/revert-origami-refactor
Closed

Revert "[origami] Major refactoring of codebase (#2718)"#3411
jayhawk-commits wants to merge 1 commit into
developfrom
users/jayhawk-commits/revert-origami-refactor

Conversation

@jayhawk-commits
Copy link
Copy Markdown
Collaborator

@jayhawk-commits jayhawk-commits commented Dec 16, 2025

It was determined that this commit introduces memory corruption issues with TheRock python packaging build. See ROCm/TheRock#2522 and need this commit reverted to move the rocm-libraries submodule pointer in TheRock forward.

Error

corrupted size vs. prev_size in fastbins

Affected Versions

  • python 3.11, 3.12, 3.13
  • pytorch 2.7, 2.8, 2.9, main branches
  • TheRock with rocm-libraries submodule pointed to this commit or newer

Ways to Reproduce

  1. Build ROCm via TheRock.
  2. Build pytorch wheels via TheRock.
  3. Run rocm-sdk test.
  4. Run python ./external-builds/pytorch/run_pytorch_smoke_tests.py.

Error happens during steps 2-4. I have not seen an instance of a workflow getting past step 4.

Investigation Test Results A

Investigation Test Results B

Test Results A show corrupted size vs. prev_size in fastbins throughout the different python 3.11-3.13 runs.
Test Results B do not show the issue.

Revert Sequence

  1. Cloned latest rocm-libraries.
  2. Used CLI to execute git revert 283f438877a0982f170f76d5441dcca365741aa6
  3. Manually resolved merge conflict in projects/hipblaslt/library/src/amd_detail/rocblaslt/src/rocroller/solution_selection.cpp, referring to the diffs in PR [origami] Major refactoring of codebase #2718

Verifying This Pull Request

  1. Creating a branch on TheRock that points to this PR's branch as the rocm-libraries submodule. See https://github.com/ROCm/TheRock/tree/users/jayhawk-commits/revert-origami-refactor-test
  2. Run pytorch build workflow with that branch to verify the error does not occur.

Verification Results

This reverts commit 283f438.

Manually resolved merge conflict in `solution_selection.cpp`
Copy link
Copy Markdown
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verification build (https://github.com/ROCm/TheRock/actions/runs/20260025013) fails building hipBLASLt:

[hipBLASLt] /therock/src/rocm-libraries/projects/hipblaslt/library/src/amd_detail/rocblaslt/src/rocroller/solution_selection.cpp:202:20: error: no member named 'scaleABlockRowSize' in 'KernelType'
[hipBLASLt]   202 |         kernelType.scaleABlockRowSize * kernelType.scaleABlockColSize, //Handle A vs B block size.
[hipBLASLt]       |         ~~~~~~~~~~ ^
[hipBLASLt] /therock/src/rocm-libraries/projects/hipblaslt/library/src/amd_detail/rocblaslt/src/rocroller/solution_selection.cpp:202:52: error: no member named 'scaleABlockColSize' in 'KernelType'
[hipBLASLt]   202 |         kernelType.scaleABlockRowSize * kernelType.scaleABlockColSize, //Handle A vs B block size.
[hipBLASLt]       |                                         ~~~~~~~~~~ ^
[hipBLASLt] /therock/src/rocm-libraries/projects/hipblaslt/library/src/amd_detail/rocblaslt/src/rocroller/solution_selection.cpp:245:27: error: use of undeclared identifier 'analytical_hardware'; did you mean 'analaytical_hardware'?
[hipBLASLt]   245 |             if(numTiles < analytical_hardware.N_CU && !isF6)
[hipBLASLt]       |                           ^~~~~~~~~~~~~~~~~~~
[hipBLASLt]       |                           analaytical_hardware
[hipBLASLt] /therock/src/rocm-libraries/projects/hipblaslt/library/src/amd_detail/rocblaslt/src/rocroller/solution_selection.cpp:185:31: note: 'analaytical_hardware' declared here
[hipBLASLt]   185 |     const origami::hardware_t analaytical_hardware = origami::hardware_t::get_hardware_for_device(0);
[hipBLASLt]       |                               ^
[hipBLASLt] 3 errors generated.

@jayhawk-commits
Copy link
Copy Markdown
Collaborator Author

It looks like #2732 and #3177 touch many of the same files. Will need hipblaslt developers to help with reverting these changes.

@ScottTodd
Copy link
Copy Markdown
Member

It looks like #2732 and #3177 touch many of the same files. Will need hipblaslt developers to help with reverting these changes.

Revert all of them. See https://github.com/ROCm/TheRock/blob/main/docs/rfcs/RFC0002-MonoRepo-Gardener-Rotations.md#reverting-vs-fixing-forward

If multiple commits are implicated in broken workflows, cleanly reverting them all sequentially is less risky than reverting manually.

@jayhawk-commits
Copy link
Copy Markdown
Collaborator Author

Replaced by #3416

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants