Revert "[origami] Major refactoring of codebase (#2718)"#3416
Merged
Conversation
This reverts commit 283f438.
Collaborator
|
The Azure CI results pass but is not reporting back to the summary step check. |
Collaborator
|
Updated PR description with run from TheRock past the previous failing points. Will merge. |
This was referenced Dec 17, 2025
minsukim-amd
added a commit
that referenced
this pull request
Dec 17, 2025
15 tasks
minsukim-amd
added a commit
that referenced
this pull request
Dec 18, 2025
## Testing Plan - Azure CI (deprecated): Runs all the Catch2 and new Python tests. ✅ - TheRock CI (integration): - CI workflow: https://github.com/ROCm/TheRock/actions/runs/20316557318 ✅ - TheRock branch: https://github.com/ROCm/TheRock/tree/users/neoblizz/minsukim-refactor ✅ (Some tests fail but build passed) - TheRock CI (libraries: hipblaslt, etc.) ✅ - Math CI (performance/hipblaslt) ✅ - Reviews from older PRs are addressed with the few exceptions. ✅ ## PRs History - Original PR: #1859 - Rebased PR: #2718 - Reverted PR: #3416 (comment) - Hot-fix PR: #3417 (review) ([TheRock CI Report](https://github.com/ROCm/TheRock/actions/runs/20293546191)) - **And now this!** ## Technical Details ### Refactor - [x] New file `types.hpp`, consolidates various origami types. - [x] New structs to replace the growing tuple. - [x] API updates to make it scalable, down from 20+ parameters to a few. - [x] Lots of redundant code removal. - [x] Reorganize into `types.hpp`; data types (see `hardware.hpp` for what needs to be moved) - [x] Refactor extract APIs - [x] Add enums for transpose - [x] Refactor debug/log reporting - [x] Remove mutable and statics - [x] Decouple `latency` out of `config_t` - [x] Rebase develop into this branch ### Python APIs & Unit Tests - [x] Update the tests - [x] Update rocroller - [x] hipblaslt/tensilelite/scripts to use the new API ### Testing Infrastructure - Replace YAML-based tests with Catch2 C++ tests - Replace GTest with Catch2 framework - Convert YAML-driven parameterized tests to pure C++ tests - Update common.hpp with direct C++ helper functions - Update CMakeLists.txt to use Catch2 instead of GTest - Remove Boost dependency (was only for YAML) - All test data now hardcoded in C++ for type safety - Better IDE support, debugging, and error messages ### Questions - Rocroller: coordinate on the API design, should they be reused? - Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same functionality)? - Should transpose/data-types be part of config as well for precomputing? ## Motivation This PR reapplies Origami project's refactor. 1. Make it simpler to make model updates. 2. Well-defined, scalable, clean interface. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. ## Work Moved to Future PRs - Logger APIs - Runtime Options - Move instruction map into separate file - Additional tests: #3301 - Minor batch-count specific changes: #3289 - Move Origami's Azure CI -> TheRock CI @ibrahimw1 --------- Co-authored-by: Brad Nemanich <Brad.Nemanich@amd.com> Co-authored-by: neoblizz <osama94@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
It was determined that this commit introduces memory corruption issues with TheRock python packaging build. See ROCm/TheRock#2522 and need this commit reverted to move the rocm-libraries submodule pointer in TheRock forward.
Error
corrupted size vs. prev_size in fastbinsAffected Versions
Ways to Reproduce
Error happens during steps 2-4. I have not seen an instance of a workflow getting past step 4.
Investigation Test Results A
Investigation Test Results B
pytorch build workflow with rocm-libraries submodule at above SHA-1: e0fa9ee
Run: https://github.com/ROCm/TheRock/actions/runs/20254276739/job/58154885187
Test Results A show corrupted size vs. prev_size in fastbins throughout the different python 3.11-3.13 runs.
Test Results B do not show the issue.
Revert Sequence
Verifying This Pull Request
Verification Results