Skip to content

[origami] Major refactoring of codebase#1859

Closed
neoblizz wants to merge 26 commits into
developfrom
users/neoblizz/origami-refactor
Closed

[origami] Major refactoring of codebase#1859
neoblizz wants to merge 26 commits into
developfrom
users/neoblizz/origami-refactor

Conversation

@neoblizz
Copy link
Copy Markdown
Member

@neoblizz neoblizz commented Sep 27, 2025

Important

Looking for feedback on the refactored APIs and C++ best-practices.

Refactor

  • New file types.hpp, consolidates various origami types.
  • New structs to replace the growing tuple.
  • API updates to make it scalable, down from 20+ parameters to a few.
  • Lots of redundant code removal.
  • Reorganize into types.hpp; data types (see hardware.hpp for what needs to be moved) @minsukim-amd
  • Refactor extract APIs
  • Add enums for transpose
  • Add enums for cache modifier
  • Refactor debug/log reporting
  • Remove mutable and statics @minsukim-amd
  • Decouple latency out of config_t @minsukim-amd
  • Rebase develop into this branch @ryanswann-amd

Python APIs & Unit Tests

  • Update python APIs + ATEN compatible APIs
  • Update the tests
  • Update rocroller @minsukim-amd
  • hipblaslt/tensilelite/scripts to use the new API @minsukim-amd

Testing Infrastructure

  • Replace YAML-based tests with Catch2 C++ tests
  • Replace GTest with Catch2 framework
  • Convert YAML-driven parameterized tests to pure C++ tests
  • Update common.hpp with direct C++ helper functions
  • Update CMakeLists.txt to use Catch2 instead of GTest
  • Remove Boost dependency (was only for YAML)
  • All test data now hardcoded in C++ for type safety
  • Better IDE support, debugging, and error messages

Questions

  • Rocroller: coordinate on the API design, should they be reused?
  • Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same functionality)?
  • Should transpose/data-types be part of config as well for precomputing?

Motivation

  1. Make it simpler to make model updates.
  2. Well-defined, scalable, clean interface.

- New file , consolidates various origami types.
- New structs to replace the growing tuple.
- API updates to make it scalable, down from 20+ parameters to a few.
- Lots of redundant code removal.
'
@ryanswann-amd ryanswann-amd self-requested a review September 29, 2025 18:36
Comment thread shared/origami/include/origami/gemm.hpp Outdated
Comment on lines +95 to +111
# Select reduction strategy
grid_algorithm = origami.grid_selection_t.analytical # default to analytical
if args.dynamic_grid_version == 0:
grid_algorithm = origami.grid_selection_t.number_of_cus
elif args.dynamic_grid_version == 1:
grid_algorithm = origami.grid_selection_t.min_resources
elif args.dynamic_grid_version == 2:
grid_algorithm = origami.grid_selection_t.energy_aware
elif args.dynamic_grid_version == 3:
grid_algorithm = origami.grid_selection_t.reduction_cost_aware
elif args.dynamic_grid_version == 4:
grid_algorithm = origami.grid_selection_t.data_parallel
elif args.dynamic_grid_version == 5:
grid_algorithm = origami.grid_selection_t.analytical
elif args.dynamic_grid_version == 6:
grid_algorithm = origami.grid_selection_t.k_split_aware

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an algorithm or a heuristic?

Comment thread shared/origami/src/origami/gemm.cpp Outdated
Comment thread shared/origami/src/origami/gemm.cpp Outdated
Comment thread shared/origami/src/origami/gemm.cpp
Comment thread shared/origami/src/origami/gemm.cpp
Comment on lines +138 to +140
/// Transpose types (TT, TN, NT, TT.)
bool transpose_a;
bool transpose_b;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always thought this nomenclature was kind of weird. Could we use something like "contiguous dimensions" or "dimensions contiguous in memory"? We use that information later on to do transpose specific stuff based on how it's aligned in the memory hierarchy

Comment thread shared/origami/src/origami/utils.cpp Outdated
Comment thread shared/origami/include/origami/types.hpp
@neoblizz neoblizz self-assigned this Sep 30, 2025
Comment thread shared/origami/include/origami/types.hpp Outdated
data_type_t d_dtype;

/// Compute type.
data_type_t mi_dtype;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as the accumulation type? On gfx12, there are multiple accumulation types that can be used. We might want to add that to future proof in case we start supporting gfx12.

std::size_t batch;

/// Transpose types (TT, TN, NT, TT.)
bool a_transpose;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to use an enumeration here. This is how hipBLASLt supports swizzling the inputs, which we might want to support.

Copy link
Copy Markdown
Member Author

@neoblizz neoblizz Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnemanich as in an enum: TT, TN, NT, NN?

Comment thread shared/origami/include/origami/types.hpp Outdated
Comment thread shared/origami/include/origami/types.hpp Outdated
Comment thread shared/origami/include/origami/types.hpp Outdated
}
};
struct matrix_instruction {
size_t MI_M;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using all capitol letters doesn't match with formatting. Also, shouldn't need MI in the names, since they are part of matrix_instruction. Maybe we can just call the values m, n, and k.

public:
enum class architecture_t { gfx90a, gfx942, gfx950, Count };

static architecture_t arch_name_to_enum(const std::string& str) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to move source code out of the header files and move to source files when possible. Most of the code in this file can be moved to a source file.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why it was moved out of source file: #1862. I need to revisit this to make sure we don't cause this problem again.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that was due to the static values that were declared in this class. I actually think we should re-think a lot of this. Why are we using static methods everywhere in this file? Do we really need to have static variables as well? Those can cause weird issues, like the one pointed out in PR 1862.

Comment thread shared/origami/include/origami/utils.hpp Outdated
Comment thread shared/origami/include/origami/origami.hpp Outdated
Comment thread shared/origami/src/origami/streamk.cpp Outdated
Comment thread shared/origami/src/origami/gemm.cpp Outdated

std::size_t hash() const {
return std::hash<size_t>()(MI_M) ^ std::hash<size_t>()(MI_N) ^ std::hash<size_t>()(MI_K) ^
std::hash<data_type_t>()(mi_input_type);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates some conflicts, since a hash of a size_t is usually the identity operation.
For instance I think matrix_instruction(32, 32, 4, data_type_t::Half) has the same hash as matrix_instruction(16, 16, 4, data_type_t::Half) and matrix_instruction(4, 4, 4, data_type_t::Half)

// Optionally ensure we do not exceed LDS capacity
// (If you want to factor in WGM, add it to your check_lds_capacity signature.)
// For now, let's just check the tile itself:
if (!check_lds_capacity(hardware, mt, problem.a_dtype, problem.b_dtype)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this out of the loop for now. Can always put it back when adding workgroupmapping as a parameter.


return std::make_tuple(numActiveCUs, numWaves, splitFactor);
l2_A_uncached_reads = static_cast<long long>(l2_m) * config.mt.mk();
l2_B_uncached_reads = static_cast<long long>(l2_n) * config.mt.nk();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be in the loop?

}

// Total reads considering repeated usage
long long l2_A_reads = static_cast<long long>(l2_m) * l2_n * config.mt.mk();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is long long l2_A_reads = l2_n * l2_A_uncached_reads; ?

std::cout << "===========================================================\n";
}
// Debug tracking info
mutable std::unordered_map<std::string, std::string> debug_info;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to write this so that mutable isn't required. This can sometimes lead to race conditions.


/// K dimension (reduction).
std::size_t k;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have parameterized constructors for these types, e.g., dim3_t{size_t m_init, size_t n_init, size_t k_init)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! We should do this as well.

Comment thread shared/origami/include/origami/hardware.hpp Outdated
@NaveenElumalaiAMD
Copy link
Copy Markdown
Contributor

closing this PR for this one #2718

bnemanich pushed a commit that referenced this pull request Dec 9, 2025
## Technical Details

Rebase @neoblizz (Muhammad) changes
(#1859) with latest develop

---------

Co-authored-by: Yen Ong <yen.ong@amd.com>
Co-authored-by: Bryant Nelson <bryant.nelson@amd.com>
Co-authored-by: neoblizz <osama94@gmail.com>
Co-authored-by: yenong-amd <92903026+yenong-amd@users.noreply.github.com>
Co-authored-by: Minsu Kim <minsu.kim@amd.com>
Co-authored-by: Alex Brown <alex.brown@amd.com>
minsukim-amd added a commit that referenced this pull request Dec 18, 2025
## Testing Plan
- Azure CI (deprecated): Runs all the Catch2 and new Python tests. ✅ 
- TheRock CI (integration): 
- CI workflow: https://github.com/ROCm/TheRock/actions/runs/20316557318
✅
- TheRock branch:
https://github.com/ROCm/TheRock/tree/users/neoblizz/minsukim-refactor ✅
(Some tests fail but build passed)
- TheRock CI (libraries: hipblaslt, etc.) ✅ 
- Math CI (performance/hipblaslt) ✅ 
- Reviews from older PRs are addressed with the few exceptions. ✅ 

## PRs History
- Original PR: #1859
- Rebased PR: #2718
- Reverted PR:
#3416 (comment)
- Hot-fix PR:
#3417 (review)
([TheRock CI
Report](https://github.com/ROCm/TheRock/actions/runs/20293546191))
- **And now this!**

## Technical Details

### Refactor
- [x] New file `types.hpp`, consolidates various origami types.
- [x] New structs to replace the growing tuple.
- [x] API updates to make it scalable, down from 20+ parameters to a
few.
- [x] Lots of redundant code removal.
- [x] Reorganize into `types.hpp`; data types (see `hardware.hpp` for
what needs to be moved)
- [x] Refactor extract APIs
- [x] Add enums for transpose
- [x] Refactor debug/log reporting
- [x] Remove mutable and statics
- [x] Decouple `latency` out of `config_t` 
- [x] Rebase develop into this branch

### Python APIs & Unit Tests
- [x] Update the tests
- [x] Update rocroller
- [x] hipblaslt/tensilelite/scripts to use the new API

### Testing Infrastructure
- Replace YAML-based tests with Catch2 C++ tests
- Replace GTest with Catch2 framework
- Convert YAML-driven parameterized tests to pure C++ tests
- Update common.hpp with direct C++ helper functions
- Update CMakeLists.txt to use Catch2 instead of GTest
- Remove Boost dependency (was only for YAML)
- All test data now hardcoded in C++ for type safety
- Better IDE support, debugging, and error messages

### Questions
- Rocroller: coordinate on the API design, should they be reused?
- Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same
functionality)?
- Should transpose/data-types be part of config as well for
precomputing?

## Motivation

This PR reapplies Origami project's refactor.

1. Make it simpler to make model updates.
2. Well-defined, scalable, clean interface.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

## Work Moved to Future PRs

- Logger APIs
- Runtime Options
- Move instruction map into separate file
- Additional tests: #3301
- Minor batch-count specific changes:
#3289
- Move Origami's Azure CI -> TheRock CI @ibrahimw1

---------

Co-authored-by: Brad Nemanich <Brad.Nemanich@amd.com>
Co-authored-by: neoblizz <osama94@gmail.com>
@yenong-amd yenong-amd deleted the users/neoblizz/origami-refactor branch January 16, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants