[origami] Major refactoring of codebase#1859
Conversation
- New file , consolidates various origami types. - New structs to replace the growing tuple. - API updates to make it scalable, down from 20+ parameters to a few. - Lots of redundant code removal. '
| # Select reduction strategy | ||
| grid_algorithm = origami.grid_selection_t.analytical # default to analytical | ||
| if args.dynamic_grid_version == 0: | ||
| grid_algorithm = origami.grid_selection_t.number_of_cus | ||
| elif args.dynamic_grid_version == 1: | ||
| grid_algorithm = origami.grid_selection_t.min_resources | ||
| elif args.dynamic_grid_version == 2: | ||
| grid_algorithm = origami.grid_selection_t.energy_aware | ||
| elif args.dynamic_grid_version == 3: | ||
| grid_algorithm = origami.grid_selection_t.reduction_cost_aware | ||
| elif args.dynamic_grid_version == 4: | ||
| grid_algorithm = origami.grid_selection_t.data_parallel | ||
| elif args.dynamic_grid_version == 5: | ||
| grid_algorithm = origami.grid_selection_t.analytical | ||
| elif args.dynamic_grid_version == 6: | ||
| grid_algorithm = origami.grid_selection_t.k_split_aware | ||
|
|
There was a problem hiding this comment.
Is this an algorithm or a heuristic?
| /// Transpose types (TT, TN, NT, TT.) | ||
| bool transpose_a; | ||
| bool transpose_b; |
There was a problem hiding this comment.
I've always thought this nomenclature was kind of weird. Could we use something like "contiguous dimensions" or "dimensions contiguous in memory"? We use that information later on to do transpose specific stuff based on how it's aligned in the memory hierarchy
| data_type_t d_dtype; | ||
|
|
||
| /// Compute type. | ||
| data_type_t mi_dtype; |
There was a problem hiding this comment.
Is this the same as the accumulation type? On gfx12, there are multiple accumulation types that can be used. We might want to add that to future proof in case we start supporting gfx12.
| std::size_t batch; | ||
|
|
||
| /// Transpose types (TT, TN, NT, TT.) | ||
| bool a_transpose; |
There was a problem hiding this comment.
Might make sense to use an enumeration here. This is how hipBLASLt supports swizzling the inputs, which we might want to support.
| } | ||
| }; | ||
| struct matrix_instruction { | ||
| size_t MI_M; |
There was a problem hiding this comment.
Using all capitol letters doesn't match with formatting. Also, shouldn't need MI in the names, since they are part of matrix_instruction. Maybe we can just call the values m, n, and k.
| public: | ||
| enum class architecture_t { gfx90a, gfx942, gfx950, Count }; | ||
|
|
||
| static architecture_t arch_name_to_enum(const std::string& str) { |
There was a problem hiding this comment.
I think we should try to move source code out of the header files and move to source files when possible. Most of the code in this file can be moved to a source file.
There was a problem hiding this comment.
This is why it was moved out of source file: #1862. I need to revisit this to make sure we don't cause this problem again.
There was a problem hiding this comment.
Yes, I think that was due to the static values that were declared in this class. I actually think we should re-think a lot of this. Why are we using static methods everywhere in this file? Do we really need to have static variables as well? Those can cause weird issues, like the one pointed out in PR 1862.
|
|
||
| std::size_t hash() const { | ||
| return std::hash<size_t>()(MI_M) ^ std::hash<size_t>()(MI_N) ^ std::hash<size_t>()(MI_K) ^ | ||
| std::hash<data_type_t>()(mi_input_type); |
There was a problem hiding this comment.
This creates some conflicts, since a hash of a size_t is usually the identity operation.
For instance I think matrix_instruction(32, 32, 4, data_type_t::Half) has the same hash as matrix_instruction(16, 16, 4, data_type_t::Half) and matrix_instruction(4, 4, 4, data_type_t::Half)
| // Optionally ensure we do not exceed LDS capacity | ||
| // (If you want to factor in WGM, add it to your check_lds_capacity signature.) | ||
| // For now, let's just check the tile itself: | ||
| if (!check_lds_capacity(hardware, mt, problem.a_dtype, problem.b_dtype)) { |
There was a problem hiding this comment.
I would move this out of the loop for now. Can always put it back when adding workgroupmapping as a parameter.
|
|
||
| return std::make_tuple(numActiveCUs, numWaves, splitFactor); | ||
| l2_A_uncached_reads = static_cast<long long>(l2_m) * config.mt.mk(); | ||
| l2_B_uncached_reads = static_cast<long long>(l2_n) * config.mt.nk(); |
There was a problem hiding this comment.
This doesn't need to be in the loop?
| } | ||
|
|
||
| // Total reads considering repeated usage | ||
| long long l2_A_reads = static_cast<long long>(l2_m) * l2_n * config.mt.mk(); |
There was a problem hiding this comment.
This is long long l2_A_reads = l2_n * l2_A_uncached_reads; ?
| std::cout << "===========================================================\n"; | ||
| } | ||
| // Debug tracking info | ||
| mutable std::unordered_map<std::string, std::string> debug_info; |
There was a problem hiding this comment.
Any way to write this so that mutable isn't required. This can sometimes lead to race conditions.
|
|
||
| /// K dimension (reduction). | ||
| std::size_t k; | ||
|
|
There was a problem hiding this comment.
Can we have parameterized constructors for these types, e.g., dim3_t{size_t m_init, size_t n_init, size_t k_init)
There was a problem hiding this comment.
Yes! We should do this as well.
|
closing this PR for this one #2718 |
## Technical Details Rebase @neoblizz (Muhammad) changes (#1859) with latest develop --------- Co-authored-by: Yen Ong <yen.ong@amd.com> Co-authored-by: Bryant Nelson <bryant.nelson@amd.com> Co-authored-by: neoblizz <osama94@gmail.com> Co-authored-by: yenong-amd <92903026+yenong-amd@users.noreply.github.com> Co-authored-by: Minsu Kim <minsu.kim@amd.com> Co-authored-by: Alex Brown <alex.brown@amd.com>
## Testing Plan - Azure CI (deprecated): Runs all the Catch2 and new Python tests. ✅ - TheRock CI (integration): - CI workflow: https://github.com/ROCm/TheRock/actions/runs/20316557318 ✅ - TheRock branch: https://github.com/ROCm/TheRock/tree/users/neoblizz/minsukim-refactor ✅ (Some tests fail but build passed) - TheRock CI (libraries: hipblaslt, etc.) ✅ - Math CI (performance/hipblaslt) ✅ - Reviews from older PRs are addressed with the few exceptions. ✅ ## PRs History - Original PR: #1859 - Rebased PR: #2718 - Reverted PR: #3416 (comment) - Hot-fix PR: #3417 (review) ([TheRock CI Report](https://github.com/ROCm/TheRock/actions/runs/20293546191)) - **And now this!** ## Technical Details ### Refactor - [x] New file `types.hpp`, consolidates various origami types. - [x] New structs to replace the growing tuple. - [x] API updates to make it scalable, down from 20+ parameters to a few. - [x] Lots of redundant code removal. - [x] Reorganize into `types.hpp`; data types (see `hardware.hpp` for what needs to be moved) - [x] Refactor extract APIs - [x] Add enums for transpose - [x] Refactor debug/log reporting - [x] Remove mutable and statics - [x] Decouple `latency` out of `config_t` - [x] Rebase develop into this branch ### Python APIs & Unit Tests - [x] Update the tests - [x] Update rocroller - [x] hipblaslt/tensilelite/scripts to use the new API ### Testing Infrastructure - Replace YAML-based tests with Catch2 C++ tests - Replace GTest with Catch2 framework - Convert YAML-driven parameterized tests to pure C++ tests - Update common.hpp with direct C++ helper functions - Update CMakeLists.txt to use Catch2 instead of GTest - Remove Boost dependency (was only for YAML) - All test data now hardcoded in C++ for type safety - Better IDE support, debugging, and error messages ### Questions - Rocroller: coordinate on the API design, should they be reused? - Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same functionality)? - Should transpose/data-types be part of config as well for precomputing? ## Motivation This PR reapplies Origami project's refactor. 1. Make it simpler to make model updates. 2. Well-defined, scalable, clean interface. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. ## Work Moved to Future PRs - Logger APIs - Runtime Options - Move instruction map into separate file - Additional tests: #3301 - Minor batch-count specific changes: #3289 - Move Origami's Azure CI -> TheRock CI @ibrahimw1 --------- Co-authored-by: Brad Nemanich <Brad.Nemanich@amd.com> Co-authored-by: neoblizz <osama94@gmail.com>
Important
Looking for feedback on the refactored APIs and C++ best-practices.
Refactor
types.hpp, consolidates various origami types.types.hpp; data types (seehardware.hppfor what needs to be moved) @minsukim-amdlatencyout ofconfig_t@minsukim-amdPython APIs & Unit Tests
Testing Infrastructure
Questions
Motivation