Skip to content

[Origami] Reapply the origami refactor with fixes#3452

Merged
minsukim-amd merged 6 commits into
developfrom
users/minsukim/origami_refactor_fixes
Dec 18, 2025
Merged

[Origami] Reapply the origami refactor with fixes#3452
minsukim-amd merged 6 commits into
developfrom
users/minsukim/origami_refactor_fixes

Conversation

@minsukim-amd
Copy link
Copy Markdown
Contributor

@minsukim-amd minsukim-amd commented Dec 17, 2025

Testing Plan

PRs History

Technical Details

Refactor

  • New file types.hpp, consolidates various origami types.
  • New structs to replace the growing tuple.
  • API updates to make it scalable, down from 20+ parameters to a few.
  • Lots of redundant code removal.
  • Reorganize into types.hpp; data types (see hardware.hpp for what needs to be moved)
  • Refactor extract APIs
  • Add enums for transpose
  • Refactor debug/log reporting
  • Remove mutable and statics
  • Decouple latency out of config_t
  • Rebase develop into this branch

Python APIs & Unit Tests

  • Update the tests
  • Update rocroller
  • hipblaslt/tensilelite/scripts to use the new API

Testing Infrastructure

  • Replace YAML-based tests with Catch2 C++ tests
  • Replace GTest with Catch2 framework
  • Convert YAML-driven parameterized tests to pure C++ tests
  • Update common.hpp with direct C++ helper functions
  • Update CMakeLists.txt to use Catch2 instead of GTest
  • Remove Boost dependency (was only for YAML)
  • All test data now hardcoded in C++ for type safety
  • Better IDE support, debugging, and error messages

Questions

  • Rocroller: coordinate on the API design, should they be reused?
  • Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same functionality)?
  • Should transpose/data-types be part of config as well for precomputing?

Motivation

This PR reapplies Origami project's refactor.

  1. Make it simpler to make model updates.
  2. Well-defined, scalable, clean interface.

Submission Checklist

Work Moved to Future PRs

@minsukim-amd minsukim-amd requested review from a team as code owners December 17, 2025 20:31
@minsukim-amd minsukim-amd changed the title Users/minsukim/origami refactor fixes Reapply the origami refactor with fixes Dec 17, 2025
@neoblizz neoblizz changed the title Reapply the origami refactor with fixes [Origami] Reapply the origami refactor with fixes Dec 17, 2025
@neoblizz neoblizz removed the request for review from NaveenElumalaiAMD December 18, 2025 04:23
@stellaraccident
Copy link
Copy Markdown
Contributor

In the future, please avoid this style of omnibus PR. It is fine to have a big PR that touches many LOC, but the bigger the PR, the more focused it should be on one concern (ie. Api changes, test framework changes, code refactor, functional changes). If you have the git skills, you can prepare a curated stack of commits for rebase merge, but typically if a dev is having trouble managing PR scope anyway, they lack the skills for a more advanced workflow. So I only recommend that if you have a solid process and experience.

Thank you for the additional PR details, although it is still pretty light (typical of such an omnibus PR).

What was the fix to the original issue?

@neoblizz
Copy link
Copy Markdown
Member

Thank you for the additional PR details, although it is still pretty light (typical of such an omnibus PR).

I'll work on adding more details.

What was the fix to the original issue?

@stellaraccident It was an ODR violation fixed in this commit: 093e67f, there were two static data members that were inside the .cpp file. There were multiple libraries that included the .hpp file and compiled the .cpp file. Each shared library emitted its own definition of static member causing the ODR violation. At program termination the destructor then was called twice, once per shared object.

@stellaraccident
Copy link
Copy Markdown
Contributor

Thank you for the additional PR details, although it is still pretty light (typical of such an omnibus PR).

I'll work on adding more details.

What was the fix to the original issue?

@stellaraccident It was an ODR violation fixed in this commit: 093e67f, there were two static data members that were inside the .cpp file. There were multiple libraries that included the .hpp file and compiled the .cpp file. Each shared library emitted its own definition of static member causing the ODR violation. At program termination the destructor then was called twice, once per shared object.

So trivially would have been caught by enabling host ASAN. Did valgrind catch it? You guys do know that you have the knobs to be enabling asan in your dev and CI flows, right?

@stellaraccident
Copy link
Copy Markdown
Contributor

They were enabled months ago and David knows. But no one prioritizing work on the blas team is allocating time for quality improvements.

@bnemanich
Copy link
Copy Markdown
Contributor

I did run an asan build locally, but it didn't actually show any issues. valgrind did actually show a small memory leak, but did not catch the actual issue either.

@minsukim-amd minsukim-amd merged commit 92d85e8 into develop Dec 18, 2025
48 checks passed
@minsukim-amd minsukim-amd deleted the users/minsukim/origami_refactor_fixes branch December 18, 2025 20:08
@neoblizz neoblizz mentioned this pull request Dec 19, 2025
1 task
assistant-librarian Bot pushed a commit that referenced this pull request Jan 6, 2026
* update grouped_gemm blockwise kernel

* update config

* update kernel

* update examples

* remove test code for now

* sync test files with origin/develop

* update example

* fix code lint

* fix code-lint

* update test code

* run clang format

* run pre-commit

* update api
ammallya pushed a commit that referenced this pull request Feb 3, 2026
* update grouped_gemm blockwise kernel

* update config

* update kernel

* update examples

* remove test code for now

* sync test files with origin/develop

* update example

* fix code lint

* fix code-lint

* update test code

* run clang format

* run pre-commit

* update api

[ROCm/composable_kernel commit: 76696ac]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants