[Origami] Improve Origami's Test Coverage by NaveenElumalaiAMD · Pull Request #3301 · ROCm/rocm-libraries

NaveenElumalaiAMD · 2025-12-10T23:45:21Z

Motivation

Expanded unit testing and improved analytical modeling coverage for the Origami GEMM module. Enhanced the documentation and robustness of the selection/ranking utilities.

Technical Details

- Added new analytical utility functions to origami/gemm.hpp for modeling work/output utilization, CU occupancy, memory bandwidth, L2 hit rate, and arithmetic intensity. Improved Doxygen documentation and removed obsolete APIs.
- Added safeguards to mathematical routines (e.g., division-by-zero checks).
- Refactored the test helpers (make_hardware, make_config) to provide robust, architecture-specific test setups.
- Significantly expanded the unit tests in test_origami.cpp to:
  - Exercise the new utility/modeling functions and all selection/ranking APIs.
  - Validate tie-breaking, invalid configs, edge cases, exception handling, environment settings, WGM logic, and order determinism.
  - Rename and clean up test case names for clarity and consistency.
- Minor code and comment cleanups throughout.
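A minimal sketch of what a zero-denominator guard in an arithmetic-intensity routine can look like. The function name, its signature, the traffic model (A and B read once, C written once) and the choice to return 0.0 for a degenerate problem are all assumptions for illustration, not Origami's arithmetic_intensity.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: GEMM arithmetic intensity (FLOPs per byte moved).
// Not Origami's API; the traffic model below is an assumption.
inline double arithmetic_intensity_sketch(std::size_t M, std::size_t N, std::size_t K,
                                          std::size_t bytes_a, std::size_t bytes_b,
                                          std::size_t bytes_c)
{
    // 2*M*N*K: one multiply and one add per (m, n, k) triple.
    const double flops = 2.0 * static_cast<double>(M) * N * K;
    // Bytes for reading A (M x K) and B (K x N), and writing C (M x N).
    const double bytes = static_cast<double>(M) * K * bytes_a
                       + static_cast<double>(K) * N * bytes_b
                       + static_cast<double>(M) * N * bytes_c;
    // Guard: a degenerate problem (or zero element sizes) moves no data.
    // Returning 0.0 instead of dividing keeps the result finite (no inf/NaN).
    if (bytes == 0.0)
        return 0.0;
    return flops / bytes;
}
```

For a 1024×1024×1024 FP16 problem this model gives 2·1024/6 ≈ 341.3 FLOPs per byte.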

Test Result

All unit tests pass locally.

…ranch (#3301) * [CK_TILE] Port hw independent changes from internal repo to develop branch It includes PR#96, #114, #120, #121. * correct rebase error

@ibrahimw1

## Testing Plan

- Azure CI (deprecated): Runs all the Catch2 and new Python tests. ✅
- TheRock CI (integration):
  - CI workflow: https://github.com/ROCm/TheRock/actions/runs/20316557318 ✅
  - TheRock branch: https://github.com/ROCm/TheRock/tree/users/neoblizz/minsukim-refactor ✅ (some tests fail, but the build passed)
  - TheRock CI (libraries: hipblaslt, etc.) ✅
- Math CI (performance/hipblaslt) ✅
- Reviews from older PRs are addressed, with a few exceptions. ✅

## PRs History

- Original PR: #1859
- Rebased PR: #2718
- Reverted PR: #3416 (comment)
- Hot-fix PR: #3417 (review) ([TheRock CI Report](https://github.com/ROCm/TheRock/actions/runs/20293546191))
- **And now this!**

## Technical Details

### Refactor

- [x] New file `types.hpp`, consolidates various origami types.
- [x] New structs to replace the growing tuple.
- [x] API updates to make it scalable, down from 20+ parameters to a few.
- [x] Lots of redundant code removal.
- [x] Reorganize into `types.hpp`; data types (see `hardware.hpp` for what needs to be moved)
- [x] Refactor extract APIs
- [x] Add enums for transpose
- [x] Refactor debug/log reporting
- [x] Remove mutable and statics
- [x] Decouple `latency` out of `config_t`
- [x] Rebase develop into this branch

### Python APIs & Unit Tests

- [x] Update the tests
- [x] Update rocroller
- [x] Update hipblaslt/tensilelite/scripts to use the new API

### Testing Infrastructure

- Replace YAML-based tests with Catch2 C++ tests
- Replace GTest with Catch2 framework
- Convert YAML-driven parameterized tests to pure C++ tests
- Update common.hpp with direct C++ helper functions
- Update CMakeLists.txt to use Catch2 instead of GTest
- Remove Boost dependency (was only for YAML)
- All test data now hardcoded in C++ for type safety
- Better IDE support, debugging, and error messages

### Questions

- Rocroller: coordinate on the API design; should they be reused?
- Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same functionality)?
- Should transpose/data-types be part of config as well for precomputing?

## Motivation

This PR reapplies the Origami project's refactor.

1. Make it simpler to make model updates.
2. Well-defined, scalable, clean interface.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

## Work Moved to Future PRs

- Logger APIs
- Runtime Options
- Move instruction map into separate file
- Additional tests: #3301
- Minor batch-count specific changes: #3289
- Move Origami's Azure CI -> TheRock CI

@ibrahimw1

---------

Co-authored-by: Brad Nemanich <Brad.Nemanich@amd.com>
Co-authored-by: neoblizz <osama94@gmail.com>
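The refactor items about replacing a growing tuple with structs, cutting 20+ parameters down to a few, and adding transpose enums can be illustrated with a short sketch. Every name below (transpose_sketch_t, problem_sketch_t, tile_config_sketch_t, num_output_tiles_sketch) is hypothetical and does not reflect the actual definitions in types.hpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical transpose enum replacing a bare bool or char flag.
enum class transpose_sketch_t { N, T };

// Hypothetical problem description: one struct instead of many positional args.
struct problem_sketch_t {
    std::size_t M = 0, N = 0, K = 0, batch = 1;
    transpose_sketch_t trans_a = transpose_sketch_t::N;
    transpose_sketch_t trans_b = transpose_sketch_t::N;
};

// Hypothetical tile configuration with 256x256x64 defaults.
struct tile_config_sketch_t {
    std::size_t mt_m = 256, mt_n = 256, mt_k = 64;
};

// Adding a field to either struct does not change this signature or its callers.
inline std::size_t num_output_tiles_sketch(const problem_sketch_t& p,
                                           const tile_config_sketch_t& c)
{
    const std::size_t tiles_m = (p.M + c.mt_m - 1) / c.mt_m; // ceil(M / MT_M)
    const std::size_t tiles_n = (p.N + c.mt_n - 1) / c.mt_n; // ceil(N / MT_N)
    return tiles_m * tiles_n * p.batch;
}
```

The design benefit is mostly at call sites: new model inputs become new struct fields with defaults, rather than new trailing parameters.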

Copilot

Pull request overview

This pull request adds comprehensive unit tests for the Origami library to improve code coverage and catch potential failures. The changes focus on testing functions in origami.cpp and gemm.cpp that were previously untested or under-tested.

Key changes include:

Added unit tests for origami functions including rank_configs, select_topk_configs, select_config_mnk, and select_workgroup_mapping
Added unit tests for gemm functions covering work/output utilization, CU occupancy, memory bandwidth, L2 hit rates, arithmetic intensity, LDS capacity checks, and cache estimation
Refactored the make_hardware helper function to use architecture-specific defaults
Exposed previously static functions (compute_cvt_overhead, arithmetic_intensity, etc.) for testing by adding them to the public API
Added zero-denominator checks in arithmetic intensity calculations for safety
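The new selection tests check tie-breaking and order determinism. A minimal sketch of one way to get a deterministic top-k: a stable sort on predicted latency, so equal-latency candidates keep their input order. The candidate struct and select_topk_sketch are hypothetical; Origami's select_topk_configs may rank and break ties differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: not Origami's config type.
struct candidate_sketch_t {
    int id;          // position in the input list
    double latency;  // predicted latency (lower is better)
};

// Returns the k lowest-latency candidates. std::stable_sort preserves the
// input order of equal-latency entries, so identical inputs give identical output.
inline std::vector<candidate_sketch_t>
select_topk_sketch(std::vector<candidate_sketch_t> cands, std::size_t k)
{
    std::stable_sort(cands.begin(), cands.end(),
                     [](const candidate_sketch_t& a, const candidate_sketch_t& b) {
                         return a.latency < b.latency;
                     });
    if (cands.size() > k)
        cands.resize(k);
    return cands;
}
```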

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 23 comments.

| File | Description |
|---|---|
| shared/origami/tests/test_origami.cpp | Added comprehensive unit tests for rank_configs, select_topk_configs, select_config_mnk, and select_workgroup_mapping; updated existing tests to use refactored make_hardware; corrected test naming from "GEMM" to "origami" |
| shared/origami/tests/test_gemm.cpp | Added extensive unit tests for gemm functions including calculate_work_utilization, calculate_output_utilization, compute_cu_occupancy, compute_mem_bw_from_occupancy, compute_l2_hit_rate_global, round_elements_to_128B, compute_cvt_overhead, arithmetic_intensity, check_lds_capacity, and estimate_l2_hit/estimate_mall_hit |
| shared/origami/tests/include/common.hpp | Refactored make_hardware to use architecture-specific default values (gfx942 and gfx950) instead of parameterized defaults |
| shared/origami/src/origami/gemm.cpp | Added zero-denominator safety checks in arithmetic_intensity and emulated_tf32_arithmetic_intensity; changed compute_cvt_overhead from static to public |
| shared/origami/include/origami/gemm.hpp | Added function declarations for previously internal/static functions to enable unit testing; added comprehensive documentation for newly exposed functions; removed declarations for compute_A_loads and compute_B_loads |
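A hedged sketch of an architecture-keyed test helper in the spirit of the make_hardware refactor. The struct, the function name and the single num_cus field are hypothetical; the real helper in tests/include/common.hpp fills Origami's own hardware type with more fields. The CU counts are the published figures for MI300X (gfx942) and MI350-series (gfx950) parts.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical hardware description: not Origami's hardware type.
struct hardware_sketch_t {
    std::string arch;
    int num_cus;
};

inline hardware_sketch_t make_hardware_sketch(const std::string& arch)
{
    if (arch == "gfx942")
        return {arch, 304}; // MI300X: 304 CUs
    if (arch == "gfx950")
        return {arch, 256}; // MI350X/MI355X: 256 CUs
    // Unknown targets fail loudly instead of silently using a default.
    throw std::invalid_argument("unsupported arch: " + arch);
}
```

Keying defaults on the architecture name keeps each test's setup consistent with a real device instead of a hand-picked mix of parameters.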

AlexBrownAMD

Flagged a few test cases that look like they contain typos; some REQUIRE statements may be testing the wrong values.

math-ci-jobs · 2026-01-06T18:33:48Z

perfci run on commit 136b6b946a46eb0641a111b750f1a0f36fa565d9

math-ci run

math-ci-jobs · 2026-01-06T22:28:31Z

perfci run on commit 3d5542eea7b90b6850d73e263e41fa7e22f79683

math-ci run

ryanswann-amd

I think this is looking good. There are still a few key functions that are somewhat undertested, primarily:

compute_memory_latency
compute_tile_latency

Some ideas for the undertested areas are below. (I came up with these examples on the fly, so if you find they are wrong, feel free to change the specific values.)

Compute Memory Latency

Test 1.1: All Transpose Combinations

Test all 4: NN, NT, TN, TT with same problem size (4096×4096×1024)
Config: 256×256×64 tile
Verify: Different MT_M/N/K_rounded_128bytes values based on transpose mode
Expected: Each transpose mode produces different memory access patterns
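Test 1.1 checks the MT_M/N/K_rounded_128bytes values. A minimal sketch of the rounding, assuming a tile edge along the memory-contiguous dimension is rounded up to a whole number of 128-byte segments; round_elements_to_128B_sketch is a hypothetical stand-in for Origami's round_elements_to_128B. Under column-major BLAS conventions the contiguous dimension of A is M when A is not transposed and K when it is, which is one way the transpose mode could change the rounded values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: round an element count up so it spans a whole number
// of 128-byte segments. Exact for power-of-two element sizes up to 128 bytes.
inline std::size_t round_elements_to_128B_sketch(std::size_t elements,
                                                 std::size_t bytes_per_element)
{
    if (bytes_per_element == 0)
        return elements; // nothing sensible to round
    const std::size_t bytes = elements * bytes_per_element;
    const std::size_t rounded_bytes = (bytes + 127) / 128 * 128; // ceil to 128 B
    return rounded_bytes / bytes_per_element;
}
```

For example, with 1-byte (FP8) elements a 64-element K edge rounds up to 128 elements, while a 256-element M edge is already a multiple of 128 bytes.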

Test 1.2: MX Block Sizes

Problem: 4096×4096×1024
Test 1: FP8 with a_mx_block_size=32, b_mx_block_size=32
Test 2: FP16 with mx_block_size=0 (no MX blocks)
Verify: FP8 adds scale bytes to Ld_CU_bytes calculation
Expected: MX datatypes have additional overhead for scales
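For Test 1.2, a rough sketch of how block scales add to the bytes loaded per tile. It assumes the OCP Microscaling (MX) layout of one 8-bit (E8M0) shared scale per block of mx_block_size elements along K; the function name and parameters are hypothetical, and sub-byte types such as FP4 would need a bit-level count instead of bytes_per_element.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: bytes loaded for one rows x mt_k operand tile.
// mx_block_size == 0 means "no MX scales".
inline std::size_t tile_load_bytes_sketch(std::size_t rows, std::size_t mt_k,
                                          std::size_t bytes_per_element,
                                          std::size_t mx_block_size)
{
    std::size_t bytes = rows * mt_k * bytes_per_element;
    if (mx_block_size > 0) {
        // One 1-byte shared scale per block of mx_block_size elements in each row.
        const std::size_t blocks_per_row = (mt_k + mx_block_size - 1) / mx_block_size;
        bytes += rows * blocks_per_row;
    }
    return bytes;
}
```

For a 256×64 FP8 tile with 32-element blocks this gives 16384 data bytes plus 512 scale bytes.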

Test 1.3: Problem Size Scaling

Small: 512×512×256, Medium: 2048×2048×1024, Large: 8192×8192×2048
Config: 256×256×64 tile (constant)
Verify: Memory latency increases with problem size
Expected: Large > Medium > Small latency

Test 1.4: Occupancy and Bandwidth Limiting

Problem: 4096×4096×1024
Config: 256×256×64 tile
Test with num_active_cus: 50, 150, 304 (gfx942)
Verify: bw_limited and mem_bw_occ_limited variables scale correctly
Expected: Higher occupancy → better bandwidth utilization → lower latency
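Tests 1.4 and 1.5 vary num_active_cus and splitting_factor. A hedged sketch of the wave-quantization arithmetic behind those choices, assuming each CU processes one output tile at a time; this occupancy model and all names are illustrative, not how compute_cu_occupancy is implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result: tile count, number of waves, and how full the last wave is.
struct wave_info_sketch_t {
    std::size_t tiles;
    std::size_t waves;
    double last_wave_fill; // fraction of CUs busy in the final wave
};

inline wave_info_sketch_t waves_sketch(std::size_t M, std::size_t N,
                                       std::size_t mt_m, std::size_t mt_n,
                                       std::size_t splitting_factor,
                                       std::size_t num_active_cus)
{
    // Output tiles, multiplied by the K-split factor (each split is a separate workgroup).
    const std::size_t tiles = ((M + mt_m - 1) / mt_m) * ((N + mt_n - 1) / mt_n)
                              * splitting_factor;
    if (num_active_cus == 0 || tiles == 0)
        return {tiles, 0, 0.0};
    const std::size_t waves = (tiles + num_active_cus - 1) / num_active_cus;
    const std::size_t last = tiles - (waves - 1) * num_active_cus;
    return {tiles, waves, static_cast<double>(last) / num_active_cus};
}
```

Under this model, 4096×4096 with 256×256 tiles gives 256 tiles: one wave on 304 CUs, two waves on 150 CUs, and six waves on 50 CUs with only 6 tiles in the last wave.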

Test 1.5: Splitting Factor Impact

Problem: 4096×4096×1024
Config: 256×256×64 tile
Test splitting_factor: 1, 2, 4, 8
Verify: Splitting affects L2 hit rate and memory access patterns
Expected: Different splitting factors produce measurable latency differences

Edge Cases:

Test 1.6: Zero Dimensions

Problem: 0×4096×1024 (or other zero dimension)
Expected: Graceful handling, no crashes

Test 1.7: Single Tile Problem

Problem: 128×128×32 (fits in one tile)
Config: 256×256×64 tile
Expected: Minimal memory latency

Test 1.8: Skinny Matrices

Test 1: 64×8192×1024 (tall and skinny)
Test 2: 8192×64×1024 (wide and short)
Config: 256×256×64 tile
Expected: Different memory access patterns, reasonable latency values

Compute Tile Latency

Test 2.1: Small K (One Iteration)

Problem: 4096×4096×64 (K = MT_K, single iteration)
Config: 256×256×64 tile
Verify: Total latency > single compute iteration (includes prologue/epilogue)
Expected: L_tile_total > L_tile_single due to setup overhead

Test 2.2: K-Split Reduction Overhead

Problem: 4096×4096×512 (small K)
Config: 256×256×64 tile
Test splitting_factor: 1, 2, 8
Verify: When K is small and we split, latency increases due to reduction overhead
Expected: Factor=8 > Factor=2 > Factor=1 (for small K problems)

Test 2.3: Work Utilization Penalty

Problem 1: 4096×4096×1024 (perfect alignment, utilization=1.0)
Problem 2: 4351×3839×959 (poor alignment, utilization≈0.5)
Config: 256×256×64 tile
Verify: Poor alignment increases latency via effective_tile_penalty
Expected: Poor alignment → ~2× higher latency
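To pick sizes for Test 2.3, it can help to compute how much of the padded tile grid does useful work. This sketch assumes utilization = (M·N·K) / (padded M · padded N · padded K); that definition and the function name are assumptions, and Origami's calculate_work_utilization may define utilization differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: useful work over padded work. Tile sizes must be nonzero.
inline double padded_utilization_sketch(std::size_t M, std::size_t N, std::size_t K,
                                        std::size_t mt_m, std::size_t mt_n,
                                        std::size_t mt_k)
{
    // Round x up to a multiple of the tile size t.
    auto pad = [](std::size_t x, std::size_t t) { return (x + t - 1) / t * t; };
    const double padded = static_cast<double>(pad(M, mt_m)) * pad(N, mt_n) * pad(K, mt_k);
    if (padded == 0.0)
        return 0.0; // zero-sized problem
    return static_cast<double>(M) * N * K / padded;
}
```

Under this definition, sizes just below a tile multiple waste very little (4351×3839×959 gives about 0.998), while sizes just above one waste more (4097×4097×1025 gives about 0.83).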

Test 2.4: K-Dimension Zero Padding

Problem 1: K=1024 (divisible by MT_K=64) → no penalty
Problem 2: K=1025 (K % MT_K = 1, worst case) → maximum penalty
Problem 3: K=1023 (K % MT_K = MT_K-1) → small penalty
Config: 256×256×64 tile
Verify: Padding penalty scales with the amount of zero padding, MT_K - (K % MT_K), whenever K % MT_K ≠ 0
Expected: $L_{1023} = L_{1024} < L_{1025}$ and $L_{1024} - L_{1023} < L_{1025} - L_{1024}$
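The expectation in Test 2.4 follows from counting K-loop iterations. A minimal sketch, with hypothetical names, of the iteration count and zero padding for a given MT_K:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result: K-loop iterations and zero elements appended along K.
struct k_padding_sketch_t {
    std::size_t iterations; // ceil(K / MT_K)
    std::size_t padding;    // zeros added to reach iterations * MT_K
};

inline k_padding_sketch_t k_padding_sketch(std::size_t K, std::size_t mt_k)
{
    const std::size_t iters = (K + mt_k - 1) / mt_k;
    return {iters, iters * mt_k - K};
}
```

With MT_K = 64, K = 1023 and K = 1024 both take 16 iterations (1 and 0 padded elements), while K = 1025 needs a 17th iteration that is 63/64 padding, which is why it should carry the largest penalty if latency tracks iteration count.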

Test 2.5: K-Splitting Performance Benefit

Problem: 256×256×4096 (large K, single tile in M/N)
Config: 256×256×64 tile
Test splitting_factor: 1 vs 4
Verify: Large K problem with splitting should be faster (parallelism benefit)
Expected: Split=4 latency < Split=1 latency (for large K)

Test 2.6: Iteration Count Validation

Problem 1: K=64, MT_K=64 → num_iter=0 (1 total, -1 for epilogue)
Problem 2: K=256, MT_K=64 → num_iter=3 (4 total, -1 for epilogue)
Problem 3: K=512, MT_K=64, split=2 → k_per_split=256 → num_iter=3
Config: 256×256×64 tile
Verify: Iteration count calculation is correct
Expected: num_iter matches the formula ceil(k_per_split/MT_K) - 1, and $L_{K512} > L_{K256} > L_{K64}$
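A minimal sketch of the Test 2.6 formula, num_iter = ceil(k_per_split / MT_K) - 1. Computing k_per_split as ceil(K / split) and clamping at zero are assumptions, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch; split must be >= 1 and mt_k must be nonzero.
inline std::size_t num_iter_sketch(std::size_t K, std::size_t mt_k, std::size_t split)
{
    const std::size_t k_per_split = (K + split - 1) / split; // ceil(K / split)
    const std::size_t total = (k_per_split + mt_k - 1) / mt_k; // ceil(k_per_split / MT_K)
    // "-1" for the epilogue iteration; clamp so K == 0 does not underflow size_t.
    return total == 0 ? 0 : total - 1;
}
```

For K = MT_K = 64 the formula evaluates to 0, i.e. only the epilogue iteration remains; K = 512 split two ways matches K = 256 unsplit at 3 iterations, so any latency difference between those cases comes from split-related work such as the reduction.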

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

math-ci-jobs · 2026-01-13T20:22:39Z

perfci run on commit `1507899`

math-ci run

codecov-commenter · 2026-01-13T23:24:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

❗ There is a different number of reports uploaded between BASE (677a8d0) and HEAD (90e3509).

HEAD has 2 fewer uploads than BASE.

| Flag | BASE (677a8d0) | HEAD (90e3509) |
|---|---|---|
| hipDNN | 1 | 0 |
| rocBLAS | 1 | 0 |

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           develop    #3301       +/-   ##
============================================
- Coverage    55.53%   43.62%   -11.91%
============================================
  Files          521       30      -491
  Lines        67464    11442    -56022
  Branches      7949     1440     -6509
============================================
- Hits         37464     4991    -32473
+ Misses       26645     5984    -20661
+ Partials      3355      467     -2888
```

| Flag | Coverage Δ |
|---|---|
| hipBLASLt | `43.62% <ø> (?)` |
| hipDNN | `?` |
| rocBLAS | `?` |

Flags with carried forward coverage won't be shown.
see 551 files with indirect coverage changes

ryanswann-amd

The PR is getting big, so the tests requested above will be added in a separate PR.

math-ci-webhook · 2026-01-17T00:37:40Z

perfci run on commit `a1b81a2`

math-ci run

math-ci-webhook · 2026-01-19T20:21:59Z

perfci run on commit `75a5646`

math-ci run

…ranch (#3301) * [CK_TILE] Port hw independent changes from internal repo to develop branch It includes PR#96, #114, #120, #121. * correct rebase error [ROCm/composable_kernel commit: fc7bf0a]

NaveenElumalaiAMD requested a review from a team as a code owner December 10, 2025 23:45

github-actions Bot added project: hipblaslt shared: origami labels Dec 10, 2025

assistant-librarian Bot added the organization: ROCm label Dec 11, 2025

neoblizz requested review from aliry95amd, bethune-bryant, minsukim-amd, ryanswann-amd and yenong-amd December 11, 2025 00:48

ryanswann-amd reviewed Dec 11, 2025

minsukim-amd reviewed Dec 11, 2025

Comment thread shared/origami/src/origami/gemm.cpp Outdated

Comment thread shared/origami/tests/test_gemm.cpp

Comment thread shared/origami/tests/test_gemm.cpp Outdated

Comment thread shared/origami/tests/test_gemm.cpp Outdated

ryanswann-amd reviewed Dec 12, 2025

Comment thread shared/origami/tests/test_gemm.cpp

Comment thread shared/origami/tests/test_gemm.cpp Outdated

bnemanich reviewed Dec 16, 2025

Comment thread shared/origami/tests/test_gemm.cpp

bnemanich mentioned this pull request Dec 16, 2025

[Origami] fp4 fix #3407

Closed

1 task

neoblizz mentioned this pull request Dec 17, 2025

[Origami] Reapply the origami refactor with fixes #3452

Merged

15 tasks

neoblizz requested a review from Copilot December 19, 2025 17:45

Copilot started reviewing on behalf of neoblizz December 19, 2025 17:51

Copilot AI reviewed Dec 19, 2025

AlexBrownAMD reviewed Dec 19, 2025

Comment thread shared/origami/tests/test_gemm.cpp

Comment thread shared/origami/tests/test_gemm.cpp

Comment thread shared/origami/tests/test_gemm.cpp Outdated

Comment thread shared/origami/tests/test_origami.cpp Outdated

Comment thread shared/origami/tests/test_origami.cpp Outdated

ryanswann-amd self-requested a review December 19, 2025 20:24

neoblizz changed the title ~~[Origami] Added more unit tests~~ [Origami] Improve Origami's Test Coverage Dec 19, 2025

NaveenElumalaiAMD force-pushed the users/nelumala/origami/add-new-unit-tests branch from e7248ff to 136b6b9 January 6, 2026 16:50

NaveenElumalaiAMD force-pushed the users/nelumala/origami/add-new-unit-tests branch from 32ada08 to 05b17f3 January 6, 2026 20:44

ryanswann-amd reviewed Jan 9, 2026

NaveenElumalaiAMD added 2 commits January 13, 2026 18:38

Added more unit tests

ce8c0cd

Address Ryan and Minsu's comments

8d3cac0

NaveenElumalaiAMD and others added 8 commits January 13, 2026 18:40

Address more comments

d582a13

Fix Inconsistent test numbering

c319a6d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Fix typo

060118d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestions from code review

fc9ee02

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

get the test build working and passing

ec2ba47

Address further comments

5cb7406

fix typos suggested by Copilot

7aa8a09

add fp64 test

1507899

NaveenElumalaiAMD force-pushed the users/nelumala/origami/add-new-unit-tests branch from 05b17f3 to 1507899 January 13, 2026 18:41

ryanswann-amd self-requested a review January 16, 2026 16:25

ryanswann-amd approved these changes Jan 16, 2026

minsukim-amd and others added 3 commits January 16, 2026 14:16

Merge branch 'develop' into users/nelumala/origami/add-new-unit-tests

60a3048

changes to tests after rebase

a1b81a2

Add back some tests in test_gemm.cpp missed in the rebase

75a5646

Merge branch 'develop' into users/nelumala/origami/add-new-unit-tests

90e3509

neoblizz approved these changes Jan 19, 2026

bethune-bryant approved these changes Jan 19, 2026

aliry95amd approved these changes Jan 19, 2026

Comment thread shared/origami/tests/test_origami.cpp

Comment thread shared/origami/tests/test_origami.cpp

amd-hsivasun merged commit 56b3df6 into develop Jan 20, 2026
39 of 40 checks passed

amd-hsivasun deleted the users/nelumala/origami/add-new-unit-tests branch January 20, 2026 20:19

NaveenElumalaiAMD commented Dec 10, 2025 • edited by ryanswann-amd
