Skip to content

[Origami] Improve Origami's Test Coverage#3301

Merged
amd-hsivasun merged 14 commits into
developfrom
users/nelumala/origami/add-new-unit-tests
Jan 20, 2026
Merged

[Origami] Improve Origami's Test Coverage#3301
amd-hsivasun merged 14 commits into
developfrom
users/nelumala/origami/add-new-unit-tests

Conversation

@NaveenElumalaiAMD
Copy link
Copy Markdown
Contributor

@NaveenElumalaiAMD NaveenElumalaiAMD commented Dec 10, 2025

Motivation

Expanded unit testing and improved analytical modeling coverage for Origami GEMM module. Enhanced documentation and robustness of selection/ranking utilities.

Technical Details

  • Added new analytical utility functions to origami/gemm.hpp for modeling work/output utilization, CU occupancy, memory bandwidth, L2 hit rate, and arithmetic intensity. Improved Doxygen documentation and removed obsolete API.
  • Provided safe-guards in mathematical routines (e.g., division-by-zero checks).
  • Refactored helpers (make_hardware, make_config) to ensure robust and architecture-specific test setups.
  • Significantly expanded unit tests in test_origami.cpp to:
    • Exercise new utility/modeling functions and all selection/ranking APIs.
    • Validate tie-breaking, invalid configs, edge-case and exception handling, environmental settings, WGM logic, and order determinism.
    • Rename/clean up test case names for clarity and consistency.
  • Minor code and comment cleanups throughout.

Test Result

All unit tests pass locally.

Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_origami.cpp
Comment thread shared/origami/tests/test_origami.cpp
Comment thread shared/origami/src/origami/gemm.cpp Outdated
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp Outdated
Comment thread shared/origami/tests/test_gemm.cpp Outdated
assistant-librarian Bot pushed a commit that referenced this pull request Dec 12, 2025
…ranch (#3301)

* [CK_TILE] Port hw independent changes from internal repo to develop branch

It includes PR#96, #114, #120, #121.

* correct rebase error
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp Outdated
Comment thread shared/origami/tests/test_gemm.cpp
@bnemanich bnemanich mentioned this pull request Dec 16, 2025
1 task
minsukim-amd added a commit that referenced this pull request Dec 18, 2025
## Testing Plan
- Azure CI (deprecated): Runs all the Catch2 and new Python tests. ✅ 
- TheRock CI (integration): 
- CI workflow: https://github.com/ROCm/TheRock/actions/runs/20316557318
✅
- TheRock branch:
https://github.com/ROCm/TheRock/tree/users/neoblizz/minsukim-refactor ✅
(Some tests fail but build passed)
- TheRock CI (libraries: hipblaslt, etc.) ✅ 
- Math CI (performance/hipblaslt) ✅ 
- Reviews from older PRs are addressed with the few exceptions. ✅ 

## PRs History
- Original PR: #1859
- Rebased PR: #2718
- Reverted PR:
#3416 (comment)
- Hot-fix PR:
#3417 (review)
([TheRock CI
Report](https://github.com/ROCm/TheRock/actions/runs/20293546191))
- **And now this!**

## Technical Details

### Refactor
- [x] New file `types.hpp`, consolidates various origami types.
- [x] New structs to replace the growing tuple.
- [x] API updates to make it scalable, down from 20+ parameters to a
few.
- [x] Lots of redundant code removal.
- [x] Reorganize into `types.hpp`; data types (see `hardware.hpp` for
what needs to be moved)
- [x] Refactor extract APIs
- [x] Add enums for transpose
- [x] Refactor debug/log reporting
- [x] Remove mutable and statics
- [x] Decouple `latency` out of `config_t` 
- [x] Rebase develop into this branch

### Python APIs & Unit Tests
- [x] Update the tests
- [x] Update rocroller
- [x] hipblaslt/tensilelite/scripts to use the new API

### Testing Infrastructure
- Replace YAML-based tests with Catch2 C++ tests
- Replace GTest with Catch2 framework
- Convert YAML-driven parameterized tests to pure C++ tests
- Update common.hpp with direct C++ helper functions
- Update CMakeLists.txt to use Catch2 instead of GTest
- Remove Boost dependency (was only for YAML)
- All test data now hardcoded in C++ for type safety
- Better IDE support, debugging, and error messages

### Questions
- Rocroller: coordinate on the API design, should they be reused?
- Reuse dim3_t from hip_runtime.h (does HIP's dim3 have the same
functionality)?
- Should transpose/data-types be part of config as well for
precomputing?

## Motivation

This PR reapplies Origami project's refactor.

1. Make it simpler to make model updates.
2. Well-defined, scalable, clean interface.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

## Work Moved to Future PRs

- Logger APIs
- Runtime Options
- Move instruction map into separate file
- Additional tests: #3301
- Minor batch-count specific changes:
#3289
- Move Origami's Azure CI -> TheRock CI @ibrahimw1

---------

Co-authored-by: Brad Nemanich <Brad.Nemanich@amd.com>
Co-authored-by: neoblizz <osama94@gmail.com>
@neoblizz neoblizz requested a review from Copilot December 19, 2025 17:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds comprehensive unit tests for the Origami library to improve code coverage and catch potential failures. The changes focus on testing functions in origami.cpp and gemm.cpp that were previously untested or under-tested.

Key changes include:

  • Added unit tests for origami functions including rank_configs, select_topk_configs, select_config_mnk, and select_workgroup_mapping
  • Added unit tests for gemm functions covering work/output utilization, CU occupancy, memory bandwidth, L2 hit rates, arithmetic intensity, LDS capacity checks, and cache estimation
  • Refactored the make_hardware helper function to use architecture-specific defaults
  • Exposed previously static functions (compute_cvt_overhead, arithmetic_intensity, etc.) for testing by adding them to the public API
  • Added zero-denominator checks in arithmetic intensity calculations for safety

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 23 comments.

Show a summary per file
File Description
shared/origami/tests/test_origami.cpp Added comprehensive unit tests for rank_configs, select_topk_configs, select_config_mnk, and select_workgroup_mapping; updated existing tests to use refactored make_hardware; corrected test naming from "GEMM" to "origami"
shared/origami/tests/test_gemm.cpp Added extensive unit tests for gemm functions including calculate_work_utilization, calculate_output_utilization, compute_cu_occupancy, compute_mem_bw_from_occupancy, compute_l2_hit_rate_global, round_elements_to_128B, compute_cvt_overhead, arithmetic_intensity, check_lds_capacity, and estimate_l2_hit/estimate_mall_hit
shared/origami/tests/include/common.hpp Refactored make_hardware to use architecture-specific default values (942 and 950) instead of parameterized defaults
shared/origami/src/origami/gemm.cpp Added zero-denominator safety checks in arithmetic_intensity and emulated_tf32_arithmetic_intensity; changed compute_cvt_overhead from static to public
shared/origami/include/origami/gemm.hpp Added function declarations for previously internal/static functions to enable unit testing; added comprehensive documentation for newly exposed functions; removed declarations for compute_A_loads and compute_B_loads

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread shared/origami/tests/test_gemm.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/include/common.hpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/test_gemm.cpp Outdated
Comment thread shared/origami/tests/test_gemm.cpp Outdated
Copy link
Copy Markdown
Contributor

@AlexBrownAMD AlexBrownAMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagged a few test cases that look like they contain typos, REQUIRE statements may be testing the wrong values.

Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp
Comment thread shared/origami/tests/test_gemm.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
Comment thread shared/origami/tests/test_origami.cpp Outdated
@ryanswann-amd ryanswann-amd self-requested a review December 19, 2025 20:24
@neoblizz neoblizz changed the title [Origami] Added more unit tests [Origami] Improve Origami's Test Coverage Dec 19, 2025
@NaveenElumalaiAMD NaveenElumalaiAMD force-pushed the users/nelumala/origami/add-new-unit-tests branch from e7248ff to 136b6b9 Compare January 6, 2026 16:50
@math-ci-jobs
Copy link
Copy Markdown

math-ci-jobs Bot commented Jan 6, 2026

perfci run on commit 136b6b946a46eb0641a111b750f1a0f36fa565d9

math-ci run

@NaveenElumalaiAMD NaveenElumalaiAMD force-pushed the users/nelumala/origami/add-new-unit-tests branch from 32ada08 to 05b17f3 Compare January 6, 2026 20:44
@math-ci-jobs
Copy link
Copy Markdown

math-ci-jobs Bot commented Jan 6, 2026

perfci run on commit 3d5542eea7b90b6850d73e263e41fa7e22f79683

math-ci run

Copy link
Copy Markdown
Contributor

@ryanswann-amd ryanswann-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good. There are still some key functions a little undertested. Primarily:

  1. compute_memory_latency
  2. compute_tile_latency

Some ideas on things that are undertested. (I came up with these example on the fly so if you find they are wrong feel free to change the specific values)

Compute Memory Latency

Test 1.1: All Transpose Combinations

  • Test all 4: NN, NT, TN, TT with same problem size (4096×4096×1024)
  • Config: 256×256×64 tile
  • Verify: Different MT_M/N/K_rounded_128bytes values based on transpose mode
  • Expected: Each transpose mode produces different memory access patterns

Test 1.2: MX Block Sizes

  • Problem: 4096×4096×1024
  • Test 1: FP8 with a_mx_block_size=32, b_mx_block_size=32
  • Test 2: FP16 with mx_block_size=0 (no MX blocks)
  • Verify: FP8 adds scale bytes to Ld_CU_bytes calculation
  • Expected: MX datatypes have additional overhead for scales

Test 1.3: Problem Size Scaling

  • Small: 512×512×256, Medium: 2048×2048×1024, Large: 8192×8192×2048
  • Config: 256×256×64 tile (constant)
  • Verify: Memory latency increases with problem size
  • Expected: Large > Medium > Small latency

Test 1.4: Occupancy and Bandwidth Limiting

  • Problem: 4096×4096×1024
  • Config: 256×256×64 tile
  • Test with num_active_cus: 50, 150, 304 (gfx942)
  • Verify: bw_limited and mem_bw_occ_limited variables scale correctly
  • Expected: Higher occupancy → better bandwidth utilization → lower latency

Test 1.5: Splitting Factor Impact

  • Problem: 4096×4096×1024
  • Config: 256×256×64 tile
  • Test splitting_factor: 1, 2, 4, 8
  • Verify: Splitting affects L2 hit rate and memory access patterns
  • Expected: Different splitting factors produce measurable latency differences

Edge Cases:

Test 1.6: Zero Dimensions

  • Problem: 0×4096×1024 (or other zero dimension)
  • Expected: Graceful handling, no crashes

Test 1.7: Single Tile Problem

  • Problem: 128×128×32 (fits in one tile)
  • Config: 256×256×64 tile
  • Expected: Minimal memory latency

Test 1.8: Skinny Matrices

  • Test 1: 64×8192×1024 (tall and skinny)
  • Test 2: 8192×64×1024 (wide and short)
  • Config: 256×256×64 tile
  • Expected: Different memory access patterns, reasonable latency values

Compute Tile Latency

Test 2.1: Small K (One Iteration)

  • Problem: 4096×4096×64 (K = MT_K, single iteration)
  • Config: 256×256×64 tile
  • Verify: Total latency > single compute iteration (includes prologue/epilogue)
  • Expected: L_tile_total > L_tile_single due to setup overhead

Test 2.2: K-Split Reduction Overhead

  • Problem: 4096×4096×512 (small K)
  • Config: 256×256×64 tile
  • Test splitting_factor: 1, 2, 8
  • Verify: When K is small and we split, latency increases due to reduction overhead
  • Expected: Factor=8 > Factor=2 > Factor=1 (for small K problems)

Test 2.3: Work Utilization Penalty

  • Problem 1: 4096×4096×1024 (perfect alignment, utilization=1.0)
  • Problem 2: 4351×3839×959 (poor alignment, utilization≈0.5)
  • Config: 256×256×64 tile
  • Verify: Poor alignment increases latency via effective_tile_penalty
  • Expected: Poor alignment → ~2× higher latency

Test 2.4: K-Dimension Zero Padding

  • Problem 1: K=1024 (divisible by MT_K=64) → no penalty
  • Problem 2: K=1025 (K % MT_K = 1, worst case) → maximum penalty
  • Problem 3: K=1023 (K % MT_K = MT_K-1) → small penalty
  • Config: 256×256×64 tile
  • Verify: Padding penalty scales with K % MT_K
  • Expected: 1023=1024<1025 and $L_{1024}-L_{1023}$ < $L_{1025}-L_{1024}$

Test 2.5: K-Splitting Performance Benefit

  • Problem: 256×256×4096 (large K, single tile in M/N)
  • Config: 256×256×64 tile
  • Test splitting_factor: 1 vs 4
  • Verify: Large K problem with splitting should be faster (parallelism benefit)
  • Expected: Split=4 latency < Split=1 latency (for large K)

Test 2.6: Iteration Count Validation

  • Problem 1: K=64, MT_K=64 → num_iter=1
  • Problem 2: K=256, MT_K=64 → num_iter=3 (4 total, -1 for epilogue)
  • Problem 3: K=512, MT_K=64, split=2 → num_iter based on k_per_split
  • Config: 256×256×64 tile
  • Verify: Iteration count calculation is correct
  • Expected: num_iter matches formula: ceil(k_per_split/MT_K) - 1. $L_{K512} &gt; L_{K256}&gt; L_{K64}$

NaveenElumalaiAMD and others added 8 commits January 13, 2026 18:40
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@NaveenElumalaiAMD NaveenElumalaiAMD force-pushed the users/nelumala/origami/add-new-unit-tests branch from 05b17f3 to 1507899 Compare January 13, 2026 18:41
@math-ci-jobs
Copy link
Copy Markdown

math-ci-jobs Bot commented Jan 13, 2026

perfci run on commit 1507899

math-ci run

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❗ There is a different number of reports uploaded between BASE (677a8d0) and HEAD (90e3509). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (677a8d0) HEAD (90e3509)
hipDNN 1 0
rocBLAS 1 0
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #3301       +/-   ##
============================================
- Coverage    55.53%   43.62%   -11.91%     
============================================
  Files          521       30      -491     
  Lines        67464    11442    -56022     
  Branches      7949     1440     -6509     
============================================
- Hits         37464     4991    -32473     
+ Misses       26645     5984    -20661     
+ Partials      3355      467     -2888     
Flag Coverage Δ
hipBLASLt 43.62% <ø> (?)
hipDNN ?
rocBLAS ?

Flags with carried forward coverage won't be shown. Click here to find out more.
see 551 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ryanswann-amd ryanswann-amd self-requested a review January 16, 2026 16:25
Copy link
Copy Markdown
Contributor

@ryanswann-amd ryanswann-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR getting big. Tests requested above will be added in another PR.

@math-ci-webhook
Copy link
Copy Markdown

perfci run on commit a1b81a2

math-ci run

@math-ci-webhook
Copy link
Copy Markdown

perfci run on commit 75a5646

math-ci run

Comment thread shared/origami/tests/test_origami.cpp
Comment thread shared/origami/tests/test_origami.cpp
@amd-hsivasun amd-hsivasun merged commit 56b3df6 into develop Jan 20, 2026
39 of 40 checks passed
@amd-hsivasun amd-hsivasun deleted the users/nelumala/origami/add-new-unit-tests branch January 20, 2026 20:19
ammallya pushed a commit that referenced this pull request Feb 3, 2026
…ranch (#3301)

* [CK_TILE] Port hw independent changes from internal repo to develop branch

It includes PR#96, #114, #120, #121.

* correct rebase error

[ROCm/composable_kernel commit: fc7bf0a]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.