CK: Extract shared boilerplate from 47 gemm_quant test files #6323

Merged
AviralGoelAMD merged 49 commits into develop from users/avirgoel/ck/dedup-gemm-quant-boilerplate on Apr 11, 2026


@AviralGoelAMD AviralGoelAMD commented Apr 9, 2026

Depends on #6303

Summary

Extract shared test boilerplate (includes, type aliases, test fixture macros) from 47 test_gemm_quant_* files into a single test_gemm_quant_common.hpp header. Each test file is reduced from ~50 lines of boilerplate to ~5 lines.

| Metric | Value |
|--------|-------|
| Files changed | 48 |
| Insertions | +413 |
| Deletions | −1,106 |
| **Net lines removed** | **−693** |

What changed

| Before | After |
|--------|-------|
| 47 test files, each with ~50 lines of identical includes, type aliases, and fixture macros | 1 shared header (`test_gemm_quant_common.hpp`) + 47 thin files (~5 lines each: include + params) |

Readability assessment

A code realist review confirmed this change improves readability: the 47 test files had identical boilerplate obscuring the only meaningful content — the GemmConfig type alias and test dimensions. After the refactoring, each file's unique configuration is immediately visible, and adding a new test variant requires specifying only the varying parameters instead of copying 50 lines.
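
To make the before/after concrete, here is a minimal sketch of the pattern, not the actual contents of `test_gemm_quant_common.hpp`. Every name in it (`QuantGemmCase`, `BQuantDecodeConfig`, the layout tags, the dimensions) is hypothetical, and a plain template stands in for the fixture macros the PR mentions.

```cpp
// Sketch only: hypothetical names, not the real CK header or test files.
#include <cassert>
#include <string>

// ---- would live once in the shared header ---------------------------
struct RowMajor {};
struct ColumnMajor {};

// Shared fixture: the part that used to be copied into every test file.
template <typename Config>
struct QuantGemmCase
{
    // Human-readable summary of the configuration under test.
    static std::string describe()
    {
        return std::string(Config::name) + " M=" + std::to_string(Config::M) +
               " N=" + std::to_string(Config::N) + " K=" + std::to_string(Config::K);
    }

    // A quantization group must divide K evenly in this toy model.
    static bool dims_ok() { return Config::K % Config::group_size == 0; }
};

// ---- would be the entire body of one thin test file -----------------
struct BQuantDecodeConfig
{
    static constexpr const char* name = "bquant_decode";
    static constexpr int M = 16, N = 128, K = 256, group_size = 128;
    using ALayout = RowMajor;
    using BLayout = ColumnMajor;
};
using BQuantDecodeCase = QuantGemmCase<BQuantDecodeConfig>;
```

With this split, a new variant is one more small config struct plus one alias, and each variant can still sit in its own `.cpp` file so the separate translation units compile in parallel.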

Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| This PR | Extract gemm_quant test boilerplate | −693 |
| **Total** | | **−10,088** |

The disabled loop applying b_element_op was superseded by the active
code path using ReferenceGemm with PassThrough ops.
…instances

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
…m_b_scale/device_gemm_b_scale_xdl_f16_i4_f16

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
…m_universal/device_gemm_xdl_universal_bf16_i4_bf16

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
…m_universal/device_gemm_xdl_universal_f16_i4_f16

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
Remove obsolete commented-out code blocks identified by
automated dead code scan with expert triage.
…pu/gemm_streamk

Remove obsolete commented-out code blocks identified by
automated dead code scan with expert triage.
- test_gemm_pipeline_compiler.cpp: split into 13 smaller test files
- test_grouped_gemm_quant.cpp: split into 5 smaller test files
- 2 unsplit f8_f8_f16 instance files: superseded by _part1/_part2 splits

Each file verified to have clear replacements already in the build.
Create test_gemm_quant_common.hpp with shared includes, layout aliases,
data type aliases, quant type aliases, and group size aliases that were
copy-pasted across all 47 test_gemm_quant_*.cpp files.

Each test file now includes the common header instead of duplicating
~20 lines of identical boilerplate. Separate .cpp files are preserved
for parallel compilation. No functional changes.

Net reduction: ~733 lines of duplicated code.

@shumway shumway left a comment

Thanks, looks good.

The real problem here is that we're testing wrong. This kind of systematic testing should be data-driven, through the dispatcher, but we don't have all the pieces available yet. I'll keep this in mind and get back to you once we have that capability. (I've tested bquant with the rocm-ck prototype, and I'm planning on adding that to the dispatcher once all that new work has a design review and is in production.)
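
As a rough illustration of what "data-driven" could mean here, the sketch below keeps the test cases in a runtime table and loops over them instead of giving each case its own file. It is an assumption-laden toy: `QuantCase`, `quant_cases`, and `is_supported` are hypothetical names, and nothing in it reflects the dispatcher or the rocm-ck prototype, neither of which is described in this PR.

```cpp
// Sketch only: a table of cases walked at runtime; all names are hypothetical.
#include <cassert>
#include <cstddef>
#include <vector>

struct QuantCase
{
    const char* name;
    int m, n, k, group_size;
};

// One row per test case; adding a case means adding a row, not a file.
inline const std::vector<QuantCase>& quant_cases()
{
    static const std::vector<QuantCase> cases = {
        {"decode", 16, 128, 256, 128},
        {"prefill", 256, 256, 512, 128},
        {"ragged_k", 64, 64, 192, 128}, // K is not a multiple of the group size
    };
    return cases;
}

// Toy support check standing in for whatever a real dispatcher would decide.
inline bool is_supported(const QuantCase& c) { return c.k % c.group_size == 0; }

// Walk the table once and count the cases that would actually be run.
inline std::size_t count_supported()
{
    std::size_t n = 0;
    for (const QuantCase& c : quant_cases())
        if (is_supported(c))
            ++n;
    return n;
}
```

The trade-off against the per-file layout is that a runtime table cannot by itself select compile-time template instances, which is presumably why this approach needs the dispatcher pieces mentioned above.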

Base automatically changed from users/avirgoel/ck/remove-unused-files to develop April 10, 2026 15:22
@AviralGoelAMD AviralGoelAMD removed the request for review from vidyasagar-amd April 10, 2026 15:25
AviralGoelAMD added a commit that referenced this pull request Apr 10, 2026
…N_INSTANCE macro (#6325)

Depends on #6324

## Summary

Refactor all 144 contraction library instance `.cpp` files (bilinear +
scale, 2D/6D, f32/f64/bf16/cf32/cf64) to use a shared
`CK_CONTRACTION_INSTANCE` macro defined in
`contraction_instance_common.hpp`. Each 58-line file is reduced to 12
lines — zero unique logic, pure parameterization.

| Metric | Value |
|--------|-------|
| Files changed | 145 (144 instances + 1 new shared header) |
| Insertions | +1,373 |
| Deletions | −7,890 |
| **Net lines removed** | **−6,517** |

### What changed

| Before | After |
|--------|-------|
| 144 instance files, each ~58 lines of identical boilerplate (includes, namespace, type alias, registration function) differing only in 12 template parameters | 1 shared macro header + 144 files at ~12 lines each (copyright + include + macro invocation) |

### Macro parameters (12)

```
CK_CONTRACTION_INSTANCE(INST_TPL, OP_NAME, CDE_OP, NDIM_VAL, NAME_SUFFIX,
    ADATA, BDATA, ACC, CSHUFFLE, DS_TUPLE, EDATA, COMPUTE)
```

| Parameter | Example | Purpose |
|-----------|---------|---------|
| `INST_TPL` | `device_contraction_kk_instance` | Device template to instantiate |
| `OP_NAME` | `bilinear` | Lowercase, used in `##` token pasting for function/type names |
| `CDE_OP` | `Bilinear` | C++ type name (capitalized) for template args |
| `NDIM_VAL` | `2` or `6` | Number of dimensions |
| `NAME_SUFFIX` | `f32_f32_f32_f32_kknn` | Data type + layout suffix |
| `ADATA..COMPUTE` | `F32, F32, ...` | Template type arguments |
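
The table explains why both a lowercase `OP_NAME` and a capitalized `CDE_OP` are needed: one is pasted into identifiers with `##`, the other is used as a type. The sketch below shows that mechanism with a reduced, hypothetical three-parameter macro. `DEMO_CONTRACTION_INSTANCE` and the generated function names are illustrative only and are not the definition in `contraction_instance_common.hpp`.

```cpp
// Sketch only: a reduced stand-in for the 12-parameter macro.
#include <cassert>
#include <string>
#include <vector>

// Hypothetical element-op types; only their names matter for this demo.
struct Bilinear { static constexpr const char* label = "Bilinear"; };
struct Scale    { static constexpr const char* label = "Scale"; };

// OP_NAME and NAME_SUFFIX are pasted into the function name with ##,
// CDE_OP is used as a C++ type, and #NAME_SUFFIX stringizes the suffix.
#define DEMO_CONTRACTION_INSTANCE(OP_NAME, CDE_OP, NAME_SUFFIX)                        \
    void add_demo_contraction_##OP_NAME##_##NAME_SUFFIX(std::vector<std::string>& out) \
    {                                                                                  \
        out.push_back(std::string(CDE_OP::label) + ":" #NAME_SUFFIX);                  \
    }

// Each "instance file" collapses to a single invocation like these.
DEMO_CONTRACTION_INSTANCE(bilinear, Bilinear, f32_kknn)
DEMO_CONTRACTION_INSTANCE(scale, Scale, f64_kknn)
```

Calling `add_demo_contraction_bilinear_f32_kknn(out)` appends one entry to `out`, which mimics a registration function adding its instances to a list.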

### Readability assessment

A code realist review confirmed this change **improves readability**:
the original 58-line files contained zero unique logic — just mechanical
boilerplate wrappers that varied only in 12 template parameters. After
the macro, each file's intent is immediately clear from a single macro
call, and the 12 parameters serve as a concise specification of what the
instance does (data types, layout, operation). Adding a new contraction
instance requires writing 1 line instead of copying and modifying 58
lines. The realist also noted that the area has very low activity (1
functional commit in 18 months), so merge conflict risk is negligible.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| #6323 | Extract gemm_quant test boilerplate | −693 |
| #6324 | Extract contraction example boilerplate | −1,016 |
| This PR | Refactor 144 contraction instance files | −6,517 |
| **Total** | | **−17,621** |
…ro (#6324)

Depends on #6323

## Summary

Extract repeated `DeviceOpInstance` type alias boilerplate from 20
contraction example files into a single macro
`CK_CONTRACTION_DEVICE_OP_INSTANCES(BASE, SUFFIX)` in
`common_instances.hpp`. Each example file's 4 device-op-instance blocks
(~56 lines) are replaced by 4 one-line macro calls.

| Metric | Value |
|--------|-------|
| Files changed | 22 |
| Insertions | +124 |
| Deletions | −1,140 |
| **Net lines removed** | **−1,016** |

### What changed

| Before | After |
|--------|-------|
| 20 example files, each with 4 identical `DeviceOpInstance` type alias blocks (~14 lines each) | 1 macro definition in `common_instances.hpp` + 20 files with 4 one-line macro calls each |

### Readability assessment

A code realist review confirmed this change **improves readability**:
the 14-line `DeviceOpInstance` blocks were pure noise — identical across
all 20 files and obscuring the actual example logic (data types, element
operations). After the macro, each file's intent is immediately clear
from the 4 one-liner macro calls, and a developer adding a new
contraction example only needs to specify the varying parameters instead
of copying 56 lines of boilerplate.
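
A minimal sketch of how a two-argument `(BASE, SUFFIX)` macro can replace a hand-written alias block follows. It assumes the macro pastes `BASE` and `SUFFIX` into a template name and picks up the data types from the enclosing file; the real `CK_CONTRACTION_DEVICE_OP_INSTANCES` may work differently, and every `Demo*` name here is hypothetical.

```cpp
// Sketch only: hypothetical templates and macro, not common_instances.hpp.
#include <cassert>
#include <type_traits>

// Stand-ins for two device-op templates that share a name prefix.
template <typename Data, typename ElementOp, int NumDim>
struct DemoContraction_KKN
{
    static constexpr int layout_id = 0;
    static constexpr int num_dim   = NumDim;
};

template <typename Data, typename ElementOp, int NumDim>
struct DemoContraction_KNN
{
    static constexpr int layout_id = 1;
    static constexpr int num_dim   = NumDim;
};

// Per-example choices that the macro reads from the surrounding scope.
using DataType = float;
struct PassThroughOp {};
constexpr int NumDimDemo = 2;

// One line per layout instead of a multi-line alias block.
#define DEMO_DEVICE_OP_INSTANCE(BASE, SUFFIX) \
    using DeviceOpInstance##SUFFIX = BASE##SUFFIX<DataType, PassThroughOp, NumDimDemo>;

DEMO_DEVICE_OP_INSTANCE(DemoContraction_, KKN)
DEMO_DEVICE_OP_INSTANCE(DemoContraction_, KNN)

// The generated alias is the same type one would have spelled out by hand.
static_assert(std::is_same<DeviceOpInstanceKKN,
                           DemoContraction_KKN<float, PassThroughOp, 2>>::value,
              "macro expands to the hand-written alias");
```

One design cost worth noting: a macro that reads `DataType` and similar names from the enclosing scope depends on each example defining them before the invocation.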

### GPU verification

All 20 contraction examples verified on MI300X: **20/20 passed**.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| #6323 | Extract gemm_quant test boilerplate | −693 |
| This PR | Extract contraction example boilerplate | −1,016 |
| **Total** | | **−11,104** |
@AviralGoelAMD AviralGoelAMD merged commit a668483 into develop Apr 11, 2026
32 checks passed
@AviralGoelAMD AviralGoelAMD deleted the users/avirgoel/ck/dedup-gemm-quant-boilerplate branch April 11, 2026 10:00
assistant-librarian Bot pushed a commit to ROCm/composable_kernel that referenced this pull request Apr 11, 2026
CK: Extract shared boilerplate from 47 gemm_quant test files (#6323)
