
CK: Remove 61 dead #if 0 code blocks (~2,600 lines)#6300

Merged
AviralGoelAMD merged 25 commits into develop from users/avirgoel/ck/remove-dead-if0-blocks
Apr 13, 2026
@AviralGoelAMD
Contributor

Summary

Remove 61 confirmed-dead #if 0 code blocks across 52 files in Composable Kernel, totaling ~2,600 lines of dead code.

These blocks were identified using an automated dead code scanning skill (ck-dead-code) that:

  1. Scanned all 5,279 source files across 639 directories for #if 0 blocks (found 187)
  2. Triaged each block with an LLM expert that reads the code in context and judges REMOVE vs KEEP based on whether the block is genuinely obsolete or intentionally disabled (debug helpers, alternative configs, compiler workarounds, planned features)
  3. Kept 126 blocks (67%) that serve a legitimate purpose — quick-toggle configs, debug prints, compiler workarounds, planned features with TODOs
  4. Removed 61 blocks (33%) that are genuinely dead — obsolete implementations replaced by better approaches, stale template signatures that would no longer compile, buggy code with operator precedence errors, unreachable code after return statements
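
The scanning step (step 1) can be sketched roughly as follows. This is only an illustration of how outermost `#if 0` regions might be located, not the `ck-dead-code` implementation, whose internals are not shown in this PR. `If0Block`, `find_if0_blocks`, and the helper functions are hypothetical names. The sketch ignores directives hidden inside block comments or string literals and does not recognize spellings such as `#if (0)`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one outermost `#if 0` ... `#endif` region.
struct If0Block {
    std::size_t first_line; // 1-based line of the `#if 0`
    std::size_t last_line;  // 1-based line of the matching `#endif`
    bool has_else;          // a top-level #else/#elif exists (that branch is live code)
};

inline bool starts_with(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Text after '#' with whitespace skipped, or "" for non-directive lines.
inline std::string directive_of(const std::string& line) {
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#') return {};
    i = line.find_first_not_of(" \t", i + 1);
    return i == std::string::npos ? std::string{} : line.substr(i);
}

// True for `if 0`, optionally followed by whitespace or a comment.
inline bool is_if_zero(const std::string& d) {
    if (!starts_with(d, "if")) return false;
    const std::size_t i = d.find_first_not_of(" \t", 2);
    if (i == std::string::npos || i == 2 || d[i] != '0') return false;
    const std::size_t j = d.find_first_not_of(" \t\r", i + 1);
    return j == std::string::npos || d[j] == '/';
}

// Returns every outermost `#if 0` region, tracking nested conditionals
// so that an inner #endif does not close the outer block early.
inline std::vector<If0Block> find_if0_blocks(const std::string& source) {
    std::vector<If0Block> blocks;
    std::istringstream in(source);
    std::string line;
    std::size_t line_no = 0, depth = 0, start = 0;
    bool has_else = false;
    while (std::getline(in, line)) {
        ++line_no;
        const std::string d = directive_of(line);
        if (d.empty()) continue;
        if (depth == 0) {
            if (is_if_zero(d)) { depth = 1; start = line_no; has_else = false; }
        } else if (starts_with(d, "if")) {        // #if, #ifdef, #ifndef
            ++depth;
        } else if (depth == 1 && starts_with(d, "el")) { // #else, #elif
            has_else = true;
        } else if (starts_with(d, "endif")) {
            if (--depth == 0) blocks.push_back({start, line_no, has_else});
        }
    }
    return blocks;
}
```

The `has_else` flag matters for triage: when a `#if 0` block has an `#else` branch, only the disabled half is dead, so a removal has to keep the `#else` body.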

What was removed

| Category | Count | Example |
|----------|-------|---------|
| Obsolete implementations replaced by better approach | 35 | Scalar transpose replaced by `__builtin_amdgcn_perm` intrinsic |
| Stale template signatures (would not compile) | 12 | Old `trait_` missing quantization type parameter |
| Dead per-specialization padding branches | 10 | Replaced by unconditional pad-both approach |
| Buggy dead code | 2 | Operator precedence errors in `#if 0` branches |
| Unreachable code | 1 | Code after unconditional return |
| Abandoned stubs | 1 | Empty template alias with no body |

What was intentionally kept (not in this PR)

126 #if 0 blocks were triaged as KEEP:

  • 42 alternative configurations (tile sizes, data types, pipeline toggles)
  • 22 debug/diagnostic helpers (printf, std::cout, show_*)
  • 20 planned features with TODO comments
  • 18 compiler workarounds (ROCm version-specific, constexpr issues)
  • 12 partially completed features with FIXME
  • 10 reference implementations kept for validation
  • 2 hardware-specific workarounds

The disabled loop applying b_element_op was superseded by the active
code path using ReferenceGemm with PassThrough ops.
…instances

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
…m_b_scale/device_gemm_b_scale_xdl_f16_i4_f16

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
…m_universal/device_gemm_xdl_universal_bf16_i4_bf16

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
…m_universal/device_gemm_xdl_universal_f16_i4_f16

Remove obsolete #if 0 disabled code blocks identified by
automated dead code scan with expert triage.
@AviralGoelAMD AviralGoelAMD requested a review from bartekxk April 10, 2026 02:11
AviralGoelAMD added a commit that referenced this pull request Apr 10, 2026
Depends on #6300 

## Summary

Remove 41 commented-out code blocks across 33 files in Composable
Kernel, totaling ~200 lines.

Identified using an automated dead code scanning skill (`ck-dead-code`)
with a calibrated two-stage pipeline:
1. **Pre-filter**: A keyword-based scan found 1,338 `//`-commented blocks.
Calibrated heuristics (trained on a 50-sample expert classification)
reduced these to 89 high-confidence candidates, a 93% noise reduction.
2. **Expert triage**: LLM expert classified each block in context as
CODE_REMOVE, CODE_KEEP, or NOT_CODE.

| Classification | Count |
|---------------|-------|
| Removed (this PR) | 41 |
| Kept (debug helpers, alt configs, reference impls) | 32 |
| Not code (false positives) | 16 |

Removed blocks include: superseded implementations, old test data,
abandoned stubs, unreachable code, and buggy dead code.
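
A pre-filter of the kind described above could look something like the sketch below. The actual calibrated heuristics are not given in this commit message, so the rules here (a `//` comment whose body ends in `;`, `{`, or `}`, or starts with `#include`) are assumptions chosen for illustration, and `comment_body` and `looks_like_code` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Returns the trimmed text of a `//` comment line, or "" if the line
// is not a pure `//` comment (or the comment is empty).
inline std::string comment_body(const std::string& line) {
    const std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line.compare(i, 2, "//") != 0) return {};
    // Skip the slashes (including `///` doc comments) and leading whitespace.
    const std::size_t b = line.find_first_not_of(" \t\r/", i);
    if (b == std::string::npos) return {};
    const std::size_t e = line.find_last_not_of(" \t\r");
    return line.substr(b, e - b + 1);
}

// Assumed heuristic: a comment "looks like code" if it ends the way a
// C++ statement or block usually ends, or begins with an include directive.
inline bool looks_like_code(const std::string& line) {
    const std::string body = comment_body(line);
    if (body.empty()) return false;
    const char last = body.back();
    if (last == ';' || last == '{' || last == '}') return true;
    return body.rfind("#include", 0) == 0;
}
```

Rules this cheap will still flag some prose and miss some code, which is consistent with the pipeline passing the remaining candidates to a second triage stage that can return NOT_CODE.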
@AviralGoelAMD AviralGoelAMD removed the request for review from bartekxk April 10, 2026 15:29
AviralGoelAMD added a commit that referenced this pull request Apr 10, 2026
…N_INSTANCE macro (#6325)

Depends on #6324

## Summary

Refactor all 144 contraction library instance `.cpp` files (bilinear +
scale, 2D/6D, f32/f64/bf16/cf32/cf64) to use a shared
`CK_CONTRACTION_INSTANCE` macro defined in
`contraction_instance_common.hpp`. Each 58-line file is reduced to 12
lines — zero unique logic, pure parameterization.

| Metric | Value |
|--------|-------|
| Files changed | 145 (144 instances + 1 new shared header) |
| Insertions | +1,373 |
| Deletions | −7,890 |
| **Net lines removed** | **−6,517** |

### What changed

| Before | After |
|--------|-------|
| 144 instance files, each ~58 lines of identical boilerplate (includes, namespace, type alias, registration function) differing only in 12 template parameters | 1 shared macro header + 144 files at ~12 lines each (copyright + include + macro invocation) |

### Macro parameters (12)

```
CK_CONTRACTION_INSTANCE(INST_TPL, OP_NAME, CDE_OP, NDIM_VAL, NAME_SUFFIX,
    ADATA, BDATA, ACC, CSHUFFLE, DS_TUPLE, EDATA, COMPUTE)
```

| Parameter | Example | Purpose |
|-----------|---------|---------|
| `INST_TPL` | `device_contraction_kk_instance` | Device template to instantiate |
| `OP_NAME` | `bilinear` | Lowercase, used in `##` token pasting for function/type names |
| `CDE_OP` | `Bilinear` | C++ type name (capitalized) for template args |
| `NDIM_VAL` | `2` or `6` | Number of dimensions |
| `NAME_SUFFIX` | `f32_f32_f32_f32_kknn` | Data type + layout suffix |
| `ADATA..COMPUTE` | `F32, F32, ...` | Template type arguments |
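
To illustrate how `##` token pasting lets one macro call generate a uniquely named registration function, here is a reduced sketch with six parameters instead of twelve. It is not the contents of `contraction_instance_common.hpp`: `TOY_CONTRACTION_INSTANCE`, `toy_instance`, and the `F32` and `Bilinear` structs below are hypothetical stand-ins, and the generated function only records a string rather than registering a device op.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for data-type and element-op tags (not CK types).
struct F32 {};
struct Bilinear {};

// Hypothetical instance template parameterized like the real one might be.
template <int NDim, typename AData, typename CDEOp>
struct toy_instance {
    static std::string describe() { return "ndim=" + std::to_string(NDim); }
};

// OP_NAME and NAME_SUFFIX are pasted into the function name with `##`;
// CDE_OP, NDIM_VAL, and ADATA are forwarded as template arguments.
#define TOY_CONTRACTION_INSTANCE(INST_TPL, OP_NAME, CDE_OP, NDIM_VAL, NAME_SUFFIX, ADATA) \
    inline void add_toy_contraction_##OP_NAME##_##NAME_SUFFIX(                            \
        std::vector<std::string>& out)                                                    \
    {                                                                                     \
        out.push_back(INST_TPL<NDIM_VAL, ADATA, CDE_OP>::describe());                     \
    }

// A whole "instance file" reduces to one invocation. This expands to a
// function named add_toy_contraction_bilinear_f32_kknn.
TOY_CONTRACTION_INSTANCE(toy_instance, bilinear, Bilinear, 2, f32_kknn, F32)
```

This also shows why the table lists both a lowercase `OP_NAME` and a capitalized `CDE_OP`: one is pasted into an identifier, the other names a type, and the preprocessor cannot change case to derive one from the other.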

### Readability assessment

A code realist review confirmed this change **improves readability**:
the original 58-line files contained zero unique logic — just mechanical
boilerplate wrappers that varied only in 12 template parameters. After
the macro, each file's intent is immediately clear from a single macro
call, and the 12 parameters serve as a concise specification of what the
instance does (data types, layout, operation). Adding a new contraction
instance requires writing 1 line instead of copying and modifying 58
lines. The realist also noted that the area has very low activity (1
functional commit in 18 months), so merge conflict risk is negligible.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| #6323 | Extract gemm_quant test boilerplate | −693 |
| #6324 | Extract contraction example boilerplate | −1,016 |
| This PR | Refactor 144 contraction instance files | −6,517 |
| **Total** | | **−17,621** |
AviralGoelAMD added a commit that referenced this pull request Apr 11, 2026
…ro (#6324)

Depends on #6323

## Summary

Extract repeated `DeviceOpInstance` type alias boilerplate from 20
contraction example files into a single macro
`CK_CONTRACTION_DEVICE_OP_INSTANCES(BASE, SUFFIX)` in
`common_instances.hpp`. Each example file's 4 device-op-instance blocks
(~56 lines) are replaced by 4 one-line macro calls.

| Metric | Value |
|--------|-------|
| Files changed | 22 |
| Insertions | +124 |
| Deletions | −1,140 |
| **Net lines removed** | **−1,016** |

### What changed

| Before | After |
|--------|-------|
| 20 example files, each with 4 identical `DeviceOpInstance` type alias blocks (~14 lines each) | 1 macro definition in `common_instances.hpp` + 20 files with 4 one-line macro calls each |

### Readability assessment

A code realist review confirmed this change **improves readability**:
the 14-line `DeviceOpInstance` blocks were pure noise — identical across
all 20 files and obscuring the actual example logic (data types, element
operations). After the macro, each file's intent is immediately clear
from the 4 one-liner macro calls, and a developer adding a new
contraction example only needs to specify the varying parameters instead
of copying 56 lines of boilerplate.
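
The general idea of collapsing repeated type-alias blocks into a macro can be sketched as below. The real `CK_CONTRACTION_DEVICE_OP_INSTANCES(BASE, SUFFIX)` definition is not reproduced in this message, so everything here is an assumption made for illustration: `TOY_DEVICE_OP_INSTANCES`, its extra `DATA` parameter, `ToyDeviceOp`, and the `Row`/`Col` tags are all hypothetical.

```cpp
#include <type_traits>

// Hypothetical layout tags and device-op template (not CK types).
struct Row {};
struct Col {};

template <typename ALayout, typename BLayout, typename Data>
struct ToyDeviceOp {
    using a_layout = ALayout;
    using b_layout = BLayout;
    using data     = Data;
};

// One call defines a family of aliases that share a data type and differ
// only in layout; BASE and SUFFIX are pasted around the layout name.
#define TOY_DEVICE_OP_INSTANCES(BASE, SUFFIX, DATA)           \
    using BASE##RowCol##SUFFIX = ToyDeviceOp<Row, Col, DATA>; \
    using BASE##ColRow##SUFFIX = ToyDeviceOp<Col, Row, DATA>;

// Expands to the aliases DeviceOpRowColF32 and DeviceOpColRowF32.
TOY_DEVICE_OP_INSTANCES(DeviceOp, F32, float)

static_assert(std::is_same<DeviceOpRowColF32::a_layout, Row>::value,
              "first alias uses Row for A");
static_assert(std::is_same<DeviceOpColRowF32::a_layout, Col>::value,
              "second alias uses Col for A");
```

The `static_assert` lines are a cheap way to confirm at compile time that the macro produced the intended aliases, which helps offset the usual drawback of macros being harder to inspect than spelled-out code.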

### GPU verification

All 20 contraction examples verified on MI300X: **20/20 passed**.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| #6323 | Extract gemm_quant test boilerplate | −693 |
| This PR | Extract contraction example boilerplate | −1,016 |
| **Total** | | **−11,104** |
AviralGoelAMD added a commit that referenced this pull request Apr 11, 2026
Depends on #6303

## Summary

Extract shared test boilerplate (includes, type aliases, test fixture
macros) from 47 `test_gemm_quant_*` files into a single
`test_gemm_quant_common.hpp` header. Each test file is reduced from ~50
lines of boilerplate to ~5 lines.

| Metric | Value |
|--------|-------|
| Files changed | 48 |
| Insertions | +413 |
| Deletions | −1,106 |
| **Net lines removed** | **−693** |

### What changed

| Before | After |
|--------|-------|
| 47 test files, each with ~50 lines of identical includes, type aliases, and fixture macros | 1 shared header (`test_gemm_quant_common.hpp`) + 47 thin files (~5 lines each: include + params) |

### Readability assessment

A code realist review confirmed this change **improves readability**:
the 47 test files had identical boilerplate obscuring the only
meaningful content — the `GemmConfig` type alias and test dimensions.
After the refactoring, each file's unique configuration is immediately
visible, and adding a new test variant requires specifying only the
varying parameters instead of copying 50 lines.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| This PR | Extract gemm_quant test boilerplate | −693 |
| **Total** | | **−10,088** |
@AviralGoelAMD AviralGoelAMD merged commit 41373e7 into develop Apr 13, 2026
18 checks passed
@AviralGoelAMD AviralGoelAMD deleted the users/avirgoel/ck/remove-dead-if0-blocks branch April 13, 2026 17:17
aledudek pushed a commit that referenced this pull request May 20, 2026