CK: Remove 61 dead #if 0 code blocks (~2,600 lines) #6300
Merged
AviralGoelAMD merged 25 commits on Apr 13, 2026
The disabled loop applying b_element_op was superseded by the active code path using ReferenceGemm with PassThrough ops.
…instances Remove obsolete #if 0 disabled code blocks identified by automated dead code scan with expert triage.
Remove obsolete #if 0 disabled code blocks identified by automated dead code scan with expert triage.
…m_b_scale/device_gemm_b_scale_xdl_f16_i4_f16 Remove obsolete #if 0 disabled code blocks identified by automated dead code scan with expert triage.
…m_universal/device_gemm_xdl_universal_bf16_i4_bf16 Remove obsolete #if 0 disabled code blocks identified by automated dead code scan with expert triage.
…m_universal/device_gemm_xdl_universal_f16_i4_f16 Remove obsolete #if 0 disabled code blocks identified by automated dead code scan with expert triage.
brockhargreaves-amd approved these changes on Apr 9, 2026
AviralGoelAMD added a commit that referenced this pull request on Apr 10, 2026
Depends on #6300

## Summary

Remove 41 commented-out code blocks across 33 files in Composable Kernel, totaling ~200 lines.

Identified using an automated dead code scanning skill (`ck-dead-code`) with a calibrated two-stage pipeline:

1. **Pre-filter**: Keyword-based scan found 1,338 `//`-commented blocks. Calibrated heuristics (trained on 50-sample expert classification) reduced these to 89 high-confidence candidates, a 93% noise reduction.
2. **Expert triage**: LLM expert classified each block in context as CODE_REMOVE, CODE_KEEP, or NOT_CODE.

| Classification | Count |
|---------------|-------|
| Removed (this PR) | 41 |
| Kept (debug helpers, alt configs, reference impls) | 32 |
| Not code (false positives) | 16 |

Removed blocks include: superseded implementations, old test data, abandoned stubs, unreachable code, and buggy dead code.
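The pre-filter stage described above could look roughly like the following. This is a minimal sketch only: the PR does not show the actual `ck-dead-code` heuristics, so the "line ends in `;`, `{` or `}`" rule, the threshold, and every function name here (`collect_comment_blocks`, `code_score`, `prefilter`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// A run of consecutive full-line `//` comments.
struct CommentBlock {
    std::size_t first_line;        // 1-based line of the first comment
    std::size_t last_line;         // 1-based line of the last comment
    std::vector<std::string> body; // comment text with the leading "//" stripped
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Group consecutive full-line `//` comments into blocks.
std::vector<CommentBlock> collect_comment_blocks(const std::string& source) {
    std::vector<CommentBlock> blocks;
    std::istringstream in(source);
    std::string line;
    std::size_t n = 0;
    bool open = false;
    while (std::getline(in, line)) {
        ++n;
        const std::string t = trim(line);
        if (t.compare(0, 2, "//") == 0) {
            if (!open) { blocks.push_back({n, n, {}}); open = true; }
            blocks.back().last_line = n;
            blocks.back().body.push_back(trim(t.substr(2)));
        } else {
            open = false;
        }
    }
    return blocks;
}

// Hypothetical heuristic: a line "looks like code" if it ends in ';', '{' or '}'.
static bool looks_like_code(const std::string& text) {
    if (text.empty()) return false;
    const char last = text.back();
    return last == ';' || last == '{' || last == '}';
}

// Fraction of non-empty lines in the block that look like code (0 for an empty block).
double code_score(const CommentBlock& b) {
    std::size_t total = 0, codey = 0;
    for (const std::string& l : b.body) {
        if (l.empty()) continue;
        ++total;
        if (looks_like_code(l)) ++codey;
    }
    return total == 0 ? 0.0 : static_cast<double>(codey) / static_cast<double>(total);
}

// Keep only blocks whose score reaches the threshold; these would go on to expert triage.
std::vector<CommentBlock> prefilter(const std::string& source, double threshold) {
    std::vector<CommentBlock> out;
    for (const CommentBlock& b : collect_comment_blocks(source))
        if (code_score(b) >= threshold) out.push_back(b);
    return out;
}
```

A cheap filter like this only narrows the candidate set. Prose comments that happen to end in a semicolon would still pass, which is presumably why the pipeline follows it with an in-context triage step.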
AviralGoelAMD added a commit that referenced this pull request on Apr 10, 2026
…N_INSTANCE macro (#6325)

Depends on #6324

## Summary

Refactor all 144 contraction library instance `.cpp` files (bilinear + scale, 2D/6D, f32/f64/bf16/cf32/cf64) to use a shared `CK_CONTRACTION_INSTANCE` macro defined in `contraction_instance_common.hpp`. Each 58-line file is reduced to 12 lines: zero unique logic, pure parameterization.

| Metric | Value |
|--------|-------|
| Files changed | 145 (144 instances + 1 new shared header) |
| Insertions | +1,373 |
| Deletions | −7,890 |
| **Net lines removed** | **−6,517** |

### What changed

| Before | After |
|--------|-------|
| 144 instance files, each ~58 lines of identical boilerplate (includes, namespace, type alias, registration function) differing only in 12 template parameters | 1 shared macro header + 144 files at ~12 lines each (copyright + include + macro invocation) |

### Macro parameters (12)

```
CK_CONTRACTION_INSTANCE(INST_TPL, OP_NAME, CDE_OP, NDIM_VAL, NAME_SUFFIX, ADATA, BDATA, ACC, CSHUFFLE, DS_TUPLE, EDATA, COMPUTE)
```

| Parameter | Example | Purpose |
|-----------|---------|---------|
| `INST_TPL` | `device_contraction_kk_instance` | Device template to instantiate |
| `OP_NAME` | `bilinear` | Lowercase, used in `##` token pasting for function/type names |
| `CDE_OP` | `Bilinear` | C++ type name (capitalized) for template args |
| `NDIM_VAL` | `2` or `6` | Number of dimensions |
| `NAME_SUFFIX` | `f32_f32_f32_f32_kknn` | Data type + layout suffix |
| `ADATA..COMPUTE` | `F32, F32, ...` | Template type arguments |

### Readability assessment

A code realist review confirmed this change **improves readability**: the original 58-line files contained zero unique logic, just mechanical boilerplate wrappers that varied only in 12 template parameters. After the macro, each file's intent is immediately clear from a single macro call, and the 12 parameters serve as a concise specification of what the instance does (data types, layout, operation). Adding a new contraction instance requires writing 1 line instead of copying and modifying 58 lines. The realist also noted that the area has very low activity (1 functional commit in 18 months), so merge conflict risk is negligible.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| #6323 | Extract gemm_quant test boilerplate | −693 |
| #6324 | Extract contraction example boilerplate | −1,016 |
| This PR | Refactor 144 contraction instance files | −6,517 |
| **Total** | | **−17,621** |
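To illustrate the `##` token-pasting pattern that the parameter table describes, here is a reduced sketch. It is not the body of `CK_CONTRACTION_INSTANCE`, which this PR description does not show. The macro `DEMO_CONTRACTION_INSTANCE`, the six-parameter signature, the stand-in types, and the generated function names are all hypothetical and chosen only to keep the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a registered device op; the real library registers device op objects.
struct InstanceInfo {
    std::string op;
    int ndim;
    std::string types;
};

// Stand-ins for element-wise operation types (hypothetical, for illustration only).
struct Bilinear { static const char* name() { return "Bilinear"; } };
struct Scale    { static const char* name() { return "Scale"; } };

// Stand-in for a device instance template parameterized by data type, dimension, and op.
template <typename DataType, int NDim, typename CDEOp>
struct demo_contraction_kk_instance {
    static InstanceInfo describe(const char* type_name) {
        return InstanceInfo{CDEOp::name(), NDim, type_name};
    }
};

// Simplified macro: pastes OP_NAME and NAME_SUFFIX into a function name with `##`
// and forwards the remaining arguments as template parameters.
#define DEMO_CONTRACTION_INSTANCE(INST_TPL, OP_NAME, CDE_OP, NDIM_VAL, NAME_SUFFIX, ADATA) \
    void add_demo_contraction_##OP_NAME##_##NAME_SUFFIX##_instance(                       \
        std::vector<InstanceInfo>& instances)                                              \
    {                                                                                      \
        instances.push_back(INST_TPL<ADATA, NDIM_VAL, CDE_OP>::describe(#ADATA));          \
    }

// Each "instance file" then collapses to a single invocation.
DEMO_CONTRACTION_INSTANCE(demo_contraction_kk_instance, bilinear, Bilinear, 2, f32_kknn, float)
DEMO_CONTRACTION_INSTANCE(demo_contraction_kk_instance, scale, Scale, 6, f64_kknn, double)
```

The sketch separates the lowercase `OP_NAME` (pasted into identifiers) from the capitalized `CDE_OP` (used as a type), mirroring the distinction the table draws between those two parameters.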
AviralGoelAMD added a commit that referenced this pull request on Apr 11, 2026
…ro (#6324)

Depends on #6323

## Summary

Extract repeated `DeviceOpInstance` type alias boilerplate from 20 contraction example files into a single macro `CK_CONTRACTION_DEVICE_OP_INSTANCES(BASE, SUFFIX)` in `common_instances.hpp`. Each example file's 4 device-op-instance blocks (~56 lines) are replaced by 4 one-line macro calls.

| Metric | Value |
|--------|-------|
| Files changed | 22 |
| Insertions | +124 |
| Deletions | −1,140 |
| **Net lines removed** | **−1,016** |

### What changed

| Before | After |
|--------|-------|
| 20 example files, each with 4 identical `DeviceOpInstance` type alias blocks (~14 lines each) | 1 macro definition in `common_instances.hpp` + 20 files with 4 one-line macro calls each |

### Readability assessment

A code realist review confirmed this change **improves readability**: the 14-line `DeviceOpInstance` blocks were pure noise, identical across all 20 files and obscuring the actual example logic (data types, element operations). After the macro, each file's intent is immediately clear from the 4 one-liner macro calls, and a developer adding a new contraction example only needs to specify the varying parameters instead of copying 56 lines of boilerplate.

### GPU verification

All 20 contraction examples verified on MI300X: **20/20 passed**.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| #6323 | Extract gemm_quant test boilerplate | −693 |
| This PR | Extract contraction example boilerplate | −1,016 |
| **Total** | | **−11,104** |
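A type-alias macro of the `(BASE, SUFFIX)` shape described above might work along these lines. This is a sketch under assumptions: the real `CK_CONTRACTION_DEVICE_OP_INSTANCES` expansion is not shown in the PR description, so `DEMO_DEVICE_OP_INSTANCES`, the stand-in templates, the template parameter list, and the `KKN`/`KNN` suffixes are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for two device op templates (hypothetical).
template <typename A, typename B, typename E, int NumDim>
struct DemoDeviceOpKK { static constexpr int kNumDim = NumDim; };

template <typename A, typename B, typename E, int NumDim>
struct DemoDeviceOpKN { static constexpr int kNumDim = NumDim; };

// The example file defines its varying parameters once...
using ADataType = float;
using BDataType = float;
using EDataType = double;
static constexpr int NumDimM = 2;

// ...and the macro expands to the alias that used to be written out by hand.
// It relies on the names above being visible at the point of invocation.
#define DEMO_DEVICE_OP_INSTANCES(BASE, SUFFIX) \
    using DeviceOpInstance##SUFFIX = BASE<ADataType, BDataType, EDataType, NumDimM>;

DEMO_DEVICE_OP_INSTANCES(DemoDeviceOpKK, KKN)
DEMO_DEVICE_OP_INSTANCES(DemoDeviceOpKN, KNN)
```

One trade-off of this style is that the macro depends implicitly on alias names such as `ADataType` being defined in the including file, so those names effectively become part of the macro's interface.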
AviralGoelAMD added a commit that referenced this pull request on Apr 11, 2026
Depends on #6303

## Summary

Extract shared test boilerplate (includes, type aliases, test fixture macros) from 47 `test_gemm_quant_*` files into a single `test_gemm_quant_common.hpp` header. Each test file is reduced from ~50 lines of boilerplate to ~5 lines.

| Metric | Value |
|--------|-------|
| Files changed | 48 |
| Insertions | +413 |
| Deletions | −1,106 |
| **Net lines removed** | **−693** |

### What changed

| Before | After |
|--------|-------|
| 47 test files, each with ~50 lines of identical includes, type aliases, and fixture macros | 1 shared header (`test_gemm_quant_common.hpp`) + 47 thin files (~5 lines each: include + params) |

### Readability assessment

A code realist review confirmed this change **improves readability**: the 47 test files had identical boilerplate obscuring the only meaningful content, namely the `GemmConfig` type alias and test dimensions. After the refactoring, each file's unique configuration is immediately visible, and adding a new test variant requires specifying only the varying parameters instead of copying 50 lines.

### Cumulative cleanup series stats

| PR | Description | Net lines |
|----|-------------|-----------|
| #6300 | Remove 61 dead `#if 0` blocks | −2,648 |
| #6302 | Remove 41 commented-out dead code blocks | −2,861 |
| #6303 | Remove 4 orphaned files | −3,886 |
| This PR | Extract gemm_quant test boilerplate | −693 |
| **Total** | | **−10,088** |
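The split between a shared header and thin per-variant files can be sketched as below. This does not reproduce `test_gemm_quant_common.hpp`, whose contents are not shown here. The sketch avoids any particular test framework, and `run_demo_gemm_test`, `DEMO_GEMM_QUANT_TEST`, and `DemoConfigF32` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- would live in the shared header (hypothetical contents) ----

// Shared check: multiplies an MxK matrix of ones by a KxN matrix of ones in the
// config's accumulator type and verifies that every output element equals K.
template <typename GemmConfig>
bool run_demo_gemm_test(std::size_t M, std::size_t N, std::size_t K) {
    using Acc = typename GemmConfig::AccType;
    std::vector<Acc> c(M * N, Acc{0});
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t k = 0; k < K; ++k)
                c[m * N + n] += Acc{1} * Acc{1};
    for (const Acc& v : c)
        if (v != static_cast<Acc>(K)) return false;
    return true;
}

// Fixture-style macro: defines a named test function from a config and dimensions.
#define DEMO_GEMM_QUANT_TEST(NAME, CONFIG, M, N, K) \
    bool demo_test_##NAME() { return run_demo_gemm_test<CONFIG>(M, N, K); }

// ---- what remains in each thin test file: the config and the dimensions ----
struct DemoConfigF32 { using AccType = float; };
DEMO_GEMM_QUANT_TEST(f32_small, DemoConfigF32, 4, 8, 16)
```

With this layout, the only lines left in a per-variant file are the ones that actually differ between variants, which is the readability point the assessment above makes.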
aledudek pushed a commit that referenced this pull request on May 20, 2026
## Summary

Remove 61 confirmed-dead `#if 0` code blocks across 52 files in Composable Kernel, totaling ~2,600 lines of dead code.

These blocks were identified using an automated dead code scanning skill (`ck-dead-code`) that:

1. **Scanned** all 5,279 source files across 639 directories for `#if 0` blocks (found 187)
2. **Triaged** each block with an LLM expert that reads the code in context and judges REMOVE vs KEEP based on whether the block is genuinely obsolete or intentionally disabled (debug helpers, alternative configs, compiler workarounds, planned features)
3. **Kept 126 blocks** (67%) that serve a legitimate purpose: quick-toggle configs, debug prints, compiler workarounds, planned features with TODOs
4. **Removed 61 blocks** (33%) that are genuinely dead: obsolete implementations replaced by better approaches, stale template signatures that would no longer compile, buggy code with operator precedence errors, unreachable code after return statements

### What was removed

| Category | Count | Example |
|----------|-------|---------|
| Obsolete implementations replaced by better approach | 35 | Scalar transpose replaced by `__builtin_amdgcn_perm` intrinsic |
| Stale template signatures (would not compile) | 12 | Old `trait_` missing quantization type parameter |
| Dead per-specialization padding branches | 10 | Replaced by unconditional pad-both approach |
| Buggy dead code | 2 | Operator precedence errors in `#if 0` branches |
| Unreachable code | 1 | Code after unconditional return |
| Abandoned stubs | 1 | Empty template alias with no body |

### What was intentionally kept (not in this PR)

126 `#if 0` blocks were triaged as KEEP:

- **42** alternative configurations (tile sizes, data types, pipeline toggles)
- **22** debug/diagnostic helpers (`printf`, `std::cout`, `show_*`)
- **20** planned features with TODO comments
- **18** compiler workarounds (rocm version-specific, constexpr issues)
- **12** partially completed features with FIXME
- **10** reference implementations kept for validation
- **2** hardware-specific workarounds
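The scanning step can be approximated with a small line-based pass that pairs each `#if 0` with its matching `#endif`. The following is a sketch, not the `ck-dead-code` implementation, which the PR does not include. `find_if0_blocks` and `DisabledBlock` are hypothetical names, and the code deliberately ignores directives hidden inside comments, string literals, or backslash-continued lines.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct DisabledBlock {
    std::size_t first_line; // 1-based line of `#if 0`
    std::size_t last_line;  // 1-based line of the matching `#endif`
    bool has_else;          // an `#else`/`#elif` at this level keeps part of the block live
};

// Returns the directive text after '#', or "" if the line is not a directive.
static std::string directive_of(const std::string& line) {
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#') return "";
    i = line.find_first_not_of(" \t", i + 1);
    return i == std::string::npos ? "" : line.substr(i);
}

// True if `d` starts with keyword `kw` and the keyword ends there.
static bool starts_with_kw(const std::string& d, const std::string& kw) {
    if (d.compare(0, kw.size(), kw) != 0) return false;
    if (d.size() == kw.size()) return true;
    const unsigned char c = static_cast<unsigned char>(d[kw.size()]);
    return !(std::isalnum(c) || c == '_');
}

// True for `if 0`, optionally followed by a comment; `if 0x10` or `if 0 || X` do not match.
static bool is_if_zero(const std::string& d) {
    if (!starts_with_kw(d, "if")) return false;
    std::size_t i = d.find_first_not_of(" \t", 2);
    if (i == std::string::npos || d[i] != '0') return false;
    i = d.find_first_not_of(" \t\r", i + 1);
    return i == std::string::npos || d.compare(i, 2, "//") == 0 || d.compare(i, 2, "/*") == 0;
}

// Reports outermost `#if 0` regions; `#if 0` nested inside another `#if 0` is not repeated.
std::vector<DisabledBlock> find_if0_blocks(const std::string& source) {
    struct Open { bool is_if0; std::size_t line; bool has_else; };
    std::vector<Open> stack;
    std::vector<DisabledBlock> found;
    std::size_t if0_depth = 0;
    std::istringstream in(source);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
        const std::string d = directive_of(line);
        if (d.empty()) continue;
        if (starts_with_kw(d, "if") || starts_with_kw(d, "ifdef") || starts_with_kw(d, "ifndef")) {
            const bool zero = is_if_zero(d);
            stack.push_back({zero, n, false});
            if (zero) ++if0_depth;
        } else if (starts_with_kw(d, "else") || starts_with_kw(d, "elif")) {
            if (!stack.empty()) stack.back().has_else = true;
        } else if (starts_with_kw(d, "endif")) {
            if (stack.empty()) continue; // unbalanced input: ignore a stray #endif
            const Open top = stack.back();
            stack.pop_back();
            if (top.is_if0 && --if0_depth == 0) found.push_back({top.line, n, top.has_else});
        }
    }
    return found;
}
```

The `has_else` flag matters for triage: when an `#if 0` has an `#else` branch, only the first branch is dead, so deleting the whole region would remove live code.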