Skip to content

Adding TheRock CI#74

Merged
geomin12 merged 27 commits into
developfrom
users/geomin12/therock-ci
May 27, 2025
Merged

Adding TheRock CI#74
geomin12 merged 27 commits into
developfrom
users/geomin12/therock-ci

Conversation

@geomin12
Copy link
Copy Markdown
Collaborator

Changes:

  • Adding TheRock building for PRIM and then adding tests for rocPRIM from TheRock based on if files changed in projects/rocprim

(keep in note, this is just an initial push to get rocPRIM working! will iterate and make it better)

@geomin12 geomin12 requested review from amd-chrissosa and marbre May 22, 2025 18:34
@amd-chrissosa
Copy link
Copy Markdown
Contributor

Poke me when this is ready - overall reviewed when I was patching it and nothing looked too crazy. Just the comment around buckets in the other PR is the major one.

Comment thread .github/workflows/therock-ci.yml
Comment thread .github/workflows/therock-ci.yml Outdated
Comment thread .github/workflows/therock-ci.yml
@geomin12 geomin12 requested a review from marbre May 22, 2025 21:32
Comment thread .github/workflows/therock-ci.yml
@amd-chrissosa
Copy link
Copy Markdown
Contributor

I'm good with this once you are ready to remove the geo parts/fix branches. Just poke me for an approval

@geomin12
Copy link
Copy Markdown
Collaborator Author

geomin12 commented May 23, 2025

Will merge this through when TheRock 695 goes through, so I can do the final cleanup!

@geomin12 geomin12 marked this pull request as ready for review May 23, 2025 02:47
Comment thread .github/repos-config.json Outdated
Comment thread .github/workflows/therock-ci.yml
Comment thread .github/workflows/therock-ci.yml Outdated
Copy link
Copy Markdown
Collaborator

@jayhawk-commits jayhawk-commits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of the build flags which I don't know the specifics too well, the logic and flow of this action looks good to me.

Comment thread .github/workflows/therock-ci.yml Outdated
@geomin12
Copy link
Copy Markdown
Collaborator Author

Build previously worked in CI run

@geomin12
Copy link
Copy Markdown
Collaborator Author

sorry folks, running a test and reverted the change!

@geomin12 geomin12 requested a review from jayhawk-commits May 27, 2025 16:33
@geomin12 geomin12 merged commit e93ce14 into develop May 27, 2025
8 checks passed
@geomin12 geomin12 deleted the users/geomin12/therock-ci branch May 27, 2025 22:08
ammallya pushed a commit that referenced this pull request Sep 24, 2025
* Move callback implementation to private library

* Remove namespaces

* Remove include
ammallya pushed a commit that referenced this pull request Sep 24, 2025
* Move callback implementation to private library

* Remove namespaces

* Remove include

[ROCm/hipDNN commit: fbfc8e1]
evetsso pushed a commit to evetsso/rocm-libraries that referenced this pull request Dec 31, 2025
* Reapply "[CK_TILE][GFX1250] WMMA GEMM Support on GFX1250 - Part I"

This reverts commit 3af8c7a.

* [CK_TILE] Fix gemm test build error for gfx1250

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Verify tests and examples for batched/grouped Gemm

---------

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Jeff-Huang added a commit that referenced this pull request Apr 23, 2026
Address all 5 reviewer asks on the >2GB KV cache batch-prefill series, plus
two self-found polish items surfaced by an internal CK-aware review pass.

Task #71 — bool kUseGlobalLoad_ -> BlockAttentionKVCacheLoadModeEnum kKVLoadMode_
(poyenc, tile_fmha_traits.hpp:62):
  Adjacent traits-template params (kKVMemoryLayout_, kKVLookupTable_) are
  already BlockAttention*Enum types; the binary kUseGlobalLoad_ stuck out
  as a bool exception. Convert to a 2-value enum {BUFFER_LOAD = 0,
  GLOBAL_LOAD_LDS = 1} living in a new ops/fmha/block/ header so it sits
  alongside its siblings. Touch sites:
  * include/ck_tile/ops/fmha/block/block_attention_kv_load_mode_enum.hpp
    (NEW): the enum class.
  * include/ck_tile/ops/fmha/pipeline/tile_fmha_traits.hpp: rename last
    template param + static member alias.
  * include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_problem.hpp:
    mirror alias rename.
  * include/ck_tile/ops/fmha/pipeline/block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp:
    add enum header include; class declares static auto kKVLoadMode plus
    derived static bool kUseGlobalLoad = (kKVLoadMode == GLOBAL_LOAD_LDS).
    All 10 internal `if constexpr(kUseGlobalLoad)` sites unchanged so the
    bool boundary is local to one TU. The standalone helper
    kv_offset_array_transform keeps its bool template param (private
    inline; intentional — keeps core/ tile_scatter_gather.hpp out of the
    enum's blast radius).
  * example/ck_tile/01_fmha/fmha_fwd.hpp: fmha_fwd_batch_prefill_traits_
    last template param renamed; static member alias kUseGlobalLoad ->
    kKVLoadMode (default BUFFER_LOAD).
  * include/ck_tile/core/arch/amd_buffer_addressing_builtins.hpp:
    comment-only update.

Task #70 — explicit constructor mem-init for tile_scatter_gather
(asleepzzz, tile_scatter_gather.hpp:1241, comment #3125912056):
  physical_pages_ and page_stride_elements_ were silently zero-initialized
  in the BUFFER_LOAD arm. Today safe (Task #71's positive setter asserts
  prevent misuse), but a future kUseGlobalLoad=true caller that misses a
  setter would get silent data corruption with no compile error. Make
  both fields explicit in the mem-init list so the contract is visible
  at the constructor boundary.

Task #72 — extract dispatcher overflow predicate to a named helper
(poyenc, fmha_batch_prefill.py:225):
  Move the (page_block_size < kN0 && kv_pool_bytes > INT32_MAX) decision
  out of the codegen template into a free helper:
    fmha_batch_prefill_select_kv_load_mode(page_block_size, kN0,
                                           num_total_pages, batch_stride_k,
                                           element_bytes)
  in example/ck_tile/01_fmha/fmha_fwd.hpp. The codegen-emitted dispatcher
  arms now call it with their compile-time kN0/element_bytes substituted,
  so the formula has exactly one source of truth.

Task #73 — symmetric gload/bload kernel-name suffix
(poyenc, fmha_batch_prefill.py:282):
  Match the existing CK convention (e.g., causal/ncausal, sink/nsink) by
  emitting a non-empty token in BOTH branches: '-gload' / '-bload' on
  FmhaFwdApiTrait.name, 'gload_' / 'bload_' on FmhaFwdKernel.name. The
  prior blank-default made it impossible to tell, when grepping JIT
  blob/ output 6 months later, whether a missing marker meant
  'BUFFER_LOAD variant' or 'old codegen revision before the gload branch
  existed'.

Task #74 — replace single-use dependent-false with reusable always_false_v
(poyenc, amd_buffer_addressing_builtins.hpp:1324):
  Promote impl::global_load_lds_arch_unreachable_v from a file-local
  helper into a generic ck_tile::always_false_v utility in
  core/utility/type_traits.hpp. Use it at the original site. The
  variable-template form defers evaluation to instantiation time, so a
  bare `static_assert(false, ...)` would (per CWG-2518 / current Clang)
  fire at parse time and break the whole TU even on never-instantiated
  arches.

Polish I-1 — umbrella header completeness:
  include/ck_tile/ops/fmha.hpp now pulls in the new
  block_attention_kv_load_mode_enum.hpp alongside the other
  BlockAttention*Enum siblings. Without this, downstream consumers that
  rely solely on the umbrella header would miss the enum.

Polish I-2 — overflow-cast robustness in fmha_batch_prefill_select_kv_load_mode:
  Promote every operand of the kv_pool_bytes multiplication to
  long_index_t individually instead of relying on left-to-right
  associativity to widen the chain. A future operand reorder would
  silently truncate; the per-operand cast makes overflow impossible
  regardless of order.

Verified on smci355-gfx950 (gfx950): clean JIT rebuild succeeds; full
op_tests/test_batch_prefill.py sweep passes 30,720 / 30,720 (10,016
skipped, 0 failed) in 30:40 wall. Codegen identifier changes only affect
the renamed template parameter; no register-allocation perturbation
expected on either gfx942 or gfx950 (confirmed by the cross-arch sweep).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants