Adding TheRock CI#74
Merged
Merged
Conversation
Contributor
|
Poke me when this is ready - overall reviewed when I was patching it and nothing looked too crazy. Just the comment around buckets in the other PR is the major one. |
marbre
reviewed
May 22, 2025
marbre
reviewed
May 22, 2025
Contributor
|
I'm good with this once you are ready to remove the geo parts/fix branches. Just poke me for an approval |
Collaborator
Author
|
Will merge this through when TheRock 695 goes through, so I can do the final cleanup! |
geomin12
commented
May 23, 2025
Collaborator
jayhawk-commits
left a comment
There was a problem hiding this comment.
Outside of the build flags which I don't know the specifics too well, the logic and flow of this action looks good to me.
Collaborator
Author
|
Build previously worked in CI run |
Collaborator
Author
|
sorry folks, running a test and reverted the change! |
jayhawk-commits
approved these changes
May 27, 2025
ammallya
pushed a commit
that referenced
this pull request
Sep 24, 2025
* Move callback implementation to private library * Remove namespaces * Remove include
ammallya
pushed a commit
that referenced
this pull request
Sep 24, 2025
* Move callback implementation to private library * Remove namespaces * Remove include [ROCm/hipDNN commit: fbfc8e1]
evetsso
pushed a commit
to evetsso/rocm-libraries
that referenced
this pull request
Dec 31, 2025
* Reapply "[CK_TILE][GFX1250] WMMA GEMM Support on GFX1250 - Part I" This reverts commit 3af8c7a. * [CK_TILE] Fix gemm test build error for gfx1250 Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Verify tests and examples for batched/grouped Gemm --------- Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Jeff-Huang
added a commit
that referenced
this pull request
Apr 23, 2026
Address all 5 reviewer asks on the >2GB KV cache batch-prefill series, plus two self-found polish items surfaced by an internal CK-aware review pass. Task #71 — bool kUseGlobalLoad_ -> BlockAttentionKVCacheLoadModeEnum kKVLoadMode_ (poyenc, tile_fmha_traits.hpp:62): Adjacent traits-template params (kKVMemoryLayout_, kKVLookupTable_) are already BlockAttention*Enum types; the binary kUseGlobalLoad_ stuck out as a bool exception. Convert to a 2-value enum {BUFFER_LOAD = 0, GLOBAL_LOAD_LDS = 1} living in a new ops/fmha/block/ header so it sits alongside its siblings. Touch sites: * include/ck_tile/ops/fmha/block/block_attention_kv_load_mode_enum.hpp (NEW): the enum class. * include/ck_tile/ops/fmha/pipeline/tile_fmha_traits.hpp: rename last template param + static member alias. * include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_problem.hpp: mirror alias rename. * include/ck_tile/ops/fmha/pipeline/block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp: add enum header include; class declares static auto kKVLoadMode plus derived static bool kUseGlobalLoad = (kKVLoadMode == GLOBAL_LOAD_LDS). All 10 internal `if constexpr(kUseGlobalLoad)` sites unchanged so the bool boundary is local to one TU. The standalone helper kv_offset_array_transform keeps its bool template param (private inline; intentional — keeps core/ tile_scatter_gather.hpp out of the enum's blast radius). * example/ck_tile/01_fmha/fmha_fwd.hpp: fmha_fwd_batch_prefill_traits_ last template param renamed; static member alias kUseGlobalLoad -> kKVLoadMode (default BUFFER_LOAD). * include/ck_tile/core/arch/amd_buffer_addressing_builtins.hpp: comment-only update. Task #70 — explicit constructor mem-init for tile_scatter_gather (asleepzzz, tile_scatter_gather.hpp:1241, comment #3125912056): physical_pages_ and page_stride_elements_ were silently zero-initialized in the BUFFER_LOAD arm. Today safe (Task #71's positive setter asserts prevent misuse), but a future kUseGlobalLoad=true caller that misses a setter would get silent data corruption with no compile error. Make both fields explicit in the mem-init list so the contract is visible at the constructor boundary. Task #72 — extract dispatcher overflow predicate to a named helper (poyenc, fmha_batch_prefill.py:225): Move the (page_block_size < kN0 && kv_pool_bytes > INT32_MAX) decision out of the codegen template into a free helper: fmha_batch_prefill_select_kv_load_mode(page_block_size, kN0, num_total_pages, batch_stride_k, element_bytes) in example/ck_tile/01_fmha/fmha_fwd.hpp. The codegen-emitted dispatcher arms now call it with their compile-time kN0/element_bytes substituted, so the formula has exactly one source of truth. Task #73 — symmetric gload/bload kernel-name suffix (poyenc, fmha_batch_prefill.py:282): Match the existing CK convention (e.g., causal/ncausal, sink/nsink) by emitting a non-empty token in BOTH branches: '-gload' / '-bload' on FmhaFwdApiTrait.name, 'gload_' / 'bload_' on FmhaFwdKernel.name. The prior blank-default made it impossible to tell, when grepping JIT blob/ output 6 months later, whether a missing marker meant 'BUFFER_LOAD variant' or 'old codegen revision before the gload branch existed'. Task #74 — replace single-use dependent-false with reusable always_false_v (poyenc, amd_buffer_addressing_builtins.hpp:1324): Promote impl::global_load_lds_arch_unreachable_v from a file-local helper into a generic ck_tile::always_false_v utility in core/utility/type_traits.hpp. Use it at the original site. The variable-template form defers evaluation to instantiation time, so a bare `static_assert(false, ...)` would (per CWG-2518 / current Clang) fire at parse time and break the whole TU even on never-instantiated arches. Polish I-1 — umbrella header completeness: include/ck_tile/ops/fmha.hpp now pulls in the new block_attention_kv_load_mode_enum.hpp alongside the other BlockAttention*Enum siblings. Without this, downstream consumers that rely solely on the umbrella header would miss the enum. Polish I-2 — overflow-cast robustness in fmha_batch_prefill_select_kv_load_mode: Promote every operand of the kv_pool_bytes multiplication to long_index_t individually instead of relying on left-to-right associativity to widen the chain. A future operand reorder would silently truncate; the per-operand cast makes overflow impossible regardless of order. Verified on smci355-gfx950 (gfx950): clean JIT rebuild succeeds; full op_tests/test_batch_prefill.py sweep passes 30,720 / 30,720 (10,016 skipped, 0 failed) in 30:40 wall. Codegen identifier changes only affect the renamed template parameter; no register-allocation perturbation expected on either gfx942 or gfx950 (confirmed by the cross-arch sweep).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changes:
projects/rocprim(keep in note, this is just an initial push to get rocPRIM working! will iterate and make it better)