[hipBLASLt] [TensileLite] Add tail loop support in subtile path for BF16 for all k sizes #7661
Closed
bnemanich wants to merge 63 commits into users/bnemanich/subtile-tailloop-k32-rebased from users/bnemanich/subtile-bf16-anyk
Commits
3b488dd (bnemanich) subtile: allow low ASEM for bf16 any-K tail loop
357d9ae (bnemanich) subtile: add bf16 any-K unit + yaml regression tests
fd3846b (bnemanich) subtile: add BufferLoadB16/BufferLoadU16 rocisa bindings
48486b4 (bnemanich) subtile: add largemt anyK regression yaml + unit pins per sebvince's …
4fd44e9 (bnemanich) subtile: replay bf16-anyk tail-mask evolution onto K32-on-develop
9004947 (peterjunpark) docs(rocsolver): update install pages for 7.13 (#7364)
a889cf3 (KKyang) [stinkytofu] Prepare for TheRock subproject integration (#7556)
91ee252 (peterjunpark) docs(rocrand): update install pages for 7.13 (#7366)
9d7c386 (alexxu-amd) [CI] [Documentation] Fix docs synchronization pipeline introduced by …
2cec672 (adickin-amd) [dnn-providers] Run test_name_validator.py as a ctest for providers (…
5ff4d1a (af-ayala) [rocFFT] Batched distributed transform in MPI sample
c158fde (Abuudiii) rocfft: Replace rocm-smi with amd-smi in perf scripts (#7696)
268d30e (StaceyLai) Add FP32-to-FP8 conversion with stochastic rounding
900b47b (StaceyLai) Support FP32 to FP8 stochastic rounding conversion without v_prng_b32
6824cec (StaceyLai) Fix CI: add StochasticRounding to test_storeD_roundtrip ProblemType
739b4a0 (bnemanich) subtile: scrub reviewer-name references from bf16-anyk delta
c3895f4 (bnemanich) subtile: trim narrative comments in tail scaffold
02827ba (bnemanich) subtile: replace long MIT header with short form on new test files
857751b (bnemanich) subtile: move tail SRD tighteners to Components/Subtile/SubtileTailSr…
1429817 (bnemanich) subtile: move tail-mask helpers to Components/Subtile/SubtileTailMask
bd205dc (bnemanich) subtile: move tail-loop scaffold to Components/Subtile/SubtileTailSca…
7ae6c53 (bnemanich) subtile: drop reviewer-name trailer from extracted scaffold
2531c99 (bnemanich) subtile: fold test_solution_subtile_anyk_largemt into test_solution_s…
7da12b2 (amontoison) Adapt tolerances for spsm / sptrsm on HawkPoint for ill-conditioned m…
f4471a8 (Alex-Vasile) Fix yaml type mismatch in library logic for gfx1152/gfx1153/gfx1200 (…
60e5be1 (Alex-Vasile) Fix yaml type mismatch in library logic for aquavanjaram (a) (3/13 of…
2828780 (KKyang) [stinkytofu] Add comgr support for runtime toolchain capability probi…
402dbad (DDEle) [CK_TILE] Use Persistent Scheduling for FMHA BWD Group Deterministic …
b9db673 (KKyang) Update instruction: running clang-tidy (#7701)
a0e9f50 (tvy-amd) [hipDNN] ALMIOPEN-1869 Add optional hipdnn-frontend Python bindings t…
32ccae3 (boringmorning) Enable HalfPLR for MXF8 in gfx1250 (#7453)
9296d81 (hcman2) [tensilelite][stinkytofu] Fix PGR1 token bug (#7730)
18ee0d8 (leowu2017) [hipsparselt] Refactor LRVWMetadata (#7487)
54aed1e (shumway) [CK] Add rocm_ck spec factories: GemmSpec, makeSpec() (#7180)
0fab8d8 (yungshengtu) [CK TILE] Unification Work – Add MFMA specialisations for `fp64_t` (#…
c659ffd (bnemanich) subtile: reject single-wave WG=(1,1) + large WT + K-tail at Solution …
5177400 (darrenhsieh-amd) [stinkytofu] Make // and ; comment-stripping block-comment aware
ef165a5 (darrenhsieh-amd) [stinkytofu] Add RaiseVgprMsbPass with Insert byte-encoding fix (#7727)
45583bd (ex-rzr) [CK_TILE][FMHA] Improve precision of mxfp4 FMHA with fp6 for matrix P…
c939faf (tvy-amd) [hipDNN] ALMIOPEN-1869 Enable clang-tidy for Python bindings in CI (#…
e4d5f04 (talumbau) [tensilelite] Add testpaths and norecursedirs to pytest.ini (#7571)
7e27be9 (Aleksandar301) Add MIOpen integration test for batchnorm unhappy activation (#7404)
d23f097 (kailash-khalasi) [ci] bump TheRock hash to `974db70` (2026-05-18) (#7582)
45eb1a1 (peterjunpark) docs(hipfft): update install pages for 7.13 (#7375)
ddda8ac (AviralGoelAMD) [CK_TILE] Add save_matrix_txt() and extract HostTensor I/O to free fu…
8bc1843 (aadeshamd) Fix ZeroDivisionError and silent failure when TransposeLDS=0 is incom…
ef8a4cf (SreecharanGundaboluAMD) [ALMIOPEN-1951] [miopen] Fix install RPATH for MIOpenDriver and CK ba…
ffbde87 (jdcampbe) [MIOpen] Add JSON performance logs for MIOpen convolution driver comm…
867bece (arai713) [CK_TILE] Adding steps in Stream-K Tile Engine (#6511)
5b3f4b7 (assistant-librarian[bot]) [CK_TILE] Stream-K XCD remapping (#4279)
52f486a (evedovelli) [rocblas] Fix install.sh/rmake.py when CMAKE_GENERATOR=Ninja is set i…
e916514 (evedovelli) Add missing dependency package to Dockerfiles
258c1fb (dependabot[bot]) Bump urllib3 from 2.6.3 to 2.7.0 in /shared/tensile/docs/sphinx
7c0d7aa (sebvince) [Hipblaslt] [Subtiling] Add non-uniform partition size to Logical Sch…
07c4e5e (TorreZuk) consistently weaken new k==0 test so we don't verify that alpha is ig…
d3f057b (SreecharanGundaboluAMD) [MIOpen] Add initial MIOpen support for gfx1250 (#7587)
99c8e5b (dependabot[bot]) Bump gitpython from 3.1.49 to 3.1.50 in /shared/tensile/docs/sphinx
f96e909 (archana-ramalingam) [tensilelite] Fix test_PlaceholderMerge xfail for TheRock CI (#7716)
62d3a26 (EwanC) [hip-kernel-provider] Remove hip includes from RTC kernels (#7563)
e85adb5 (bnemanich) Merge branch 'develop' into users/bnemanich/subtile-bf16-anyk
f9dd411 (bnemanich) subtile: add PGR=1 to largemt anyk yaml to clear post-develop-merge V…
b88565f (bnemanich) subtile: make SubtileTailSrdTighten swizzle-size factor explicit
784ea30 (bnemanich) subtile: trim verbose comments in tail SRD tighten swizzle refactor
This code seems redundant with the existing vgprTiles allocation in the logicalScheduler (allocVgprTiles). Not sure we want to maintain 2 allocations in 2 different places.