
[rocBLAS] Users/torrezuk/swdev 568158 syrk ex tolerance fix #2851

Merged
TorreZuk merged 2 commits into develop from users/torrezuk/swdev-568158-syrk-ex-tolerance-fix
Nov 24, 2025

Conversation

@TorreZuk
Contributor

  • fix syrk_ex tolerance due to reference conversions and add double precision reference function
  • add gfx11 tolerance template for f32 with f64 compute
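
To illustrate why the precision of the reference matters for a tolerance check, here is a minimal sketch. It is not the rocBLAS test code: `syrk_upper` and `near_upper` are hypothetical helpers, and the tolerance model (k times float epsilon, scaled by the reference magnitude) is an assumption chosen only for illustration. The idea is that a reference accumulated in double avoids adding its own rounding error to the comparison against an f32 result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a rocBLAS API): upper-triangle SYRK,
// C = alpha * A * A^T + beta * C, with A (n x k) and C (n x n) stored
// column-major. Acc is the type used for accumulation and for the output.
template <typename Acc>
std::vector<Acc> syrk_upper(std::size_t n, std::size_t k, float alpha,
                            const std::vector<float>& A, float beta,
                            const std::vector<float>& C)
{
    assert(A.size() == n * k && C.size() == n * n);
    std::vector<Acc> out(n * n, Acc(0));
    for(std::size_t j = 0; j < n; ++j)
    {
        for(std::size_t i = 0; i <= j; ++i)
        {
            Acc sum = 0;
            for(std::size_t p = 0; p < k; ++p)
                sum += Acc(A[i + p * n]) * Acc(A[j + p * n]);
            out[i + j * n] = Acc(alpha) * sum + Acc(beta) * Acc(C[i + j * n]);
        }
    }
    return out;
}

// Assumed tolerance model: allow an error of k * epsilon(float), scaled by
// the magnitude of the double precision reference value (at least 1).
inline bool near_upper(std::size_t n, std::size_t k,
                       const std::vector<float>& got,
                       const std::vector<double>& ref)
{
    const double tol = double(k) * std::numeric_limits<float>::epsilon();
    for(std::size_t j = 0; j < n; ++j)
    {
        for(std::size_t i = 0; i <= j; ++i)
        {
            const std::size_t idx   = i + j * n;
            const double      scale = std::max(1.0, std::abs(ref[idx]));
            if(std::abs(double(got[idx]) - ref[idx]) > tol * scale)
                return false;
        }
    }
    return true;
}
```

Calling `syrk_upper<double>` gives the reference and `syrk_upper<float>` stands in for the lower precision result under test; `near_upper` then compares only the upper triangle, which is the part SYRK updates in this sketch.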

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #2851       +/-   ##
============================================
+ Coverage    54.54%   67.07%   +12.53%     
============================================
  Files           14      362      +348     
  Lines         3768    51073    +47305     
  Branches       578     5837     +5259     
============================================
+ Hits          2055    34255    +32200     
- Misses        1468    13157    +11689     
- Partials       245     3661     +3416     
Flag Coverage Δ
hipFFT ?
rocBLAS 67.07% <ø> (?)

see 376 files with indirect coverage changes


@TorreZuk TorreZuk merged commit d471513 into develop Nov 24, 2025
39 of 40 checks passed
@TorreZuk TorreZuk deleted the users/torrezuk/swdev-568158-syrk-ex-tolerance-fix branch November 24, 2025 16:31
assistant-librarian Bot pushed a commit to ROCm/rocBLAS that referenced this pull request Nov 24, 2025
[rocBLAS] Users/torrezuk/swdev 568158 syrk ex tolerance fix
 (#2851)

* fix syrk_ex tolerance due to reference conversions and add double precision
reference function
* add gfx11 tolerance template using f32 for f64 compute
TorreZuk added a commit that referenced this pull request Nov 24, 2025
* fix syrk_ex tolerance due to reference conversions and add double precision
reference function
* add gfx11 tolerance template using f32 for f64 compute

(cherry picked from commit d471513)
TorreZuk added a commit that referenced this pull request Nov 24, 2025
* fix syrk_ex tolerance due to reference conversions and add double precision
reference function
* add gfx11 tolerance template using f32 for f64 compute

(cherry picked from commit d471513)
vamovsik pushed a commit that referenced this pull request Nov 28, 2025
…2872)

* fix syrk_ex tolerance due to reference conversions and add double
precision reference function
* add gfx11 tolerance template using f32 for f64 compute

(cherry picked from commit d471513)
tfalders pushed a commit to tfalders/rocm-libraries that referenced this pull request Jan 21, 2026
ROCm#2851)

* [CK_TILE] Add sequence padding and variable length support in fmha (and v3)

 - Group Mode Padding: Introduces the `-s_qpad` argument to support
   physically padded layouts. Kernels now use padded start pointers
   (`seqstart_padded_*_ptr`) for memory addressing.

 - Batch Mode Variable Length: Adds `-q_eff_lens` and `-kv_eff_lens`
   arguments for efficient processing of variable-length sequences by
   passing cumulative effective lengths (`cu_seqlen_*_ptr`) to the kernel.

 - FMHA examples: Support padding and variable length both in
   group and batch mode. Dispatcher is updated as well (dispatch to
   kPadSeqLenK enabled pipeline).

 - New padding test cases: Add padding test cases to `smoke_test_fwd.sh`,
   and add benchmarks to `benchmark_fwd.sh` and `benchmark_fwd_v3.sh` as well.
   These test cases and benchmarks specifically validate/benchmark the
   new padding and variable-length functionality in both group and batch modes.

* [CK_TILE] Fix build error in fmha unit tests

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Yi DING <yi.ding@amd.com>
ammallya pushed a commit that referenced this pull request Feb 3, 2026
#2851)

* [CK_TILE] Add sequence padding and variable length support in fmha (and v3)

 - Group Mode Padding: Introduces the `-s_qpad` argument to support
   physically padded layouts. Kernels now use padded start pointers
   (`seqstart_padded_*_ptr`) for memory addressing.

 - Batch Mode Variable Length: Adds `-q_eff_lens` and `-kv_eff_lens`
   arguments for efficient processing of variable-length sequences by
   passing cumulative effective lengths (`cu_seqlen_*_ptr`) to the kernel.

 - FMHA examples: Support padding and variable length both in
   group and batch mode. Dispatcher is updated as well (dispatch to
   kPadSeqLenK enabled pipeline).

 - New padding test cases: Add padding test cases to `smoke_test_fwd.sh`,
   and add benchmarks to `benchmark_fwd.sh` and `benchmark_fwd_v3.sh` as well.
   These test cases and benchmarks specifically validate/benchmark the
   new padding and variable-length functionality in both group and batch modes.

* [CK_TILE] Fix build error in fmha unit tests

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Yi DING <yi.ding@amd.com>

[ROCm/composable_kernel commit: 86dd59c]