
[rocsolver] new stedc code #3098

Open
jzuniga-amd wants to merge 4 commits into ROCm:develop from jzuniga-amd:rocsolver-newstedc

Conversation

@jzuniga-amd
Contributor

Motivation

Add an improved implementation of the divide-and-conquer algorithm. The main objective is cleaner, simpler STEDC code.

Technical Details

This new implementation starts the merge process with the eigenvalues/vectors of the leaf nodes in the tree already sorted.
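As a rough illustration of why sorted leaves can simplify a merge step, the sketch below merges two ascending eigenvalue lists in linear time and records the permutation that would also reorder the matching eigenvector columns. It is not taken from the PR: the name `merge_sorted_perm`, the use of `std::vector`, and the host-side setting are all assumptions made for this example, and the actual kernels may organize the merge quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): merge two ascending eigenvalue lists
// and record, for each output position, which input entry it came from.
// Indices 0..a.size()-1 refer to 'a'; larger indices refer to 'b', shifted by a.size().
std::vector<std::size_t> merge_sorted_perm(const std::vector<double>& a,
                                           const std::vector<double>& b,
                                           std::vector<double>& merged)
{
    std::vector<std::size_t> perm;
    perm.reserve(a.size() + b.size());
    merged.clear();
    merged.reserve(a.size() + b.size());

    std::size_t i = 0, j = 0;
    while(i < a.size() || j < b.size())
    {
        // Take from 'a' when 'b' is exhausted or a[i] is not larger (keeps ties stable).
        const bool take_a = (j == b.size()) || (i < a.size() && a[i] <= b[j]);
        if(take_a)
        {
            merged.push_back(a[i]);
            perm.push_back(i++);
        }
        else
        {
            merged.push_back(b[j]);
            perm.push_back(a.size() + j++);
        }
    }
    assert(merged.size() == a.size() + b.size());
    return perm;
}
```

Because both inputs are already sorted, no separate sorting pass is needed before the merge; the returned permutation is what a caller would apply to the eigenvector columns.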

Test Plan

The normal unit tests, the extended stress tests, and the performance tests were executed.

Test Result

The new code is simpler and slightly faster than the current code (up to 8 ms faster for the larger sizes, across all matrix initializations), with no relevant change in accuracy.

@jzuniga-amd jzuniga-amd requested a review from a team as a code owner December 3, 2025 23:26
@jzuniga-amd jzuniga-amd added the noOptimizations Disable optimized kernels for small sizes for some routines label Dec 3, 2025
@codecov-commenter

codecov-commenter commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 17.24138% with 96 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...csolver/library/src/include/lib_device_helpers.hpp | 0.00% | 94 Missing ⚠️ |
| ...olver/library/src/auxiliary/rocauxiliary_stedc.cpp | 88.24% | 2 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (3f037de) and HEAD (ad1a278).

HEAD has 1 upload less than BASE

| Flag | BASE (3f037de) | HEAD (ad1a278) |
|---|---|---|
| rocThrust | 1 | 0 |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3098      +/-   ##
===========================================
- Coverage    84.07%   75.16%   -8.91%     
===========================================
  Files          501      321     -180     
  Lines        42479    31280   -11199     
  Branches      5459     4717     -742     
===========================================
- Hits         35714    23510   -12204     
- Misses        2499     4929    +2430     
+ Partials      4266     2841    -1425     
| Flag | Coverage Δ |
|---|---|
| rocSOLVER | 75.16% <17.24%> (?) |
| rocThrust | ? |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| ...olver/library/src/auxiliary/rocauxiliary_stedc.hpp | 65.01% <ø> (ø) |
| ...olver/library/src/lapack/roclapack_syevd_heevd.cpp | 87.88% <100.00%> (ø) |
| ...olver/library/src/lapack/roclapack_syevd_heevd.hpp | 78.69% <100.00%> (ø) |
| ...brary/src/lapack/roclapack_syevd_heevd_batched.cpp | 88.06% <100.00%> (ø) |
| ...c/lapack/roclapack_syevd_heevd_strided_batched.cpp | 88.06% <100.00%> (ø) |
| ...olver/library/src/auxiliary/rocauxiliary_stedc.cpp | 87.27% <88.24%> (ø) |
| ...csolver/library/src/include/lib_device_helpers.hpp | 8.85% <0.00%> (ø) |

... and 815 files with indirect coverage changes


COrruDXC pushed a commit to COrruDXC/rocm-libraries that referenced this pull request Dec 5, 2025
* block_m = 32

* ck block_m = 32

* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format

* mxfp4_moe v1 pipe

* update format

---------

Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>
ammallya pushed a commit that referenced this pull request Feb 3, 2026
* block_m = 32

* ck block_m = 32

* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format

* mxfp4_moe v1 pipe

* update format

---------

Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>

[ROCm/composable_kernel commit: d04eba4]
@EdDAzevedo EdDAzevedo requested a review from Copilot March 13, 2026 03:13
Contributor

Copilot AI left a comment


Pull request overview

Introduces a reworked STEDC (divide-and-conquer) implementation intended to simplify merge logic and improve performance, while reusing existing SYEV/HEEV argument validation.

Changes:

  • Switch SYEVD/HEEVD argument checking to reuse rocsolver_syev_heev_argCheck.
  • Replace large parts of STEDC merge/deflation/vector-update flow with a new implementation and new workspace layout.
  • Add a shared bisearch helper and adjust test initialization to optionally generate the EIG7 case via GPU operations for bc==1.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| projects/rocsolver/library/src/lapack/roclapack_syevd_heevd_strided_batched.cpp | Reuses SYEV/HEEV arg checking in strided-batched entrypoint. |
| projects/rocsolver/library/src/lapack/roclapack_syevd_heevd_batched.cpp | Reuses SYEV/HEEV arg checking in batched entrypoint. |
| projects/rocsolver/library/src/lapack/roclapack_syevd_heevd.hpp | Removes local argCheck, wires STEDC call-site workspace changes, includes SYEV/HEEV header. |
| projects/rocsolver/library/src/lapack/roclapack_syevd_heevd.cpp | Reuses SYEV/HEEV arg checking in non-batched entrypoint. |
| projects/rocsolver/library/src/include/lib_device_helpers.hpp | Adds bisearch helper for device/host binary search. |
| projects/rocsolver/library/src/auxiliary/rocauxiliary_stedc.hpp | Major STEDC refactor: new merge/deflation kernels, new memory sizing and workspace usage. |
| projects/rocsolver/library/src/auxiliary/rocauxiliary_stedc.cpp | Updates STEDC workspace allocation to new memory-size API and new buffers. |
| projects/rocsolver/clients/common/lapack/testing_syevd_heevd.hpp | Adds GPU-based EIG7 init for bc==1 and toggles selection via macro. |


Comment on lines +322 to +325
extern __shared__ rocblas_int shmem[];
rocblas_int* posi = shmem;
rocblas_int* posf = posi + (1 << (k + 1));
S* shmaxz = reinterpret_cast<S*>(posf + (1 << (k + 1)));
strideD, V, 0, ldv, strideV, workSvecs, workInt);

rocblas_int ngps = blks / (1 << (k + 1));
size_t lmemsize = sizeof(S) * blks + sizeof(rocblas_int) * 2 * (1 << (k + 1));
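To make the relationship between the kernel-side pointer arithmetic and the host-side `lmemsize` expression easier to check, here is a minimal host-only sketch that computes the byte offsets implied by the two snippets above. The struct `ShmemLayout`, the function `stedc_shmem_layout`, and the alias `index_t` (standing in for `rocblas_int`, assumed to be 32-bit here) are hypothetical names introduced for illustration; they do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for rocblas_int; assumed to be a 32-bit integer in this sketch.
using index_t = std::int32_t;

// Hypothetical description of the dynamic shared-memory layout:
// [ posi : 2^(k+1) ints ][ posf : 2^(k+1) ints ][ shmaxz : blks values of S ]
struct ShmemLayout
{
    std::size_t posf_offset; // byte offset of posf from the start of shmem
    std::size_t shmaxz_offset; // byte offset of shmaxz from the start of shmem
    std::size_t total_bytes; // same value as the lmemsize expression
};

template <typename S>
ShmemLayout stedc_shmem_layout(int k, int blks)
{
    assert(k >= 0 && blks >= 0);
    const std::size_t nints = std::size_t(1) << (k + 1);
    ShmemLayout l;
    l.posf_offset = sizeof(index_t) * nints;
    l.shmaxz_offset = sizeof(index_t) * 2 * nints;
    l.total_bytes = sizeof(S) * blks + l.shmaxz_offset;
    return l;
}
```

Under the 32-bit assumption, the integer region occupies 2^(k+4) bytes, so for k >= 0 the `shmaxz` pointer would start on a multiple of 16 bytes, which is sufficient alignment for `float` or `double`. Whether that holds in the real code depends on the actual width of `rocblas_int` in the build.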
@@ -39,213 +39,27 @@
#include "rocsolver/rocsolver.h"

#include <algorithm>

ROCSOLVER_BEGIN_NAMESPACE

#define STEDC_BDIM 512 // Number of threads per thread-block used in main stedc kernels
#define STEDC_SOLVE_BDIM 4 // Number of threads per thread-block used in solver kernel
#define STEDC_BDIM_VALUES 4 // Number of threads per thread-block used in mergeValues kernel
- This kernel is to be called with as many groups in x as needed to cover all
the batch_count problems. Each thread will work with a matrix in the batch.
divides the input matrix into 'blks' sub-blocks.
- This kernel is to be called with as many groups in x as needed to cover all
@@ -39,213 +39,27 @@
#include "rocsolver/rocsolver.h"

#include <algorithm>
If STRICT = false, it returns the number of elements in 'X' that are smaller than or
equal to 'val' **/
template <typename T>
__device__ __host__ rocblas_int bisearch(T val, T* X, rocblas_int n, bool STRICT, bool REVERSE)
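The counting behavior described in the comment above can be sketched on the host as follows. This is not the PR's `bisearch`: the name `count_below` is hypothetical, the input is assumed to be sorted in ascending order, the meaning of `strict == true` (counting strictly smaller elements) is inferred from the non-strict description, and the `REVERSE` parameter is left out because its semantics are not visible in this excerpt.

```cpp
#include <cassert>

// Hypothetical host-only sketch; 'count_below' is not the PR's function.
// X is assumed sorted in ascending order and to hold n >= 0 entries.
// strict == true  -> number of elements strictly smaller than val (assumed)
// strict == false -> number of elements smaller than or equal to val
template <typename T>
int count_below(T val, const T* X, int n, bool strict)
{
    assert(n >= 0);
    int lo = 0, hi = n; // the answer always lies in [lo, hi]
    while(lo < hi)
    {
        const int mid = lo + (hi - lo) / 2;
        const bool counted = strict ? (X[mid] < val) : (X[mid] <= val);
        if(counted)
            lo = mid + 1; // X[mid] is counted; continue to the right
        else
            hi = mid; // X[mid] is not counted; continue to the left
    }
    return lo;
}
```

For example, with `X = {1, 2, 2, 5}` and `val = 2`, the non-strict count is 3 and the strict count is 1; the difference between the two gives the multiplicity of `val` in `X`.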
Comment on lines +561 to +567
#define USE_GPU true
if((std::getenv("TEST_EIG7") != nullptr) || (std::getenv("SYEVD_TEST_EIG7") != nullptr))
{
syevd_heevd_eig7_initData<CPU, GPU>(handle, evect, n, dA, lda, bc, hA, A, test);
if(USE_GPU)
syevd_heevd_eig7_initData_gpu<CPU, GPU>(handle, evect, n, dA, lda, bc, hA, A, test);
else
syevd_heevd_eig7_initData<CPU, GPU>(handle, evect, n, dA, lda, bc, hA, A, test);
@github-actions
Contributor

This pull request has been inactive for 25 days and will be marked as stale.

If you would like to keep this PR open, please:

  • Add new commits
  • Add a comment explaining why it should remain open

This PR will be automatically closed in 5 days if no further activity occurs.

@github-actions github-actions Bot added the Stale PR has no activity for 25+ days label Apr 30, 2026
@mgates3
Contributor

mgates3 commented Apr 30, 2026

This PR is still under review and should stay open. It has been delayed by higher priority PRs.

@github-actions github-actions Bot removed the Stale PR has no activity for 25+ days label May 1, 2026

Labels

  • noOptimizations: Disable optimized kernels for small sizes for some routines
  • organization: ROCm
  • project: rocsolver

4 participants