RT-TDDFT GPU Acceleration: RT-TD now fully supports GPU computation #5773

Merged · 45 commits · Jan 22, 2025

Changes from all commits
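For users, the practical effect is that a real-time TDDFT run can now be placed on GPU. A hedged INPUT sketch (keyword names taken from the ABACUS input documentation; the exact set of TDDFT-related flags may differ from this illustration):

    # INPUT (sketch, assuming standard ABACUS keywords)
    calculation     md
    esolver_type    tddft     # real-time TDDFT driver
    basis_type      lcao
    device          gpu       # run the propagation on GPU (enabled by this PR)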
eee8b75
Phase 1 of RT-TDDFT GPU Acceleration: Rewriting existing code using T…
AsTonyshment Dec 26, 2024
aa4ceb1
[pre-commit.ci lite] apply automatic fixes
pre-commit-ci-lite[bot] Dec 26, 2024
069c434
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Dec 27, 2024
e45398a
Initialize int info in bandenergy.cpp
AsTonyshment Dec 27, 2024
a6040ec
Initialize double aa, bb in bandenergy.cpp
AsTonyshment Dec 27, 2024
0bebb32
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Dec 30, 2024
ac4e737
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Dec 30, 2024
8ed6407
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Dec 31, 2024
e67b42f
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Dec 31, 2024
9e4b889
Fix a bug where CopyFrom caused shared data between tensors, using =(…
AsTonyshment Dec 31, 2024
9ca053d
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 2, 2025
ba12e92
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 3, 2025
3110720
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 3, 2025
eda3add
RT-TDDFT GPU Acceleration (Phase 2): Adding needed BLAS and LAPACK su…
AsTonyshment Jan 3, 2025
4685fb8
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 6, 2025
717c164
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 6, 2025
e3c493d
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 10, 2025
d89f9a3
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 11, 2025
7f94b4d
LAPACK wrapper functions: change const basic-type input parameters fr…
AsTonyshment Jan 13, 2025
0e458b9
Did nothing, just formatting esolver.cpp
AsTonyshment Jan 13, 2025
824168d
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 14, 2025
b9f8ca4
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Jan 14, 2025
bdc6cf6
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 15, 2025
fbe01cd
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Jan 15, 2025
5044ac5
Merge branch 'develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 17, 2025
d732808
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Jan 17, 2025
0e6c42c
Core algorithm: RT-TD now has preliminary support for GPU computation
AsTonyshment Jan 17, 2025
20fd170
Fix GitHub CI CUDA build bug due to deleted variable
AsTonyshment Jan 17, 2025
1d9e60f
Refactor some files
AsTonyshment Jan 18, 2025
c6559dd
Getting ready for gathering MPI processes
AsTonyshment Jan 18, 2025
698bec2
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 18, 2025
38ad956
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Jan 18, 2025
4f24415
MPI multi-process compatibility
AsTonyshment Jan 19, 2025
cca5fa9
Fix GitHub CI MPI compilation bug
AsTonyshment Jan 19, 2025
62df525
Minor fix and refactor
AsTonyshment Jan 20, 2025
8b526a9
Merge branch 'deepmodeling:develop' into TDDFT_GPU_phase_1
AsTonyshment Jan 20, 2025
fde9d05
Initialize double aa, bb and one line for one variable
AsTonyshment Jan 21, 2025
87893a9
Rename bandenergy.cpp to band_energy.cpp and corresponding adjustments
AsTonyshment Jan 21, 2025
a02a352
Fix compile error and change CMakeLists accordingly
AsTonyshment Jan 21, 2025
2bdc83f
Merge branch 'TDDFT_GPU_phase_1' of github.com:AsTonyshment/abacus-de…
AsTonyshment Jan 21, 2025
214bdb8
Initialize int naroc
AsTonyshment Jan 21, 2025
e4ab72a
Initialize MPI related variables: myid, num_procs and root_proc
AsTonyshment Jan 21, 2025
dc54ffd
Refactor Propagator class implementation into multiple files for bett…
AsTonyshment Jan 21, 2025
079f791
Remove all GlobalV::ofs_running from RT-TDDFT core algorithms and pas…
AsTonyshment Jan 21, 2025
c0ca245
Add assert in some places and optimize redundant index calculations i…
AsTonyshment Jan 21, 2025
5 changes: 4 additions & 1 deletion source/Makefile.Objects
@@ -557,10 +557,13 @@ OBJS_IO_LCAO=cal_r_overlap_R.o\

 OBJS_LCAO=evolve_elec.o\
 	evolve_psi.o\
-	bandenergy.o\
+	band_energy.o\
 	middle_hamilt.o\
 	norm_psi.o\
 	propagator.o\
+	propagator_cn2.o\
+	propagator_taylor.o\
+	propagator_etrs.o\
 	td_velocity.o\
 	td_current.o\
 	snap_psibeta_half_tddft.o\
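For context, the three new propagator objects appear to correspond to the per-scheme files split out of the Propagator class (see commit dc54ffd): propagator_cn2 (second-order Crank-Nicolson), propagator_taylor (Taylor expansion), and propagator_etrs (enforced time-reversal symmetry). The Crank-Nicolson scheme is the one that requires the LU-based matrix inversion routines added below; in its standard RT-TDDFT form (a sketch only, the exact ABACUS discretization, e.g. the overlap matrix in a non-orthogonal LCAO basis, may differ) one time step reads:

    \psi(t+\Delta t) \approx \left(I + \frac{i \Delta t}{2} H\right)^{-1} \left(I - \frac{i \Delta t}{2} H\right) \psi(t)

where H is typically evaluated at the midpoint t + \Delta t / 2 (cf. the middle_hamilt.o object above).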
4 changes: 2 additions & 2 deletions source/module_base/lapack_connector.h
@@ -133,8 +133,8 @@ extern "C"

// zgetrf computes the LU factorization of a general matrix
// while zgetri takes its output to perform matrix inversion
-void zgetrf_(const int* m, const int *n, const std::complex<double> *A, const int *lda, int *ipiv, const int* info);
-void zgetri_(const int* n, std::complex<double> *A, const int *lda, int *ipiv, std::complex<double> *work, int *lwork, const int *info);
+void zgetrf_(const int* m, const int *n, std::complex<double> *A, const int *lda, int *ipiv, int* info);
+void zgetri_(const int* n, std::complex<double>* A, const int* lda, const int* ipiv, std::complex<double>* work, const int* lwork, int* info);

// if trans=='N': C = alpha * A * A.H + beta * C
// if trans=='C': C = alpha * A.H * A + beta * C
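The signature fix matters at call sites like the following minimal sketch (an illustration, not code from this PR), which chains the two routines to invert a complex matrix in place using the standard LAPACK lwork = -1 workspace query. It assumes the zgetrf_/zgetri_ declarations above are in scope (e.g. via module_base/lapack_connector.h):

    #include <complex>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Invert an n x n column-major complex matrix in place:
    // LU-factorize with zgetrf_, then invert with zgetri_.
    void invert_inplace(std::complex<double>* A, const int n)
    {
        std::vector<int> ipiv(n);
        int info = 0;
        zgetrf_(&n, &n, A, &n, ipiv.data(), &info); // A = P * L * U
        if (info != 0) { throw std::runtime_error("zgetrf_ failed, info = " + std::to_string(info)); }

        // Standard LAPACK workspace query: lwork = -1 returns the optimal size in work[0].
        int lwork = -1;
        std::complex<double> size_query;
        zgetri_(&n, A, &n, ipiv.data(), &size_query, &lwork, &info);
        lwork = static_cast<int>(size_query.real());
        std::vector<std::complex<double>> work(lwork);
        zgetri_(&n, A, &n, ipiv.data(), work.data(), &lwork, &info); // A <- A^{-1}
        if (info != 0) { throw std::runtime_error("zgetri_ failed, info = " + std::to_string(info)); }
    }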
58 changes: 58 additions & 0 deletions source/module_base/module_container/ATen/kernels/cuda/lapack.cu
@@ -117,6 +117,49 @@ struct lapack_dngvd<T, DEVICE_GPU> {
}
};

template <typename T>
struct lapack_getrf<T, DEVICE_GPU> {
void operator()(
const int& m,
const int& n,
T* Mat,
const int& lda,
int* ipiv)
{
cuSolverConnector::getrf(cusolver_handle, m, n, Mat, lda, ipiv);
}
};

template <typename T>
struct lapack_getri<T, DEVICE_GPU> {
void operator()(
const int& n,
T* Mat,
const int& lda,
const int* ipiv,
T* work,
const int& lwork)
{
throw std::runtime_error("cuSOLVER does not provide LU-based matrix inversion interface (getri). To compute the inverse on GPU, use getrs instead.");
}
};

template <typename T>
struct lapack_getrs<T, DEVICE_GPU> {
void operator()(
const char& trans,
const int& n,
const int& nrhs,
T* A,
const int& lda,
const int* ipiv,
T* B,
const int& ldb)
{
cuSolverConnector::getrs(cusolver_handle, trans, n, nrhs, A, lda, ipiv, B, ldb);
}
};

template struct set_matrix<float, DEVICE_GPU>;
template struct set_matrix<double, DEVICE_GPU>;
template struct set_matrix<std::complex<float>, DEVICE_GPU>;
@@ -142,5 +185,20 @@ template struct lapack_dngvd<double, DEVICE_GPU>;
template struct lapack_dngvd<std::complex<float>, DEVICE_GPU>;
template struct lapack_dngvd<std::complex<double>, DEVICE_GPU>;

template struct lapack_getrf<float, DEVICE_GPU>;
template struct lapack_getrf<double, DEVICE_GPU>;
template struct lapack_getrf<std::complex<float>, DEVICE_GPU>;
template struct lapack_getrf<std::complex<double>, DEVICE_GPU>;

template struct lapack_getri<float, DEVICE_GPU>;
template struct lapack_getri<double, DEVICE_GPU>;
template struct lapack_getri<std::complex<float>, DEVICE_GPU>;
template struct lapack_getri<std::complex<double>, DEVICE_GPU>;

template struct lapack_getrs<float, DEVICE_GPU>;
template struct lapack_getrs<double, DEVICE_GPU>;
template struct lapack_getrs<std::complex<float>, DEVICE_GPU>;
template struct lapack_getrs<std::complex<double>, DEVICE_GPU>;

} // namespace kernels
} // namespace container
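Since the GPU getri specialization deliberately throws, matrix inversion on device follows the getrf + getrs route instead. A minimal sketch of that pattern (a hypothetical helper, not part of this PR; it assumes device-resident, column-major A, X, and ipiv, with X pre-filled with the identity so that solving A * X = I leaves the inverse in X, and the container/kernels/DEVICE_GPU names as in the diff above):

    // Invert an n x n matrix on GPU: LU-factorize A, then solve A * X = I.
    template <typename T>
    void gpu_invert_via_getrs(T* A, T* X, int* ipiv, const int n)
    {
        using namespace container;
        kernels::lapack_getrf<T, DEVICE_GPU>()(n, n, A, n, ipiv);             // A = P * L * U (in place)
        kernels::lapack_getrs<T, DEVICE_GPU>()('N', n, n, A, n, ipiv, X, n);  // X <- A^{-1}
    }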
70 changes: 70 additions & 0 deletions source/module_base/module_container/ATen/kernels/lapack.cpp
@@ -124,6 +124,61 @@ struct lapack_dngvd<T, DEVICE_CPU> {
}
};

template <typename T>
struct lapack_getrf<T, DEVICE_CPU> {
void operator()(
const int& m,
const int& n,
T* Mat,
const int& lda,
int* ipiv)
{
int info = 0;
lapackConnector::getrf(m, n, Mat, lda, ipiv, info);
if (info != 0) {
throw std::runtime_error("getrf failed with info = " + std::to_string(info));
}
}
};

template <typename T>
struct lapack_getri<T, DEVICE_CPU> {
void operator()(
const int& n,
T* Mat,
const int& lda,
const int* ipiv,
T* work,
const int& lwork)
{
int info = 0;
lapackConnector::getri(n, Mat, lda, ipiv, work, lwork, info);
if (info != 0) {
throw std::runtime_error("getri failed with info = " + std::to_string(info));
}
}
};

template <typename T>
struct lapack_getrs<T, DEVICE_CPU> {
void operator()(
const char& trans,
const int& n,
const int& nrhs,
T* A,
const int& lda,
const int* ipiv,
T* B,
const int& ldb)
{
int info = 0;
lapackConnector::getrs(trans, n, nrhs, A, lda, ipiv, B, ldb, info);
if (info != 0) {
throw std::runtime_error("getrs failed with info = " + std::to_string(info));
}
}
};

template struct set_matrix<float, DEVICE_CPU>;
template struct set_matrix<double, DEVICE_CPU>;
template struct set_matrix<std::complex<float>, DEVICE_CPU>;
@@ -149,5 +204,20 @@ template struct lapack_dngvd<double, DEVICE_CPU>;
template struct lapack_dngvd<std::complex<float>, DEVICE_CPU>;
template struct lapack_dngvd<std::complex<double>, DEVICE_CPU>;

template struct lapack_getrf<float, DEVICE_CPU>;
template struct lapack_getrf<double, DEVICE_CPU>;
template struct lapack_getrf<std::complex<float>, DEVICE_CPU>;
template struct lapack_getrf<std::complex<double>, DEVICE_CPU>;

template struct lapack_getri<float, DEVICE_CPU>;
template struct lapack_getri<double, DEVICE_CPU>;
template struct lapack_getri<std::complex<float>, DEVICE_CPU>;
template struct lapack_getri<std::complex<double>, DEVICE_CPU>;

template struct lapack_getrs<float, DEVICE_CPU>;
template struct lapack_getrs<double, DEVICE_CPU>;
template struct lapack_getrs<std::complex<float>, DEVICE_CPU>;
template struct lapack_getrs<std::complex<double>, DEVICE_CPU>;

} // namespace kernels
} // namespace container
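On CPU the specializations wrap the classic getrf + getri path and throw std::runtime_error on a nonzero info, so a call site needs no explicit status checks. An illustrative sketch (not code from this PR; lwork = n is LAPACK's documented minimum for getri):

    #include <complex>
    #include <vector>

    // Invert an n x n column-major complex matrix in place via the new CPU functors.
    void cpu_invert_inplace(std::complex<double>* A, const int n)
    {
        using namespace container;
        std::vector<int> ipiv(n);
        kernels::lapack_getrf<std::complex<double>, DEVICE_CPU>()(n, n, A, n, ipiv.data());
        std::vector<std::complex<double>> work(n); // minimal workspace; a larger lwork may be faster
        kernels::lapack_getri<std::complex<double>, DEVICE_CPU>()(n, A, n, ipiv.data(), work.data(), n);
    }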
36 changes: 36 additions & 0 deletions source/module_base/module_container/ATen/kernels/lapack.h
@@ -65,6 +65,42 @@ struct lapack_dngvd {
Real* eigen_val);
};


template <typename T, typename Device>
struct lapack_getrf {
void operator()(
const int& m,
const int& n,
T* Mat,
const int& lda,
int* ipiv);
};


template <typename T, typename Device>
struct lapack_getri {
void operator()(
const int& n,
T* Mat,
const int& lda,
const int* ipiv,
T* work,
const int& lwork);
};

template <typename T, typename Device>
struct lapack_getrs {
void operator()(
const char& trans,
const int& n,
const int& nrhs,
T* A,
const int& lda,
const int* ipiv,
T* B,
const int& ldb);
};

#if defined(__CUDA) || defined(__ROCM)
// TODO: Use C++ singleton to manage the GPU handles
void createGpuSolverHandle(); // create cusolver handle