RT-TDDFT GPU Acceleration: RT-TD now fully supports GPU computation #5773
Commits:

- …assignment operator overload) instead
- …pport for `Tensor` on CPU and refactoring linear algebra operations in TDDFT
- …velop into TDDFT_GPU_phase_1
- …er code organization
- …s it as an input parameter instead

Review comments:

- The current program has some bugs that cause the data in …
- LGTM 👍, a good example showing the possibility of using tensor.
### Phase 1: Rewriting existing code using `Tensor` (complete)

This is merely a draft and does not represent the final code. Since `Tensor` can effectively support heterogeneous computing, the goal of the first phase is to rewrite the existing algorithms using `Tensor`. Currently, all memory is still explicitly allocated on the CPU (the parameter of the `Tensor` constructor is `container::DeviceType::CpuDevice`); a minimal sketch of this follows.
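As a rough illustration of what "all memory on the CPU" means in this phase, here is a minimal sketch. The constructor signature `Tensor(DataType, DeviceType, TensorShape)`, the `DT_COMPLEX_DOUBLE` enum, and the header path are assumptions inferred from the identifiers quoted in this PR, not verified code.

```cpp
// Sketch (not from the PR): creating and filling a CPU-resident Tensor,
// assuming the container API described above.
#include <complex>

#include <ATen/core/tensor.h>  // assumed header for container::Tensor

int main()
{
    const int n = 4;
    // All memory explicitly on the CPU, as in Phase 1.
    container::Tensor U(container::DataType::DT_COMPLEX_DOUBLE,
                        container::DeviceType::CpuDevice,
                        container::TensorShape({n, n}));

    // Raw pointer access for interoperation with BLAS/LAPACK-style kernels.
    std::complex<double>* u = U.data<std::complex<double>>();
    for (int i = 0; i < n * n; ++i)
    {
        u[i] = std::complex<double>(0.0, 0.0);
    }
    return 0;
}
```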
### Phase 2: Adding needed BLAS and LAPACK support for `Tensor` on CPU and refactoring linear algebra operations in TDDFT (complete)

Key Changes (a standalone LAPACK sketch follows this list):
- Added `lapack_getrf` and `lapack_getri` in `module_base/module_container/ATen/kernels/lapack.h` to support matrix LU factorization (`getrf`) and matrix inversion (`getri`) operations for `Tensor` objects.
- Fixed the LAPACK function declarations (`zgetrf_` and `zgetri_`) in `module_base/lapack_connector.h` to comply with standard conventions.
- Refactored `Tensor` operations in TDDFT. These linear algebra operations, in the `container::kernels` module from `module_base/module_container/ATen`, include a `Device` parameter, enabling seamless support for heterogeneous computing (GPU acceleration in future phases).
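For orientation, the `getrf`/`getri` pair inverts a matrix in two steps: LU-factorize, then build the inverse from the factors. Below is a minimal, self-contained sketch of that pattern using the standard LAPACK routines named above; it illustrates what the new `Tensor` kernels wrap, not the PR's actual kernel code (the workspace-size heuristic is my own).

```cpp
// Standalone sketch of the getrf/getri pattern: zgetrf_ computes an LU
// factorization, zgetri_ uses it to form the inverse.
// Link against a LAPACK implementation (e.g. -llapack).
#include <complex>
#include <vector>

extern "C" {
void zgetrf_(const int* m, const int* n, std::complex<double>* a,
             const int* lda, int* ipiv, int* info);
void zgetri_(const int* n, std::complex<double>* a, const int* lda,
             const int* ipiv, std::complex<double>* work,
             const int* lwork, int* info);
}

// Invert an n x n column-major complex matrix in place.
// Returns the LAPACK info code (0 on success).
int invert_inplace(std::complex<double>* a, int n)
{
    std::vector<int> ipiv(n);
    int info = 0;

    // Step 1: LU factorization A = P * L * U.
    zgetrf_(&n, &n, a, &n, ipiv.data(), &info);
    if (info != 0) return info;

    // Step 2: form A^{-1} from the LU factors.
    int lwork = 64 * n;  // simple heuristic; production code would query with lwork = -1
    std::vector<std::complex<double>> work(lwork);
    zgetri_(&n, a, &n, ipiv.data(), work.data(), &lwork, &info);
    return info;
}
```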
### Phase 3: RT-TDDFT GPU acceleration core algorithm (complete)
**Added linear solver interfaces** (a cuSOLVER sketch follows this list):

- CPU: a linear solver (`getrs`) using LAPACK.
- GPU: LU factorization (`getrf`) and a linear solver (`getrs`) using cuSOLVER.
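For the GPU side, a minimal sketch of the `getrf` + `getrs` solve pattern with cuSOLVER's dense API is shown below. This illustrates the cuSOLVER routines named above, not the PR's actual interface; error handling is omitted for brevity.

```cpp
// Sketch of a GPU linear solve A x = b with cuSOLVER's getrf/getrs pair.
// Compile with nvcc and link against cusolver.
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Solve A x = b in place on the device: d_A is n x n column-major,
// d_B holds b on entry and x on exit.
void gpu_solve(cusolverDnHandle_t handle, cuDoubleComplex* d_A,
               cuDoubleComplex* d_B, int n)
{
    int lwork = 0;
    cusolverDnZgetrf_bufferSize(handle, n, n, d_A, n, &lwork);

    cuDoubleComplex* d_work = nullptr;
    int* d_ipiv = nullptr;
    int* d_info = nullptr;
    cudaMalloc(&d_work, sizeof(cuDoubleComplex) * lwork);
    cudaMalloc(&d_ipiv, sizeof(int) * n);
    cudaMalloc(&d_info, sizeof(int));

    // LU factorization on the GPU: A = P * L * U.
    cusolverDnZgetrf(handle, n, n, d_A, n, d_work, d_ipiv, d_info);
    // Solve using the LU factors (one right-hand side).
    cusolverDnZgetrs(handle, CUBLAS_OP_N, n, 1, d_A, n, d_ipiv, d_B, n, d_info);

    cudaFree(d_work);
    cudaFree(d_ipiv);
    cudaFree(d_info);
}
```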
**Refactored RT-TDDFT I/O and parameters:**
- Removed parameters (`td_force_dt`, `td_vext`, `td_vext_dire_case`, `out_dipole`, `out_efield`) from the `Evolve_elec` class.
- Used the `PARAM.inp` input interface instead, to simplify template class usage with the `Device` parameter.
**Heterogeneous computing support** (a copy sketch follows this list):
- Added a `Device` template parameter to RT-TDDFT core algorithm classes and functions.
- Used memory synchronization operations (`base_device::memory::synchronize_memory_op`) to ensure proper data handling across devices.
- Replaced `BlasConnector::copy` operations with memory synchronization functions.
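As a rough illustration (signatures assumed, not taken from the PR), a host-to-device copy via the functor named above might look like this. The template parameter order, the call signature, the `DEVICE_CPU` tag, and the header path are all assumptions based on the identifiers quoted in this PR.

```cpp
// Hypothetical sketch: copying a wavefunction block from host to device with
// synchronize_memory_op, so the same templated RT-TDDFT code can run on
// either device.
#include <complex>

#include "module_base/module_device/memory_op.h"  // assumed header

template <typename Device>
void upload_psi(std::complex<double>* dst_on_device,
                const std::complex<double>* src_on_host,
                size_t size)
{
    using syncmem_h2d_op = base_device::memory::synchronize_memory_op<
        std::complex<double>, Device, base_device::DEVICE_CPU>;
    // Copies `size` elements; resolves to a plain memcpy on CPU builds or a
    // cudaMemcpy-like transfer on GPU builds.
    syncmem_h2d_op()(dst_on_device, src_on_host, size);
}
```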
**GPU acceleration for RT-TDDFT:**
### Phase 4: MPI multi-process compatibility (complete)
- Removed redundant `ctx` parameters in memory synchronization operations.
- Ensured MPI multi-process runs work with `device=gpu` (a minimal INPUT sketch follows).
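For completeness, a minimal sketch of an ABACUS INPUT file exercising the GPU path. `esolver_type`, `basis_type`, and `device` are existing ABACUS INPUT keywords, but this is an assumed minimal setup rather than a configuration taken from this PR; consult the ABACUS documentation for a complete RT-TDDFT run.

```
INPUT_PARAMETERS
# assumed minimal settings to run RT-TDDFT on GPU
esolver_type   tddft
basis_type     lcao
device         gpu
```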