[Bug] ABACUS LCAO-genelpa crash in large system #5983

Open
8 tasks
QuantumMisaka opened this issue Mar 9, 2025 · 0 comments
Labels
  • Bugs: Bugs that only solvable with sufficient knowledge of DFT
  • Large Systems: Issues related to large-size systems
  • Performance: Issues related to fail running ABACUS

Comments

@QuantumMisaka
Collaborator

Describe the Code Quality Issue

In #5028, an ELPA-related issue was found: for large systems (more than 1000 atoms), the SCF calculation crashes with the following backtrace:

==== backtrace (tid: 138369) ====
 0 0x0000000000012cf0 __funlockfile()  :0
 1 0x0000000000254159 elpa2_compute_mp_trans_ev_band_to_full_complex_double_()  /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa2_elpa2_compute.F90-src_elpa2_.libs_libelpa_openmp_private_la-elpa2_compute.o.F90:15626
 2 0x00000000003717aa elpa2_impl_mp_elpa_solve_evp_complex_2stage_a_h_a_double_impl_()  /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa2_elpa2.F90-src_elpa2_.libs_libelpa_openmp_private_la-elpa2.o.F90:6441
 3 0x00000000000c512f elpa_impl_mp_elpa_eigenvectors_a_h_a_dc_()  /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa_impl.F90-src_.libs_libelpa_openmp_private_la-elpa_impl.o.F90:5570
 4 0x00000000000c4709 elpa_eigenvectors_a_h_a_dc()  /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa_impl.F90-src_.libs_libelpa_openmp_private_la-elpa_impl.o.F90:5706
 5 0x0000000000bde2e2 elpa_eigenvectors()  /lustre/home/2201110432/lib/elpa/2024.03.001-icx/cpu/include/elpa/elpa_generic.h:82
 6 0x0000000000bde8ae ELPA_Solver::generalized_eigenvector()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/genelpa/elpa_new_complex.cpp:130
 7 0x00000000007641c3 hsolver::DiagoElpa<std::complex<double> >::diag()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/diago_elpa.cpp:90
 8 0x00000000007641c3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()  /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:519
 9 0x00000000007641c3 hsolver::DiagoElpa<std::complex<double> >::diag()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/diago_elpa.cpp:95
10 0x000000000075c3d1 hsolver::HSolverLCAO<std::complex<double>, base_device::DEVICE_CPU>::hamiltSolvePsiK()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/hsolver_lcao.cpp:149
11 0x000000000075c3d1 hsolver::HSolverLCAO<std::complex<double>, base_device::DEVICE_CPU>::hamiltSolvePsiK()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/hsolver_lcao.cpp:150
12 0x000000000075a7d1 hsolver::HSolverLCAO<std::complex<double>, base_device::DEVICE_CPU>::solve()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/hsolver_lcao.cpp:104
13 0x00000000008ba78f ModuleESolver::ESolver_KS_LCAO<std::complex<double>, double>::hamilt2density()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_esolver/esolver_ks_lcao.cpp:713
14 0x00000000008ba78f ???()  /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:215
15 0x00000000008ba78f ???()  /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:224
16 0x00000000008ba78f std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()  /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:661
17 0x00000000008ba78f ModuleESolver::ESolver_KS_LCAO<std::complex<double>, double>::hamilt2density()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_esolver/esolver_ks_lcao.cpp:713
18 0x000000000085b0f9 ModuleESolver::ESolver_KS<std::complex<double>, base_device::DEVICE_CPU>::runner()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_esolver/esolver_ks.cpp:449
19 0x00000000006f9265 Relax_Driver::relax_driver()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_relax/relax_driver.cpp:49
20 0x000000000070f442 Driver::driver_run()  /lustre/home/2201110432/apps/abacus/abacus-test/source/driver_run.cpp:68
21 0x000000000070f442 Relax_Driver::~Relax_Driver()  /lustre/home/2201110432/apps/abacus/abacus-test/source/module_relax/relax_driver.h:14
22 0x000000000070f442 Driver::driver_run()  /lustre/home/2201110432/apps/abacus/abacus-test/source/driver_run.cpp:69
23 0x000000000070e665 Driver::atomic_world()  /lustre/home/2201110432/apps/abacus/abacus-test/source/driver.cpp:186
24 0x000000000070df5e Driver::init()  /lustre/home/2201110432/apps/abacus/abacus-test/source/driver.cpp:40
25 0x00000000004359e6 main()  ???:0
26 0x000000000003ad85 __libc_start_main()  ???:0
27 0x000000000043589e _start()  ???:0
=================================

Users currently have to switch to scalapack_gvx as a workaround. Can we fix this?
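
For reference, the scalapack_gvx workaround would typically be selected through the `ks_solver` keyword in the ABACUS INPUT file. The fragment below is only a sketch: it assumes an LCAO calculation and omits every other setting of the failing run, since those are not given in this report.

```
INPUT_PARAMETERS
# Assumed minimal fragment; the remaining keywords of the failing case are not shown here
basis_type    lcao
ks_solver     scalapack_gvx
```

This only avoids the ELPA code path seen in the backtrace; it does not address the crash in genelpa itself.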

Also, is this problem related to #5707?

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Identify the specific code file or section with the code quality issue.
  • Investigate the issue and determine the root cause.
  • Research best practices and potential solutions for the identified issue.
  • Refactor the code to improve code quality, following the suggested solution.
  • Ensure the refactored code adheres to the project's coding standards.
  • Test the refactored code to ensure it functions as expected.
  • Update any relevant documentation, if necessary.
  • Submit a pull request with the refactored code and a description of the changes made.
@QuantumMisaka QuantumMisaka added Bugs Bugs that only solvable with sufficient knowledge of DFT Performance Issues related to fail running ABACUS Large Systems Issues related to large-size systems labels Mar 9, 2025
@QuantumMisaka QuantumMisaka changed the title [Quality] ABACUS LCAO-genelpa crash in large system [Bug] ABACUS LCAO-genelpa crash in large system Mar 9, 2025