[Bug]: aml spack test failed due to libcudart.so #225

Open
shahzebsiddiqui opened this issue Nov 7, 2023 · 1 comment
@shahzebsiddiqui
Contributor

CDASH Build

https://my.cdash.org/test/102913122

Link to buildspec file

https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/spack_test/perlmutter/23.05/aml.yml

Please describe the issue?

The test fails at the following step because the dynamic loader cannot find the libcudart.so.11.0 library:

Command exited with status 127:
    './0_hello'
./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory

Apparently some tests require the CUDA runtime even though this build of aml was installed without CUDA support (~cuda):

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
dzrvltdzrinvi5ps73jmxco3fsevwc2l aml@0.2.0~cuda~hip~hwloc~opencl~ze build_system=autotools hip-platform=none  /global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l
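
The compile line in the log links only -laml, -lexcit and -lpthread, so the libcudart.so.11.0 dependency most likely comes in through one of those shared libraries rather than from 0_hello.c itself. That is an inference, not something the log confirms. A minimal sketch for checking it, assuming ldd is available on the node: the helper names missing_libraries and check are hypothetical, and the script only parses the "=> not found" lines that ldd prints for unresolved dependencies.

```python
import re
import subprocess
import sys

# Matches ldd lines of the form "libfoo.so.1 => not found".
_NOT_FOUND = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$")


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        match = _NOT_FOUND.match(line)
        if match:
            missing.append(match.group(1))
    return missing


def check(path):
    """Run ldd on path (hypothetical helper) and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Pass the objects to inspect on the command line, for example the
    # 0_hello binary and the shared libraries under the aml install prefix.
    for target in sys.argv[1:]:
        for lib in check(target):
            print(f"{target}: {lib} not found")
```

Running it against 0_hello and against the shared libraries in the aml prefix's lib directory should show which object carries the unresolved CUDA runtime dependency.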

Relevant log output

_______________________________________________________________________________________________________
     The Extreme-Scale Scientific Software Stack (E4S) is accessible via the
Spack package manager.

     In order to access the production stack, you will need to load a spack
environment. Here are some tips to get started:


     'spack env list' - List all Spack environments
     'spack env activate gcc' - Activate the "gcc" Spack environment
     'spack env status' - Display the active Spack environment
     'spack load amrex' - Load the "amrex" Spack package into your user
environment

     For additional support, please refer to the following references:

       NERSC E4S Documentation: https://docs.nersc.gov/applications/e4s/
       E4S Documentation: https://e4s.readthedocs.io
       Spack Documentation: https://spack.readthedocs.io/en/latest/
       Spack Slack: https://spackpm.slack.com


______________________________________________________________________________________________________
     
==> Error: TestFailure: 1 test failed.


Command exited with status 127:
    './0_hello'
./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory



1 error found in test log:
     3    ==> [2023-11-07-07:43:24.179560] test: test_check_tutorial: Compile and run the tutorial tests as install checks
     4    ==> [2023-11-07-07:43:24.183898] '/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/env/gcc/gcc' '-o' '0_hello' '/global/ho
          mes/b/bdtest/.spack/test/rvdzxngt3yt5wumdyamqifh7kuv3mw3w/aml-0.2.0-dzrvltd/cache/aml/doc/tutorials/hello_world/0_hello.c' '-I/global/common/software
          /spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l/include' '-I/global/comm
          on/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/numactl-2.0.14-thubjl4qwojk3icuocgn6uhmetkk4vkj/include'
           '-L/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc
          2l/lib' '-laml' '-lexcit' '-lpthread'
     5    ==> [2023-11-07-07:43:26.163540] './0_hello'
     6    ./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
     7    FAILED: Aml::test_check_tutorial: Command exited with status 127:
     8        './0_hello'
  >> 9    ./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
     10   
     11     File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/bin/spack", line 54, in <module>
     12       sys.exit(main())
     13     File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/spack_installable/main.py", line 37, in main
     14       sys.exit(spack.main.main(argv))
     15     File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/spack/main.py", line 1018, in main


See test log for details:
  /global/homes/b/bdtest/.spack/test/rvdzxngt3yt5wumdyamqifh7kuv3mw3w/aml-0.2.0-dzrvltd-test-out.txt

==> Error: 1 test(s) in the suite failed.
@shahzebsiddiqui shahzebsiddiqui added the bug Something isn't working label Nov 7, 2023
@shahzebsiddiqui shahzebsiddiqui self-assigned this Nov 7, 2023
@shahzebsiddiqui
Contributor Author

A potential workaround is to load the cudatoolkit module (ml cudatoolkit/11.7) before running the test and see whether that fixes the issue. We should pin the same CUDA version that was used to build the other packages with CUDA support.
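
To see whether loading the module actually makes the runtime visible, one option is to look for libcudart.so.11.0 in the directories listed in LD_LIBRARY_PATH before and after the module load. This is a rough sketch under the assumption that the cudatoolkit module exposes the runtime through LD_LIBRARY_PATH; the function name find_on_library_path is hypothetical. The dynamic loader also consults RPATH/RUNPATH entries and the ld.so cache, so an empty result here does not by itself prove the library is unreachable.

```python
import os


def find_on_library_path(soname, library_path):
    """Return the directories in a path-separated list that contain soname."""
    hits = []
    for directory in library_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.exists(os.path.join(directory, soname)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    soname = "libcudart.so.11.0"  # the library named in the loader error
    hits = find_on_library_path(soname, os.environ.get("LD_LIBRARY_PATH", ""))
    if hits:
        print(f"{soname} found in: {', '.join(hits)}")
    else:
        print(f"{soname} is not in any LD_LIBRARY_PATH directory")
```

If the library shows up only after the module is loaded, adding the module load to the buildspec ahead of the spack test step would be the natural next thing to try.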
