@bedroge
Contributor

@bedroge bedroge commented Feb 20, 2024

This fixes a bug in the smcuda BTL that causes MPI applications to crash or hang on Neoverse V1 CPUs; see open-mpi/ompi#12270. The issue was fixed upstream in open-mpi/ompi#12344 for Open MPI 4.1.x (4.1.7 should include the fix).
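For readers unfamiliar with this class of bug: the branch name of this PR (openmpi_4.1.x_add_wmb) suggests that the patch adds a write memory barrier. The following is a minimal sketch of why such a barrier matters on weakly ordered CPUs such as Arm, written with standard C11 atomics. It is not the code from the patch or from Open MPI; `demo_slot_t`, `demo_publish` and `demo_try_consume` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot mailbox shared between a producer and a consumer. */
typedef struct {
    int payload;        /* plain data written by the producer */
    atomic_bool ready;  /* flag polled by the consumer */
} demo_slot_t;

/* Producer: write the payload, then publish it by setting the flag. */
static void demo_publish(demo_slot_t *slot, int value)
{
    assert(slot != NULL);
    slot->payload = value;
    /* Write barrier: the payload store must become visible before the flag
     * store. Without it, a weakly ordered CPU may reorder the two stores and
     * the consumer could read a stale payload. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&slot->ready, true, memory_order_relaxed);
}

/* Consumer: return true and copy the payload only once the flag is set. */
static bool demo_try_consume(demo_slot_t *slot, int *out)
{
    assert(slot != NULL && out != NULL);
    if (!atomic_load_explicit(&slot->ready, memory_order_relaxed))
        return false;
    /* Matching read barrier: the payload load must not move above the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = slot->payload;
    return true;
}
```

On x86_64 the hardware already keeps stores in order, which is one plausible reason a missing barrier of this kind would only show up on Arm systems.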

@bedroge
Contributor Author

bedroge commented Feb 20, 2024

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19940 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19940 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12926

Test results coming soon (I hope)...

Details

- notification for comment with ID 1953959787 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge
Contributor Author

bedroge commented Feb 20, 2024

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=19940 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_19940 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3642

Test results coming soon (I hope)...

Details

- notification for comment with ID 1954126426 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 14 out of 14 (14 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/ef2751a131a3499141f02d45a7eb29aa for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 15 out of 16 (14 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/7b076361a2cf4873b7486a6b07a4ed4a for a full test report.

@boegel boegel added this to the release after 4.9.0 milestone Feb 20, 2024
@bedroge
Contributor Author

bedroge commented Feb 21, 2024

All builds in the jsc-zen3 test report succeeded except OpenMPI-4.1.1-intel-compilers-2021.4.0.eb:

============================================================================
== Configuring Open MPI
============================================================================

*** Startup tests
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
checking for x86_64-pc-linux-gnu-gcc... icc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/tmp/boegelbot/OpenMPI/4.1.1/intel-compilers-2021.4.0/openmpi-4.1.1':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
 (at easybuild/easybuild-framework/easybuild/tools/run.py:682 in parse_cmd_output)

Not sure what's wrong here...

@boegel
Member

boegel commented Feb 21, 2024

Test report by @boegel
SUCCESS
Build succeeded for 14 out of 14 (14 easyconfigs in total)
node3106.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/fb69197b37e2604f56a805224de670e5 for a full test report.

@boegel
Member

boegel commented Feb 21, 2024

The OpenMPI-4.1.1-intel-compilers-2021.4.0.eb failure is related to RHEL 9 and its newer glibc:

In file included from conftest.c(10):
/usr/include/stdio.h(824): error: attribute "__malloc__" does not take arguments
    __attribute_malloc__ __attr_dealloc (pclose, 1) __wur;
                         ^

compilation aborted for conftest.c (code 2)
configure:6725: $? = 2
configure:6732: ./conftest
./configure: line 6734: ./conftest: No such file or directory

It seems that this version of the Intel compilers (2021.4.0) is basically not compatible with RHEL 9.

I won't let that block this PR, since the problem is not caused by the patch being added.
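The `__malloc__` error in the log above points at a declaration style that newer glibc headers use: a `malloc` attribute that takes a deallocator and an argument index, which GCC supports from version 11 onwards. The sketch below shows the general pattern with a fallback for compilers that lack support. It mirrors the idea only and is not glibc's actual source; `MY_ATTR_DEALLOC`, `my_buffer_new` and `my_buffer_free` are hypothetical names, and the compiler check is an assumption about how one might guard the attribute.

```c
#include <assert.h>
#include <stdlib.h>

/* Enable the two-argument malloc attribute only for real GCC >= 11.
 * Compilers that merely claim GCC compatibility (assumed here: clang and
 * the classic Intel compiler) get an empty expansion instead. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) \
    && __GNUC__ >= 11
# define MY_ATTR_DEALLOC(fn, argno) __attribute__((__malloc__(fn, argno)))
#else
# define MY_ATTR_DEALLOC(fn, argno)
#endif

/* The deallocator must be declared before the attribute refers to it. */
void my_buffer_free(char *buf);

/* Tells the compiler that pointers returned by my_buffer_new() should be
 * released by passing them as argument 1 of my_buffer_free(). */
MY_ATTR_DEALLOC(my_buffer_free, 1)
char *my_buffer_new(size_t size);

char *my_buffer_new(size_t size)
{
    assert(size > 0);
    return calloc(size, 1); /* zero-initialised buffer, NULL on failure */
}

void my_buffer_free(char *buf)
{
    free(buf);
}
```

A compiler that accepts `__malloc__` only without arguments would reject the attribute line, which matches the `attribute "__malloc__" does not take arguments` message reported by `icc`.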

Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Feb 21, 2024

Going in, thanks @bedroge!

@boegel boegel merged commit 01084d1 into easybuilders:develop Feb 21, 2024
@bedroge bedroge deleted the openmpi_4.1.x_add_wmb branch February 21, 2024 20:04
@bedroge bedroge added the EESSI Related to EESSI project label Feb 21, 2024
@boegel boegel added the aarch64 Related to Arm 64-bit (aarch64) label Feb 28, 2024

Labels

aarch64 (Related to Arm 64-bit (aarch64)), bug fix, EESSI (Related to EESSI project)
