
{2023.06}[2023a,sapphire_rapids] PyTorch 2.1.2#882

Merged
boegel merged 4 commits into EESSI:2023.06-software.eessi.io from bedroge:sapphire_rapids_pytorch_212 on Jan 24, 2025

bedroge (Collaborator) commented Jan 23, 2025

I initially planned to use an updated easyconfig for Z3, but on a second look I don't think it will help: our PyTorch still depends on the Z3 module with a Python suffix, so rebuilding Z3 based on easybuilders/easybuild-easyconfigs#20050 will probably not make a difference. Instead, I've increased the maximum number of failed tests to 4 to work around the issue described in #875 (comment).
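
The diff itself is not reproduced on this page, so the following only illustrates the idea and is not the change made in this PR. It is a minimal sketch assuming an EasyBuild-style pre-test hook that raises the `max_failed_tests` parameter of the PyTorch easyblock for one specific build; `FakeEasyBlock`, `pre_test_hook_pytorch` and the `SAPPHIRE_RAPIDS` target string are hypothetical stand-ins, not EESSI or EasyBuild code.

```python
from dataclasses import dataclass, field

# Hypothetical CPU target string; EESSI's real hooks may detect the target differently.
SAPPHIRE_RAPIDS = "x86_64/intel/sapphire_rapids"


@dataclass
class FakeEasyBlock:
    """Hypothetical stand-in for an EasyBuild easyblock (only name, version and cfg)."""
    name: str
    version: str
    cfg: dict = field(default_factory=dict)


def pre_test_hook_pytorch(block: FakeEasyBlock, cpu_target: str) -> None:
    """Hypothetical pre-test hook: tolerate up to 4 failing PyTorch 2.1.2 tests on Sapphire Rapids."""
    if block.name == "PyTorch" and block.version == "2.1.2" and cpu_target == SAPPHIRE_RAPIDS:
        # Only raise the limit; never lower a value that is already more permissive.
        block.cfg["max_failed_tests"] = max(block.cfg.get("max_failed_tests", 0), 4)


if __name__ == "__main__":
    pytorch = FakeEasyBlock("PyTorch", "2.1.2", {"max_failed_tests": 0})
    pre_test_hook_pytorch(pytorch, SAPPHIRE_RAPIDS)
    print(pytorch.cfg["max_failed_tests"])  # 4
```

Restricting the override to a single name, version and CPU target keeps the relaxed threshold from leaking into other builds; taking the `max` is a design choice of this sketch so that an already more permissive value is left untouched.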

eessi-bot (bot) commented Jan 23, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot (bot) commented Jan 23, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

bedroge (Collaborator, Author) commented Jan 23, 2025

bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids
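
The bot replies show that the short keywords in this command are expanded before it is handled (`repo:` becomes `repository:`, `arch:` becomes `architecture:`). A minimal sketch of that expansion, assuming a simple alias table; `ALIASES` and `expand_bot_command` are hypothetical names, and the bot's real parser may accept more keywords and syntax.

```python
# Hypothetical alias table: only the two aliases visible in this PR's bot replies.
ALIASES = {"repo": "repository", "arch": "architecture"}


def expand_bot_command(comment: str) -> str:
    """Expand 'bot: build repo:X arch:Y' into 'build repository:X architecture:Y'."""
    text = comment.strip()
    if not text.startswith("bot:"):
        raise ValueError("not a bot command")
    action, *args = text[len("bot:"):].split()
    expanded = []
    for arg in args:
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"malformed argument: {arg!r}")
        expanded.append(f"{ALIASES.get(key, key)}:{value}")
    return " ".join([action, *expanded])
```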

eessi-bot (bot) commented Jan 23, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids from bedroge

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids resulted in:

eessi-bot (bot) commented Jan 23, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids from bedroge

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids resulted in:

    • no jobs were submitted
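
Both bot instances received the same command, but only the AWS instance lists `x86_64/intel/sapphire_rapids` among its architectures, which is consistent with the Azure instance reporting that no jobs were submitted. A minimal sketch of that filtering, assuming each instance only compares the requested architecture against its configured list (repository matching is ignored here); `INSTANCE_ARCHS` and `instances_submitting` are hypothetical.

```python
# Architectures per bot instance, copied from the two configuration replies in this PR.
INSTANCE_ARCHS = {
    "eessi-bot-mc-aws": {
        "x86_64/generic", "x86_64/intel/haswell", "x86_64/intel/sapphire_rapids",
        "x86_64/intel/skylake_avx512", "x86_64/amd/zen2", "x86_64/amd/zen3",
        "aarch64/generic", "aarch64/neoverse_n1", "aarch64/neoverse_v1",
    },
    "eessi-bot-mc-azure": {"x86_64/amd/zen4"},
}


def instances_submitting(arch: str) -> list[str]:
    """Return (sorted) the instances whose configuration includes the requested architecture."""
    return sorted(name for name, archs in INSTANCE_ARCHS.items() if arch in archs)
```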

eessi-bot (bot) commented Jan 23, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-sapphire_rapids for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.01/pr_882/42206
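
The job directory path appears to encode the submission month, the PR number and the Slurm job id (the same id as in `slurm-42206.out`). A minimal sketch of splitting it up, assuming the layout `<root>/<YYYY.MM>/pr_<PR>/<job id>` inferred from this single example; `parse_job_dir`, `JOB_DIR_RE` and the default root are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed layout, inferred from the one path in this PR: <root>/<YYYY.MM>/pr_<PR number>/<job id>
JOB_DIR_RE = re.compile(r"^(?P<month>\d{4}\.\d{2})/pr_(?P<pr>\d+)/(?P<job_id>\d+)$")


def parse_job_dir(path: str, root: str = "/project/def-users/SHARED/jobs") -> dict:
    """Split a bot job directory into month, PR number and job id (hypothetical helper)."""
    # relative_to() raises ValueError if the path is not under the given root.
    rel = PurePosixPath(path).relative_to(root).as_posix()
    m = JOB_DIR_RE.match(rel)
    if m is None:
        raise ValueError(f"unexpected job dir layout: {path}")
    return {"month": m["month"], "pr": int(m["pr"]), "job_id": int(m["job_id"])}
```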

date | job status | comment
Jan 23 15:33:42 UTC 2025 | submitted | job id 42206 awaits release by job manager
Jan 23 15:34:40 UTC 2025 | released | job awaits launch by Slurm scheduler
Jan 23 15:40:50 UTC 2025 | running | job 42206 is running
Jan 24 01:57:36 UTC 2025 | finished | 😁 SUCCESS
Details
✅ job output file slurm-42206.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-sapphire_rapids-1737683307.tar.gz size: 141 MiB (148698861 bytes)
entries: 12727
modules under 2023.06/software/linux/x86_64/intel/sapphire_rapids/modules/all
PyTorch/2.1.2-foss-2023a.lua
Z3/4.12.2-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/intel/sapphire_rapids/software
PyTorch/2.1.2-foss-2023a
Z3/4.12.2-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/intel/sapphire_rapids
2023.06/init/easybuild/eb_hooks.py
Jan 24 01:57:36 UTC 2025 | test result | 😁 SUCCESS
ReFrame Summary
[ OK ] (1/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 1.73 us (r:0, l:None, u:None)
[ OK ] (2/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 1.74 us (r:0, l:None, u:None)
[ OK ] (3/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 3.88 us (r:0, l:None, u:None)
[ OK ] (4/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 4.25 us (r:0, l:None, u:None)
[ OK ] (5/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 0.42 us (r:0, l:None, u:None)
[ OK ] (6/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 0.36 us (r:0, l:None, u:None)
[ OK ] (7/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86-64-intel-srapids-node+default
P: bandwidth: 13515.75 MB/s (r:0, l:None, u:None)
[ OK ] (8/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86-64-intel-srapids-node+default
P: bandwidth: 13552.21 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 8/8 test case(s) from 8 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-42206.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Jan 24 10:51:16 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-intel-sapphire_rapids-1737683307.tar.gz to S3 bucket succeeded

bedroge added the ready-to-deploy (Mark a PR as ready to deploy), 2023.06-software.eessi.io (2023.06 version of software.eessi.io) and sapphirerapids labels on Jan 24, 2025

boegel (Contributor) commented Jan 24, 2025

@bedroge Can you also update #461 + eessi-2023.06-known-issues.yml accordingly?

Maybe mention which tests are failing in the issue, so we have some reference info

bedroge (Collaborator, Author) commented Jan 24, 2025

> @bedroge Can you also update #461 + eessi-2023.06-known-issues.yml accordingly?
>
> Maybe mention which tests are failing in the issue, so we have some reference info

Done, see a2fc9e7 and #461 (comment).

boegel added the bot:deploy label (Ask bot to deploy missing software installations to EESSI) and removed the ready-to-deploy label (Mark a PR as ready to deploy) on Jan 24, 2025

boegel (Contributor) left a comment

lgtm

boegel merged commit 152414b into EESSI:2023.06-software.eessi.io on Jan 24, 2025

eessi-bot (bot) commented Jan 24, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.01/pr_882/42206'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.24

eessi-bot (bot) commented Jan 24, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.24

bedroge deleted the sapphire_rapids_pytorch_212 branch on January 24, 2025 at 12:04

Labels

2023.06-software.eessi.io (2023.06 version of software.eessi.io), bot:deploy (Ask bot to deploy missing software installations to EESSI), sapphirerapids

2 participants