
{bio}[foss/2023a] AlphaFold v2.3.2, dm-haiku v0.0.12, tensorstore v0.1.65 w/ CUDA v12.1.1#19942

Merged
boegel merged 23 commits into easybuilders:develop from ThomasHoffmann77:20240220124705_new_pr_AlphaFold232
Oct 11, 2024


@ThomasHoffmann77
Contributor

@ThomasHoffmann77 ThomasHoffmann77 commented Feb 20, 2024

@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53 w/ CUDA v12.1.1" to "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, OpenMM v8.0.0 HH-Suite v3.3.0 w/ CUDA v12.1.1" on Feb 20, 2024
@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, OpenMM v8.0.0 HH-Suite v3.3.0 w/ CUDA v12.1.1" to "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, OpenMM v8.0.0, dm-tree v0.1.8, HH-Suite v3.3.0 w/ CUDA v12.1.1" on Feb 20, 2024
@jfgrimm jfgrimm added this to the 4.x milestone Feb 22, 2024
@easybuilders easybuilders deleted a comment from boegelbot Feb 22, 2024
@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, OpenMM v8.0.0, dm-tree v0.1.8, HH-Suite v3.3.0 w/ CUDA v12.1.1" to "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, OpenMM v8.0.0, HH-Suite v3.3.0 w/ CUDA v12.1.1" on Feb 28, 2024
@migueldiascosta
Member

fwiw, I'm getting error: HWCAP_NEON was not declared in this scope when building OpenMM from this PR on NVIDIA Grace-Hopper

it looks like openmm-8.0.0/cmake_modules/TargetArch.cmake is detecting Grace-Hopper as arm instead of armv8,

which then leads to openmm-8.0.0/CMakeLists.txt setting -D__ARM__=1 instead of -D__ARM64__=1,

which in turn leads openmm-8.0.0/openmmapi/include/openmm/internal/vectorize_neon.h to use HWCAP_NEON instead of HWCAP_ASIMD

forcing TARGET_ARCH to be armv8 in openmm-8.0.0/CMakeLists.txt fixed the issue for me
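
The chain of consequences described in this comment can be sketched as a small lookup model. This is only an illustration of the reported behaviour, not OpenMM's actual build logic: the function name `neon_hwcap_macro` and both dictionaries are hypothetical, and only the two cases named in the comment (`arm` and `armv8`) are covered.

```python
# Illustrative model of the reported chain; NOT taken from the OpenMM sources.
# Only the two cases named in the comment are represented.

# TARGET_ARCH value -> preprocessor define set by CMakeLists.txt (as reported)
ARCH_TO_DEFINE = {
    "arm": "-D__ARM__=1",
    "armv8": "-D__ARM64__=1",
}

# define -> hardware-capability macro used by vectorize_neon.h (as reported)
DEFINE_TO_HWCAP = {
    "-D__ARM__=1": "HWCAP_NEON",
    "-D__ARM64__=1": "HWCAP_ASIMD",
}


def neon_hwcap_macro(target_arch):
    """Return the HWCAP macro that ends up being used for a given TARGET_ARCH."""
    return DEFINE_TO_HWCAP[ARCH_TO_DEFINE[target_arch]]


if __name__ == "__main__":
    for arch in ("arm", "armv8"):
        print(arch, "->", ARCH_TO_DEFINE[arch], "->", neon_hwcap_macro(arch))
```

Read this way, a 64-bit Arm system that is detected as `arm` ends up on the `HWCAP_NEON` path, which matches the reported compile error, while forcing `armv8` selects `HWCAP_ASIMD` instead.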

@ThomasHoffmann77
Contributor Author

> fwiw, I'm getting error: HWCAP_NEON was not declared in this scope when building OpenMM from this PR on NVIDIA Grace-Hopper
>
> it looks like openmm-8.0.0/cmake_modules/TargetArch.cmake is detecting Grace-Hopper as arm instead of armv8,
>
> which then leads to openmm-8.0.0/CMakeLists.txt setting -D__ARM__=1 instead of -D__ARM64__=1,
>
> which in turn leads openmm-8.0.0/openmmapi/include/openmm/internal/vectorize_neon.h to use HWCAP_NEON instead of HWCAP_ASIMD
>
> forcing TARGET_ARCH to be armv8 in openmm-8.0.0/CMakeLists.txt fixed the issue for me

#18911

@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, OpenMM v8.0.0, HH-Suite v3.3.0 w/ CUDA v12.1.1" to "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, HH-Suite v3.3.0 w/ CUDA v12.1.1" on Mar 4, 2024
@VRehnberg
Contributor

Fyi, I've got a draft #20421 that might become relevant for this one as well. Perhaps you'll have opinions :).

Contributor

@VRehnberg VRehnberg left a comment


Thanks for adding this.

So typical use would be(?):

  1. CPU only job to get features (gpu not detected)
  2. Job-array to GPUs to run predictions (features.pkl found and --only-model-pred="${SLURM_ARRAY_TASK_ID}")
  3. Single job with GPU to run relaxation (possibly in parallel, [How is this launched, or is it not run separately???])

@ThomasHoffmann77
Contributor Author

ThomasHoffmann77 commented May 21, 2024

> Thanks for adding this.
>
> So typical use would be(?):
>
>   1. CPU only job to get features (gpu not detected)
>   2. Job-array to GPUs to run predictions (features.pkl found and --only-model-pred="${SLURM_ARRAY_TASK_ID}")

yes, for monomer jobs.
For multimer, you need to translate the array ID to X,Y with X in [1..5], Y in [0..4] (if you run with --num_multimer_predictions_per_model=5)
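
One possible way to do that translation is sketched below. It assumes that the array IDs run from 0 to 24, that X varies slowest, and that the pair is written as the string `X,Y`; none of this is confirmed by the PR, and the function name `multimer_pred_id` is made up for illustration.

```python
def multimer_pred_id(array_task_id, preds_per_model=5, num_models=5):
    """Map a 0-based job-array ID to 'X,Y'.

    X is in [1..num_models] and Y is in [0..preds_per_model-1].
    The 0-based ID range and the X-major ordering are assumptions.
    """
    total = num_models * preds_per_model
    if not 0 <= array_task_id < total:
        raise ValueError(
            f"array task ID must be in [0..{total - 1}], got {array_task_id}"
        )
    # divmod splits the flat index into (model index, prediction index)
    model_idx, pred_idx = divmod(array_task_id, preds_per_model)
    return f"{model_idx + 1},{pred_idx}"


if __name__ == "__main__":
    import os

    # SLURM_ARRAY_TASK_ID is set by Slurm inside job-array tasks;
    # fall back to 0 when run outside of a job array
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(multimer_pred_id(task_id))
```

With an array range of 0-24 (again an assumption), each task gets a distinct pair, from `1,0` for task 0 up to `5,4` for task 24.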

>   3. Single job with GPU to run relaxation (possibly in parallel, [How is this launched, or is it not run separately???])

In order to get the ranking, you can run a quick CPU job after running the predictions.
--models_to_relax default is changed from best to none. Therefore the pipeline stops after the predictions.
You can resume with the relaxation by restarting with --models_to_relax=all (or best).
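
The staged workflow discussed in this thread can be summarised as the extra flags each stage would need. This is a rough sketch that uses only the flags quoted in the discussion (`--only-model-pred`, `--models_to_relax`); the stage names and the helper `extra_flags` are hypothetical, and input/output options and the actual command invocation are deliberately left out.

```python
def extra_flags(stage, pred_id=None, relax="all"):
    """Return the extra flags for one stage of the staged workflow.

    Stage names are made up for this sketch; only flags quoted in the
    discussion are used.
    """
    if stage == "features":
        # CPU-only job: no GPU is detected, so only the features are computed
        return []
    if stage == "predict":
        if pred_id is None:
            raise ValueError("predict stage needs a prediction ID")
        # pred_id is the array task ID (monomer) or an 'X,Y' pair (multimer)
        return [f"--only-model-pred={pred_id}"]
    if stage == "rank":
        # quick CPU job after the predictions; relaxation stays at its
        # changed default (none), so the pipeline stops after ranking
        return []
    if stage == "relax":
        if relax not in ("all", "best"):
            raise ValueError("relax must be 'all' or 'best'")
        return [f"--models_to_relax={relax}"]
    raise ValueError(f"unknown stage: {stage}")


if __name__ == "__main__":
    for stage, kwargs in [
        ("features", {}),
        ("predict", {"pred_id": 3}),
        ("rank", {}),
        ("relax", {"relax": "best"}),
    ]:
        print(stage, extra_flags(stage, **kwargs))
```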

@ThomasHoffmann77 ThomasHoffmann77 force-pushed the 20240220124705_new_pr_AlphaFold232 branch from 0ebfd8f to e7367ff Compare July 24, 2024 10:04
@ThomasHoffmann77
Contributor Author

accidentally closed

@ThomasHoffmann77 ThomasHoffmann77 marked this pull request as draft October 11, 2024 06:53
@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.11, tensorstore v0.1.53, HH-Suite v3.3.0 w/ CUDA v12.1.1" to "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.12, tensorstore v0.1.65, HH-Suite v3.3.0 w/ CUDA v12.1.1" on Oct 11, 2024
@ThomasHoffmann77 ThomasHoffmann77 marked this pull request as ready for review October 11, 2024 14:10
@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, Kalign v3.4.0, dm-haiku v0.0.12, tensorstore v0.1.65, HH-Suite v3.3.0 w/ CUDA v12.1.1" to "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, dm-haiku v0.0.12, tensorstore v0.1.65 w/ CUDA v12.1.1" on Oct 11, 2024
@ThomasHoffmann77 ThomasHoffmann77 changed the title from "{bio}[foss/2023a,GCCcore/12.3.0] AlphaFold v2.3.2, dm-haiku v0.0.12, tensorstore v0.1.65 w/ CUDA v12.1.1" to "{bio}[foss/2023a] AlphaFold v2.3.2, dm-haiku v0.0.12, tensorstore v0.1.65 w/ CUDA v12.1.1" on Oct 11, 2024
@boegel boegel dismissed akesandgren’s stale review October 11, 2024 18:02

requested changes done

Member

@boegel boegel left a comment


lgtm

@boegel
Member

boegel commented Oct 11, 2024

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=19942 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_19942 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5066

Test results coming soon (I hope)...

Details

- notification for comment with ID 2407894839 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Oct 11, 2024

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3901.accelgor.os - Linux RHEL 8.8, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA A100-SXM4-80GB, 545.23.08, Python 3.6.8
See https://gist.github.com/boegel/cb450cdab5ed9c44eb1dd6e80e9541d2 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/c87f706d4ad650a5227d1ab0da8297d1 for a full test report.

@boegel boegel modified the milestones: 4.x, release after 4.9.4 Oct 11, 2024
@boegel
Member

boegel commented Oct 11, 2024

Going in, thanks @ThomasHoffmann77!

@boegel boegel merged commit 96515d3 into easybuilders:develop Oct 11, 2024
@boegel boegel modified the milestones: release after 4.9.4, 5.0.0 Mar 18, 2025

7 participants