{bio}[foss/2022a] RFdiffusion v1.1.0 w/ CUDA 11.7.0 #23110

Closed
ThomasHoffmann77 wants to merge 20 commits into easybuilders:develop from ThomasHoffmann77:20250616163944_new_pr_RFdiffusion110

Conversation

@ThomasHoffmann77
Contributor

@ThomasHoffmann77 ThomasHoffmann77 commented Jun 16, 2025

(created using eb --new-pr)

reopen #18359

requires:

Contributor

@lexming lexming left a comment


Looks fine, there is just a minor change

@github-actions

github-actions bot commented Jun 16, 2025

Updated software DGL-1.1.1-foss-2022a-CUDA-11.7.0.eb

Diff against DGL-1.1.3-foss-2023a-CUDA-12.1.1.eb

easybuild/easyconfigs/d/DGL/DGL-1.1.3-foss-2023a-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/d/DGL/DGL-1.1.3-foss-2023a-CUDA-12.1.1.eb b/easybuild/easyconfigs/d/DGL/DGL-1.1.1-foss-2022a-CUDA-11.7.0.eb
index 2dfef8d19f..7417fbbf36 100644
--- a/easybuild/easyconfigs/d/DGL/DGL-1.1.3-foss-2023a-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/d/DGL/DGL-1.1.1-foss-2022a-CUDA-11.7.0.eb
@@ -1,12 +1,7 @@
-# based on DGL-0.9.1-foss-2021a-CUDA-11.3.1 and DGL-1.1.3-foss-2022a-CUDA-11.7.0
-# GKlib-METIS added as module as third-party approach does not build
-# libxsmm set to 'off' so it is using the EasyBuild module
-# update of https://github.com/easybuilders/easybuild-easyconfigs/pull/20092
-
 easyblock = 'CMakeMake'
 
 name = 'DGL'
-version = '1.1.3'
+version = '1.1.1'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.dgl.ai'
@@ -14,30 +9,29 @@ description = """DGL is an easy-to-use, high performance and scalable Python pac
 DGL is framework agnostic, meaning if a deep graph model is a component of an end-to-end application, the rest
 of the logics can be implemented in any major frameworks, such as PyTorch, Apache MXNet or TensorFlow."""
 
-toolchain = {'name': 'foss', 'version': '2023a'}
+toolchain = {'name': 'foss', 'version': '2022a'}
+# GCC 10.3.0 vectorizer causes errors in nanoflann on skylake and later
+# and since nanoflann is just a header file we need to turn it off for anything that uses it
+#  TH?  toolchainopts = {'vectorize': False}
 
 github_account = 'dmlc'
 source_urls = [GITHUB_LOWER_SOURCE]
 sources = [
     {
-        'download_filename': 'v%(version)s.tar.gz',
+        'download_filename': '%(version)s.tar.gz',
         'filename': '%(namelower)s-%(version)s.tar.gz',
     },
     {
         'source_urls': ['https://github.com/KarypisLab/METIS/archive'],
-        'download_filename': 'v5.2.1.tar.gz',
-        'filename': 'metis-5.2.1.tar.gz',
+        'download_filename': 'v5.1.1-DistDGL-v0.5.tar.gz',
+        'filename': 'metis-5.1.1-DistDGL-v0.5.tar.gz',
         'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/METIS --strip-components=1 -xf %s",
     },
     {
-        'filename': 'pcg-cpp-428802d.tar.xz',
-        'git_config': {
-            'url': 'https://github.com/imneme',
-            'repo_name': 'pcg-cpp',
-            'commit': '428802d1a5634f96bcd0705fab379ff0113bcf13',
-            'recursive': True,
-        },
-        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/pcg --strip-components=1 -xf %s",
+        'source_urls': ['https://github.com/KarypisLab/GKlib/archive'],
+        'download_filename': 'METIS-v5.1.1-DistDGL-0.5.tar.gz',
+        'filename': 'GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz',
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/METIS/GKlib --strip-components=2 -xf %s",
     },
     {
         'filename': 'tensorpipe-20230206.tar.xz',
@@ -50,64 +44,88 @@ sources = [
         'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/tensorpipe --strip-components=1 -xf %s",
     },
     {
-        'filename': 'cccl-c4eda1a.tar.xz',
+        'filename': 'pcg-20220408.tar.xz',
         'git_config': {
-            'url': 'https://github.com/NVIDIA',
-            'repo_name': 'cccl',
-            'commit': 'c4eda1aea304c012270dbd10235e60eaf47bd06f',
+            'url': 'https://github.com/imneme',
+            'repo_name': 'pcg-cpp',
+            'commit': '428802d1a5634f96bcd0705fab379ff0113bcf13',
             'recursive': True,
         },
-        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/cccl --strip-components=1 -xf %s",
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/pcg --strip-components=1 -xf %s",
     },
+    # requires commit 8009060 due to LIBXSMM_MELTW_FLAG_OPREDUCE_VECS_REDOP_MAX
+    #   -> do not use outdated dependency ver 1.17
     {
-        'filename': 'nanoflann-4c47ca2.tar.xz',
+        'filename': 'libxsmm-20230504.tar.xz',
         'git_config': {
-            'url': 'https://github.com/jlblancoc',
-            'repo_name': 'nanoflann',
-            'commit': '4c47ca200209550c5628c89803591f8a753c8181',
+            'url': 'https://github.com/libxsmm',
+            'repo_name': 'libxsmm',
+            'commit': '80090603e43f6ddc870cc42e1403dd0af07744cc',
             'recursive': True,
         },
-        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/nanoflann --strip-components=1 -xf %s",
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/libxsmm --strip-components=1 -xf %s",
+    },
+    # DGL really needs cub >= 1.17, CUDA 11.7 only have 1.15
+    {
+        'source_urls': ['https://github.com/NVIDIA/thrust/archive'],
+        'download_filename': '1.17.0.tar.gz',
+        'filename': 'thrust-1.17.0.tar.gz',
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/thrust --strip-components=1 -xf %s",
+    },
+    {
+        'source_urls': ['https://github.com/NVIDIA/cub/archive'],
+        'download_filename': '1.17.0.tar.gz',
+        'filename': 'cub-1.17.0.tar.gz',
+        'extract_cmd':
+            "tar -C %(namelower)s-%(version)s/third_party/thrust/dependencies/cub --strip-components=1 -xf %s",
     },
 ]
 patches = [
     '%(name)s-%(version)s_use_externals_instead_of_submodules.patch',
+    '%(name)s-%(version)s_nanoflanSearchParams.patch'
 ]
 checksums = [
-    {'dgl-1.1.3.tar.gz': 'c45021d77ff2b1fed814a8b91260671167fb4e42b7d5fab2d37faa74ae1dc5b4'},
-    {'metis-5.2.1.tar.gz': '1a4665b2cd07edc2f734e30d7460afb19c1217c2547c2ac7bf6e1848d50aff7a'},
-    {'pcg-cpp-428802d.tar.xz': 'c4633dd4ba3d8ca8170756905dc757464e6798407ebf8bee166226446fb86e60'},
+    {'dgl-1.1.1.tar.gz': '076026a7818f2396056252269d7960c561840bb0fe89494e11f8d6c524891c49'},
+    {'metis-5.1.1-DistDGL-v0.5.tar.gz': 'cedf0b32d32a8496bac7eb078b2b8260fb00ddb8d50c27e4082968a01bc33331'},
+    {'GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz': '52aa0d383d42360f4faa0ae9537ba2ca348eeab4db5f2dfd6343192d0ff4b833'},
     {'tensorpipe-20230206.tar.xz': '85203179f7970f4274a9f90452616ec534b1f54a87040fc98786d035efb429e4'},
-    {'cccl-c4eda1a.tar.xz': '406298efb6d0cb523c71b076695bddb82a4c93fec67614df582bbb13da0ca92d'},
-    {'nanoflann-4c47ca2.tar.xz': '2a23714197f5b2447d7c04df966c4e6a4010e5a649bfbbf8f2a7d0b30712660c'},
-    {'DGL-1.1.3_use_externals_instead_of_submodules.patch':
-     '89a89f8e540824ce483fbaf1750babf9d40826e40763a899d84c753d9ba18c20'},
+    {'pcg-20220408.tar.xz': 'c4633dd4ba3d8ca8170756905dc757464e6798407ebf8bee166226446fb86e60'},
+    {'libxsmm-20230504.tar.xz': '3539da609a4822b60a1dc84ba54f9872849ed46947274da50b6cd4bcc97b353e'},
+    {'thrust-1.17.0.tar.gz': 'b02aca5d2325e9128ed9d46785b8e72366f758b873b95001f905f22afcf31bbf'},
+    {'cub-1.17.0.tar.gz': '16fd4860ae3196bc3eb08bf5754fa2a9697951ddae36dc9721e6614388893618'},
+    {'DGL-1.1.1_use_externals_instead_of_submodules.patch':
+     'b98bb1b90ea5cb64e492fee3b2390eff4d9901a37c954c39326ce667a0ecdd8e'},
+    {'DGL-1.1.1_nanoflanSearchParams.patch': '82c1414116a95d4efaf99d4025d223af97a4fb87aac471e60a008618edae38bb'},
 ]
 
 builddependencies = [
-    ('CMake', '3.26.3'),
-    ('googletest', '1.13.0'),
+    ('CMake', '3.24.3'),
+    ('googletest', '1.11.0'),
 ]
 
 dependencies = [
-    ('Python', '3.11.3'),
-    ('SciPy-bundle', '2023.07'),
-    ('networkx', '3.1'),
-    ('tqdm', '4.66.1'),
+    ('Python', '3.10.4'),
+    ('SciPy-bundle', '2022.05'),
+    ('networkx', '2.8.4'),
+    ('tqdm', '4.64.0'),
     ('DLPack', '0.8'),
     ('DMLC-Core', '0.5'),
     ('Parallel-Hashmap', '1.36'),
-    ('CUDA', '12.1.1', '', SYSTEM),
-    ('NCCL', '2.18.3', versionsuffix),
-    ('PyTorch', '2.1.2', versionsuffix),
-    ('libxsmm', '1.17'),
-    ('GKlib-METIS', '5.1.1'),
+    ('nanoflann', '1.5.0'),
+    # ('libxsmm', '1.17'),  # requires commit 8009060 due to LIBXSMM_MELTW_FLAG_OPREDUCE_VECS_REDOP_MAX
+    ('CUDA', '11.7.0', '', SYSTEM),
+    ('NCCL', '2.12.12', versionsuffix),
+    ('PyTorch', '1.12.0', versionsuffix),
 ]
 
 _copts = [
+    '-DUSE_AVX=OFF',  # AVX + LIBXSMM requires libxsmm tag 1.eol
     '-DBUILD_CPP_TEST=ON',
     '-DUSE_CUDA=ON',  # Must be "ON", as opposed to "1" or so, due to bad CMake code in DGL
-    '-DUSE_LIBXSMM=OFF',
+    '-DUSE_NCCL=ON',
+    '-DUSE_SYSTEM_NCCL=ON',
+    '-DBUILD_WITH_SHARED_NCCL=ON',
+    '-DUSE_FP16=ON',
 ]
 configopts = ' '.join(_copts)
 
@@ -120,13 +138,16 @@ runtest = 'test'
 
 exts_defaultclass = 'PythonPackage'
 exts_default_options = {
+    'easyblock': 'PythonPackage',
     'runtest': True,
 }
+
 exts_list = [
     ('dgl', version, {
+        'installopts': "--use-feature=in-tree-build ",
         'source_tmpl': '%(namelower)s-%(version)s.tar.gz',
         'start_dir': 'python',
-        'checksums': ['c45021d77ff2b1fed814a8b91260671167fb4e42b7d5fab2d37faa74ae1dc5b4'],
+        'checksums': ['076026a7818f2396056252269d7960c561840bb0fe89494e11f8d6c524891c49'],
     }),
 ]
 
Diff against DGL-0.9.1-foss-2021a-CUDA-11.3.1.eb

easybuild/easyconfigs/d/DGL/DGL-0.9.1-foss-2021a-CUDA-11.3.1.eb

diff --git a/easybuild/easyconfigs/d/DGL/DGL-0.9.1-foss-2021a-CUDA-11.3.1.eb b/easybuild/easyconfigs/d/DGL/DGL-1.1.1-foss-2022a-CUDA-11.7.0.eb
index 4eca7e92af..7417fbbf36 100644
--- a/easybuild/easyconfigs/d/DGL/DGL-0.9.1-foss-2021a-CUDA-11.3.1.eb
+++ b/easybuild/easyconfigs/d/DGL/DGL-1.1.1-foss-2022a-CUDA-11.7.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'CMakeMake'
 
 name = 'DGL'
-version = '0.9.1'
+version = '1.1.1'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.dgl.ai'
@@ -9,17 +9,17 @@ description = """DGL is an easy-to-use, high performance and scalable Python pac
 DGL is framework agnostic, meaning if a deep graph model is a component of an end-to-end application, the rest
 of the logics can be implemented in any major frameworks, such as PyTorch, Apache MXNet or TensorFlow."""
 
-toolchain = {'name': 'foss', 'version': '2021a'}
+toolchain = {'name': 'foss', 'version': '2022a'}
 # GCC 10.3.0 vectorizer causes errors in nanoflann on skylake and later
 # and since nanoflann is just a header file we need to turn it off for anything that uses it
-toolchainopts = {'vectorize': False}
+#  TH?  toolchainopts = {'vectorize': False}
 
 github_account = 'dmlc'
 source_urls = [GITHUB_LOWER_SOURCE]
 sources = [
     {
         'download_filename': '%(version)s.tar.gz',
-        'filename': SOURCELOWER_TAR_GZ,
+        'filename': '%(namelower)s-%(version)s.tar.gz',
     },
     {
         'source_urls': ['https://github.com/KarypisLab/METIS/archive'],
@@ -31,7 +31,7 @@ sources = [
         'source_urls': ['https://github.com/KarypisLab/GKlib/archive'],
         'download_filename': 'METIS-v5.1.1-DistDGL-0.5.tar.gz',
         'filename': 'GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz',
-        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/METIS/GKlib --strip-components=1 -xf %s",
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/METIS/GKlib --strip-components=2 -xf %s",
     },
     {
         'filename': 'tensorpipe-20230206.tar.xz',
@@ -43,7 +43,29 @@ sources = [
         },
         'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/tensorpipe --strip-components=1 -xf %s",
     },
-    # DGL really needs cub >= 1.17, CUDA 11.3 only have 1.11
+    {
+        'filename': 'pcg-20220408.tar.xz',
+        'git_config': {
+            'url': 'https://github.com/imneme',
+            'repo_name': 'pcg-cpp',
+            'commit': '428802d1a5634f96bcd0705fab379ff0113bcf13',
+            'recursive': True,
+        },
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/pcg --strip-components=1 -xf %s",
+    },
+    # requires commit 8009060 due to LIBXSMM_MELTW_FLAG_OPREDUCE_VECS_REDOP_MAX
+    #   -> do not use outdated dependency ver 1.17
+    {
+        'filename': 'libxsmm-20230504.tar.xz',
+        'git_config': {
+            'url': 'https://github.com/libxsmm',
+            'repo_name': 'libxsmm',
+            'commit': '80090603e43f6ddc870cc42e1403dd0af07744cc',
+            'recursive': True,
+        },
+        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/libxsmm --strip-components=1 -xf %s",
+    },
+    # DGL really needs cub >= 1.17, CUDA 11.7 only have 1.15
     {
         'source_urls': ['https://github.com/NVIDIA/thrust/archive'],
         'download_filename': '1.17.0.tar.gz',
@@ -60,36 +82,40 @@ sources = [
 ]
 patches = [
     '%(name)s-%(version)s_use_externals_instead_of_submodules.patch',
+    '%(name)s-%(version)s_nanoflanSearchParams.patch'
 ]
 checksums = [
-    '8d26ebb7ed976665bbf5bbd1792d8e6efb13a8fa16e5eb1efed75e07fb982e04',  # dgl-0.9.1.tar.gz
-    'cedf0b32d32a8496bac7eb078b2b8260fb00ddb8d50c27e4082968a01bc33331',  # metis-5.1.1-DistDGL-v0.5.tar.gz
-    '52aa0d383d42360f4faa0ae9537ba2ca348eeab4db5f2dfd6343192d0ff4b833',  # GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz
-    '85203179f7970f4274a9f90452616ec534b1f54a87040fc98786d035efb429e4',  # tensorpipe-20230206.tar.xz
-    'b02aca5d2325e9128ed9d46785b8e72366f758b873b95001f905f22afcf31bbf',  # thrust-1.17.0.tar.gz
-    '16fd4860ae3196bc3eb08bf5754fa2a9697951ddae36dc9721e6614388893618',  # cub-1.17.0.tar.gz
-    # DGL-0.9.1_use_externals_instead_of_submodules.patch'
-    '4c39420abc09d619de92ca4ee5c4472359b1f5ffe8acd898d4d665bdbeb2ce40',
+    {'dgl-1.1.1.tar.gz': '076026a7818f2396056252269d7960c561840bb0fe89494e11f8d6c524891c49'},
+    {'metis-5.1.1-DistDGL-v0.5.tar.gz': 'cedf0b32d32a8496bac7eb078b2b8260fb00ddb8d50c27e4082968a01bc33331'},
+    {'GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz': '52aa0d383d42360f4faa0ae9537ba2ca348eeab4db5f2dfd6343192d0ff4b833'},
+    {'tensorpipe-20230206.tar.xz': '85203179f7970f4274a9f90452616ec534b1f54a87040fc98786d035efb429e4'},
+    {'pcg-20220408.tar.xz': 'c4633dd4ba3d8ca8170756905dc757464e6798407ebf8bee166226446fb86e60'},
+    {'libxsmm-20230504.tar.xz': '3539da609a4822b60a1dc84ba54f9872849ed46947274da50b6cd4bcc97b353e'},
+    {'thrust-1.17.0.tar.gz': 'b02aca5d2325e9128ed9d46785b8e72366f758b873b95001f905f22afcf31bbf'},
+    {'cub-1.17.0.tar.gz': '16fd4860ae3196bc3eb08bf5754fa2a9697951ddae36dc9721e6614388893618'},
+    {'DGL-1.1.1_use_externals_instead_of_submodules.patch':
+     'b98bb1b90ea5cb64e492fee3b2390eff4d9901a37c954c39326ce667a0ecdd8e'},
+    {'DGL-1.1.1_nanoflanSearchParams.patch': '82c1414116a95d4efaf99d4025d223af97a4fb87aac471e60a008618edae38bb'},
 ]
 
 builddependencies = [
-    ('CMake', '3.20.1'),
+    ('CMake', '3.24.3'),
     ('googletest', '1.11.0'),
 ]
 
 dependencies = [
-    ('Python', '3.9.5'),
-    ('SciPy-bundle', '2021.05'),
-    ('networkx', '2.5.1'),
-    ('tqdm', '4.61.2'),
-    ('DLPack', '0.3'),
+    ('Python', '3.10.4'),
+    ('SciPy-bundle', '2022.05'),
+    ('networkx', '2.8.4'),
+    ('tqdm', '4.64.0'),
+    ('DLPack', '0.8'),
     ('DMLC-Core', '0.5'),
-    ('Parallel-Hashmap', '1.33'),
-    ('nanoflann', '1.4.0'),
-    ('libxsmm', '1.16.2'),
-    ('CUDA', '11.3.1', '', SYSTEM),
-    ('NCCL', '2.10.3', versionsuffix),
-    ('PyTorch', '1.12.1', versionsuffix),
+    ('Parallel-Hashmap', '1.36'),
+    ('nanoflann', '1.5.0'),
+    # ('libxsmm', '1.17'),  # requires commit 8009060 due to LIBXSMM_MELTW_FLAG_OPREDUCE_VECS_REDOP_MAX
+    ('CUDA', '11.7.0', '', SYSTEM),
+    ('NCCL', '2.12.12', versionsuffix),
+    ('PyTorch', '1.12.0', versionsuffix),
 ]
 
 _copts = [
@@ -118,14 +144,10 @@ exts_default_options = {
 
 exts_list = [
     ('dgl', version, {
-        'source_urls': [GITHUB_LOWER_SOURCE],
-        'sources': {
-            'download_filename': '%(version)s.tar.gz',
-            'filename': SOURCELOWER_TAR_GZ,
-        },
+        'installopts': "--use-feature=in-tree-build ",
+        'source_tmpl': '%(namelower)s-%(version)s.tar.gz',
         'start_dir': 'python',
-        'installopts': '--use-feature=in-tree-build ',
-        'checksums': ['8d26ebb7ed976665bbf5bbd1792d8e6efb13a8fa16e5eb1efed75e07fb982e04'],
+        'checksums': ['076026a7818f2396056252269d7960c561840bb0fe89494e11f8d6c524891c49'],
     }),
 ]
 

@boegel
Member

boegel commented Jun 16, 2025

Someone please change the PR title to be a bit more meaningful, this is a nightmare when composing release notes 😅

And link to previous PR by mentioning #18359 in PR description

@ThomasHoffmann77 ThomasHoffmann77 changed the title reopen #18359 {bio}[foss/2022a] RFdiffusion v1.1.0, DGL v1.1.1 w/ CUDA 11.7.0; reopen #18359 Jun 16, 2025
@ThomasHoffmann77 ThomasHoffmann77 changed the title {bio}[foss/2022a] RFdiffusion v1.1.0, DGL v1.1.1 w/ CUDA 11.7.0; reopen #18359 {bio}[foss/2022a] RFdiffusion v1.1.0, DGL v1.1.1 w/ CUDA 11.7.0 Jun 16, 2025
@ThomasHoffmann77 ThomasHoffmann77 requested a review from lexming June 16, 2025 16:19
fix METIS-v{6,5}.1.1-DistDGL-0.5.tar.gz typo
Contributor

@lexming lexming left a comment


In EasyBuild 5 we can now use checksums of git repos. Please consider the changes in ThomasHoffmann77#5
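
For readers unfamiliar with the dict-style `checksums` entries in the easyconfig above, here is a minimal sketch of what such a check amounts to. It is plain Python built on `hashlib`, not EasyBuild's own verification code; the helper names `sha256_of` and `verify_checksums` and the demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(source_dir, checksums):
    """Return the filenames whose digest differs from the expected one.

    `checksums` mirrors the easyconfig style: a list of one-entry dicts
    mapping a filename to its expected SHA256. A missing file is a failure.
    """
    failures = []
    for entry in checksums:
        for filename, expected in entry.items():
            path = Path(source_dir) / filename
            if not path.is_file() or sha256_of(path) != expected:
                failures.append(filename)
    return failures


# Demo with a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo-source.tar.xz"
    demo.write_bytes(b"not a real tarball")
    good = [{"demo-source.tar.xz": sha256_of(demo)}]
    bad = [{"demo-source.tar.xz": "0" * 64}]
    good_result = verify_checksums(tmp, good)
    bad_result = verify_checksums(tmp, bad)
```

Keying each checksum by filename, as the easyconfig does, means a reordered `sources` list cannot silently pair a digest with the wrong file.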

@ThomasHoffmann77
Contributor Author

@lexming In #23135 I ported RFdiffusion 1.1.0 to foss/2023a using the merged DGL 1.1.3. If it works, we can close this PR.

@lexming
Contributor

lexming commented Jun 19, 2025

As you wish, I'm happy to merge this PR once we have checksums for all sources in DGL. But if you prefer to move to 2023a that is also fine. When that other PR is ready, please close this one.

@ThomasHoffmann77 ThomasHoffmann77 requested a review from lexming June 20, 2025 17:05
@lexming
Contributor

lexming commented Jun 20, 2025

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@lexming: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23110 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23110 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6943

Test results coming soon (I hope)...

Details

- notification for comment with ID 2992345972 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 2 out of 4 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/81717f68b528da3013c1199156d59832 for a full test report.

@lexming
Contributor

lexming commented Jun 20, 2025

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@lexming: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23110 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23110 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6944

Test results coming soon (I hope)...

Details

- notification for comment with ID 2992540268 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/4fde88a1b4e08e2593f3310d86f70dcb for a full test report.

@lexming
Contributor

lexming commented Jun 21, 2025

@ThomasHoffmann77 Build failed with the following error in cmake:

CMake Error at CMakeLists.txt:199 (include):
  include could not find requested file:

    /tmp/boegelbot/DGL/1.1.1/foss-2022a-CUDA-11.7.0/dgl-1.1.1/third_party/METIS/GKlib/GKlibSystem.cmake

any idea why that file might be missing?
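
One thing that could be checked here, offered as a guess and not as a diagnosis, is where the GKlib archive members land for a given `--strip-components` value in `extract_cmd`. The sketch below mimics that option on a list of member paths. The archive layout is hypothetical, since the real tarball contents are not shown in this thread, and `strip_components` and `extracted_paths` are illustrative helpers.

```python
from pathlib import PurePosixPath


def strip_components(member, count):
    """Mimic tar's --strip-components: drop `count` leading path elements.

    Returns None when no path elements remain, in which case tar
    does not extract that member.
    """
    parts = PurePosixPath(member).parts
    if len(parts) <= count:
        return None
    return str(PurePosixPath(*parts[count:]))


def extracted_paths(members, count):
    """Map archive members to paths relative to the `tar -C` target directory."""
    result = []
    for member in members:
        stripped = strip_components(member, count)
        if stripped is not None:
            result.append(stripped)
    return result


# Hypothetical archive layout with a single top-level directory.
members = [
    "GKlib-demo/GKlibSystem.cmake",
    "GKlib-demo/src/util.c",
]

with_one = extracted_paths(members, 1)
with_two = extracted_paths(members, 2)
```

With this made-up layout, stripping one component leaves `GKlibSystem.cmake` at the top of the target directory, while stripping two drops it entirely. Running `tar -tf` on the real source would show whether anything similar applies to this build.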

@ThomasHoffmann77
Contributor Author

ThomasHoffmann77 commented Jun 23, 2025

> @ThomasHoffmann77 Build failed with the following error in cmake:
>
> CMake Error at CMakeLists.txt:199 (include):
>   include could not find requested file:
>
>     /tmp/boegelbot/DGL/1.1.1/foss-2022a-CUDA-11.7.0/dgl-1.1.1/third_party/METIS/GKlib/GKlibSystem.cmake
>
> any idea why that file might be missing?

not yet.
I'll try to update RFdiffusion's dependency from DGL 1.1.1 to 1.1.3 using PR #20092

@github-actions github-actions bot removed the update label Jun 23, 2025
update dependency DGL 1.1.1->1.1.3; requires update of dep. PyTorch 1.12.0 -> 1.13.1
@ThomasHoffmann77 ThomasHoffmann77 changed the title {bio}[foss/2022a] RFdiffusion v1.1.0, DGL v1.1.1 w/ CUDA 11.7.0 {bio}[foss/2022a] RFdiffusion v1.1.0 w/ CUDA 11.7.0 Jun 23, 2025
@ThomasHoffmann77
Contributor Author

Test report by @ThomasHoffmann77
FAILED
Build succeeded for 7 out of 11 (2 easyconfigs in total)
srv-mahamid-01.embl.de - Linux AlmaLinux 8.10, x86_64, AMD EPYC 7513 32-Core Processor, 2 x NVIDIA NVIDIA GeForce RTX 3090, 550.120, Python 3.6.8
See https://gist.github.com/ThomasHoffmann77/9b66ae0b2e62a2918b44d196a1141703 for a full test report.

@ThomasHoffmann77
Contributor Author

Test report by @ThomasHoffmann77
SUCCESS
Build succeeded for 3 out of 3 (2 easyconfigs in total)
srv-mahamid-01.embl.de - Linux AlmaLinux 8.10, x86_64, AMD EPYC 7513 32-Core Processor, 2 x NVIDIA NVIDIA GeForce RTX 3090, 550.120, Python 3.6.8
See https://gist.github.com/ThomasHoffmann77/0506471bba8ccd3445f9100ea2133e3e for a full test report.

('e3nn', '0.3.3', versionsuffix),
('wandb', '0.13.4'),
('CUDA', '11.7.0', '', SYSTEM),
('PyTorch-bundle', '1.13.1', versionsuffix), # same as used in DGL
Contributor


The tests are failing because the main version of PyTorch in 2022a is 1.12.0. For instance, the non-CUDA version of RFdiffusion (RFdiffusion-1.1.0-foss-2022a.eb) already depends on v1.12.0.
To avoid conflicts, this easyconfig should also use v1.12.0, or add a versionsuffix in the form of PyTorch-1.13.1.
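
The kind of conflict described above can be spotted with a simple scan over dependency lists. The following is a simplified sketch, not the check EasyBuild's test suite actually runs: it compares only dependency name and version, ignores versionsuffix and toolchain, and uses hypothetical easyconfig names and data.

```python
def find_version_conflicts(easyconfigs):
    """Report dependencies pinned to different versions across easyconfigs.

    `easyconfigs` maps an easyconfig name to a list of dependency tuples
    shaped like those in an easyconfig: (name, version[, versionsuffix[, ...]]).
    Only name and version are compared in this simplified check.
    """
    seen = {}
    for ec_name, deps in easyconfigs.items():
        for dep in deps:
            name, version = dep[0], dep[1]
            seen.setdefault(name, {}).setdefault(version, []).append(ec_name)
    # Keep only dependencies that appear with more than one version.
    return {name: versions for name, versions in seen.items() if len(versions) > 1}


# Hypothetical easyconfigs in the same toolchain generation.
demo = {
    "demo-cpu.eb": [("Python", "3.10.4"), ("PyTorch", "1.12.0")],
    "demo-cuda.eb": [("Python", "3.10.4"), ("PyTorch", "1.13.1", "-CUDA-11.7.0")],
}

conflicts = find_version_conflicts(demo)
```

Here `conflicts` flags `PyTorch` because the two demo easyconfigs pin different versions, while `Python` is not reported.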

@Thyre Thyre added the 2022a label Aug 18, 2025
@boegel
Member

boegel commented Oct 13, 2025

No longer relevant since 2022a toolchains are deprecated since foss/2025b was defined, see also https://docs.easybuild.io/policies/toolchains/, so closing...

@boegel boegel closed this Oct 13, 2025