Skip to content

rocmPackages.hipblaslt: massively reduce peak disk space usage#451188

Merged
JohnRTitor merged 4 commits intoNixOS:masterfrom
LunNova:push-nmzswnymunon
Oct 16, 2025
Merged

rocmPackages.hipblaslt: massively reduce peak disk space usage#451188
JohnRTitor merged 4 commits intoNixOS:masterfrom
LunNova:push-nmzswnymunon

Conversation

@LunNova
Copy link
Member

@LunNova LunNova commented Oct 12, 2025

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 6.topic: rocm ROCm is an Advanced Micro Devices software stack for graphics processing unit programming. labels Oct 12, 2025
without this zstd compression of msgpack .dats silently failed
let's avoid regressing compression in future, oops
@LunNova LunNova marked this pull request as ready for review October 12, 2025 01:29
@LunNova LunNova requested a review from 06kellyjac October 12, 2025 01:29
@LunNova LunNova marked this pull request as draft October 12, 2025 02:17
@LunNova
Copy link
Member Author

LunNova commented Oct 12, 2025

Marking draft while investigating ROCm/rocm-libraries#2073 (comment)

Unsure if it's a pre-existing issue I just discovered or I broke something with this patch.

@LunNova LunNova force-pushed the push-nmzswnymunon branch 3 times, most recently from 539440c to 4374426 Compare October 12, 2025 04:17
@LunNova LunNova requested a review from JohnRTitor October 12, 2025 04:23
@LunNova LunNova marked this pull request as ready for review October 12, 2025 23:51
@LunNova
Copy link
Member Author

LunNova commented Oct 13, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 451188
Commit: 4374426f0ee8eed7d41c68423d511d68d97bfb86


x86_64-linux

❌ 1 package failed to build:
✅ 24 packages built:
  • llama-cpp-rocm
  • magma (magma-hip)
  • magma.test (magma-hip.test)
  • ollama-rocm
  • python312Packages.torchWithRocm
  • python312Packages.torchWithRocm.cxxdev
  • python312Packages.torchWithRocm.dev
  • python312Packages.torchWithRocm.dist
  • python312Packages.torchWithRocm.lib
  • python313Packages.torchWithRocm
  • python313Packages.torchWithRocm.cxxdev
  • python313Packages.torchWithRocm.dev
  • python313Packages.torchWithRocm.dist
  • python313Packages.torchWithRocm.lib
  • rocmPackages.hipblas
  • rocmPackages.hipblaslt
  • rocmPackages.hipblaslt.benchmark
  • rocmPackages.hipsolver
  • rocmPackages.migraphx
  • rocmPackages.miopen (rocmPackages.miopen-hip)
  • rocmPackages.rocalution
  • rocmPackages.rocblas
  • rocmPackages.rocsolver
  • zluda

Copy link
Member

@Flakebi Flakebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t look at the patch in detail, but it mostly seemed to make sense, so LGTM.
Thanks for reducing this even more!

@nixpkgs-ci nixpkgs-ci bot added 12.approvals: 1 This PR was reviewed and approved by one person. 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in any of the changed packages. labels Oct 13, 2025
Copy link
Contributor

@Kitt3120 Kitt3120 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a maintainer, but this looks good to me, too. Well done!

@nixpkgs-ci nixpkgs-ci bot added 12.approvals: 2 This PR was reviewed and approved by two persons. and removed 12.approvals: 1 This PR was reviewed and approved by one person. labels Oct 13, 2025
@LunNova
Copy link
Member Author

LunNova commented Oct 14, 2025

I've been working on much more significant rework in ROCm/rocm-libraries#2073 Some of it is not yet ready, but I've added the items that are low risk to this patch set for further resources reduction.

See the table in ROCm/rocm-libraries#2073 for commit by commit resource usage improvements.

We should probably land this soon since the previous attempt was extremely marginal for whether it can succeed on hydra.

…usage

Peak build dir usage is now 25GB

Partially applies [hipblaslt] Refactor Parallel.py to drop joblib, decimate resource usage
no longer needed now we unlink .s / .o files as soon as possible
@06kellyjac
Copy link
Member

06kellyjac commented Oct 15, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 451188
Commit: bfd0d13dc984b8f6fcedccd2c26f1a2c7b5e3261


x86_64-linux

❌ 6 packages failed to build:
  • python312Packages.torchWithRocm
  • python312Packages.torchWithRocm.cxxdev
  • python312Packages.torchWithRocm.dev
  • python312Packages.torchWithRocm.dist
  • python312Packages.torchWithRocm.lib
  • rocmPackages.mivisionx (rocmPackages.mivisionx-hip)
✅ 19 packages built:
  • llama-cpp-rocm
  • magma (magma-hip)
  • magma.test (magma-hip.test)
  • ollama-rocm
  • python313Packages.torchWithRocm
  • python313Packages.torchWithRocm.cxxdev
  • python313Packages.torchWithRocm.dev
  • python313Packages.torchWithRocm.dist
  • python313Packages.torchWithRocm.lib
  • rocmPackages.hipblas
  • rocmPackages.hipblaslt
  • rocmPackages.hipblaslt.benchmark
  • rocmPackages.hipsolver
  • rocmPackages.migraphx
  • rocmPackages.miopen (rocmPackages.miopen-hip)
  • rocmPackages.rocalution
  • rocmPackages.rocblas
  • rocmPackages.rocsolver
  • zluda

Error logs: `x86_64-linux`
python312Packages.torchWithRocm
In file included from /build/pytorch/c10/core/TensorOptions.h:4:
In file included from /build/pytorch/c10/core/DefaultDtype.h:3:
In file included from /build/pytorch/c10/core/ScalarType.h:13:
In file included from /build/pytorch/c10/util/complex.h:9:
In file included from /nix/store/ivbqhyb8rrv1kxvh3shqsvphjl8y9y0h-rocthrust-6.4.3/include/thrust/complex.h:26:
In file included from /nix/store/ivbqhyb8rrv1kxvh3shqsvphjl8y9y0h-rocthrust-6.4.3/include/thrust/detail/type_traits.h:29:
In file included from /nix/store/zyzxcqvh5bnra2ya10xzc7bhwnl86s2f-rocprim-6.4.3/include/rocprim/type_traits.hpp:27:
In file included from /nix/store/zyzxcqvh5bnra2ya10xzc7bhwnl86s2f-rocprim-6.4.3/include/rocprim/type_traits_interface.hpp:24:
/nix/store/zyzxcqvh5bnra2ya10xzc7bhwnl86s2f-rocprim-6.4.3/include/rocprim/types.hpp:82:7: warning: macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated: compile-time-constant access to the wavefront size will be removed in a future release [-Wdeprecated-pragma]
   82 | #elif ROCPRIM_WAVEFRONT_SIZE == 64
      |       ^
/nix/store/zyzxcqvh5bnra2ya10xzc7bhwnl86s2f-rocprim-6.4.3/include/rocprim/config.hpp:244:36: note: expanded from macro 'ROCPRIM_WAVEFRONT_SIZE'
  244 |     #define ROCPRIM_WAVEFRONT_SIZE __AMDGCN_WAVEFRONT_SIZE
      |                                    ^
<built-in>:429:139: note: macro marked 'deprecated' here
  429 | #pragma clang deprecated(__AMDGCN_WAVEFRONT_SIZE, "compile-time-constant access to the wavefront size will be removed in a future release")
      |                                                                                                                                           ^
5 warnings generated when compiling for host.
Job completed: /build/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_ReduceSumProdKernel.hip.o
[7124/7530] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/bgemm_kernels/torch_hip_generated_bgemm_kernel_bf16bf16bf16_256_16x256x64_16x16_1x4_8x16x1_8x16x1_1x16x1x16_4_Intrawave_v2.hip.o

I killed python312Packages.torchWithRocm because I didn't see much value building it and python313Packages.torchWithRocm and by this point I was tired of the review running :P

@nixpkgs-ci nixpkgs-ci bot added 12.approvals: 3+ This PR was reviewed and approved by three or more persons. and removed 12.approvals: 2 This PR was reviewed and approved by two persons. labels Oct 15, 2025
@@ -86,6 +86,8 @@ stdenv.mkDerivation (finalAttrs: {
env.ROCM_PATH = "${clr}";
env.TENSILE_ROCM_ASSEMBLER_PATH = lib.getExe' clr "amdclang++";
env.TENSILE_GEN_ASSEMBLY_TOOLCHAIN = lib.getExe' clr "amdclang++";
env.LD_PRELOAD = "${jemalloc}/lib/libjemalloc.so";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we patchElf this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's just for the duration of the build AFAICT. If it's to affect the existing tooling in the store we'd either need to apply it always, build a second copy that has it patchelf-ed, or a wrapper which sets LD_PRELOAD.

IMO a comment explaining the benefits and maybe using lib.getLib would be sufficient tweaks

Suggested change
env.LD_PRELOAD = "${jemalloc}/lib/libjemalloc.so";
# has around x improvement (or y benefit) when running z
env.LD_PRELOAD = "${lib.getLib jemalloc}/lib/libjemalloc.so";

I'd say stdenv.hostPlatform.extensions.sharedLibrary would be unnecessary being linux only but could also be included

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is for the allocation hungry build time TensileCreateLibrary processes, not relevant at runtime.

Without jemalloc:

Peak memory usage (MB): 31,011.9
Current memory usage (MB): 28,199.3

With jemalloc:

Peak memory usage (MB): 30,893.7
Current memory usage (MB): 22,695.9

@JohnRTitor JohnRTitor added this pull request to the merge queue Oct 16, 2025
Merged via the queue into NixOS:master with commit 1b3f5cf Oct 16, 2025
27 checks passed
@LunNova LunNova deleted the push-nmzswnymunon branch October 16, 2025 17:16
PedroHLC pushed a commit to PedroHLC/nixpkgs that referenced this pull request Oct 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: rocm ROCm is an Advanced Micro Devices software stack for graphics processing unit programming. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 12.approvals: 3+ This PR was reviewed and approved by three or more persons. 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in any of the changed packages.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants