6629997 to ca5dbb0
05759db to 1170705
8db2357 to 1558c61

Pretty surprised how well this bump is going; everything has built successfully already. RCCL link time has gotten worse: it's taking hours and using >64G of RAM :c

https://github.com/GZGavinZhao/rocm-systems/commits/solus-rocm-7.1.x/ ?

I forgot you already had that ready, thanks so much! Will swap that in and then burn another big overnight rebuild lol.

45164cb to d81ece2

Rebuild after applying the ISA compat patches went well too.

Is this expected to fix the [...]? The logs generally look like this: [...] To be clear, I do not have this patch installed.

No, as far as I know that's a linux-firmware issue and not related to the ROCm version. #472914 was expected to resolve it (but I guess it doesn't work, since that should've reached you if you updated unstable recently). You might want to try an older kernel and older firmware, and if you find a combination that works you could post that info on the amdgpu issue you linked on freedesktop. I think ROCm/ROCm#5844 is tracking the same thing you're seeing.

Flakebi left a comment:
Didn’t build it but the package changes look good to me, thanks!
d81ece2 to c4f2ce6
c4f2ce6 to ab49426
# Vendored upstream PR for fix for segfault when queue allocation fails
# https://github.com/ROCm/rocm-systems/pull/2850
./queue-failure.patch
…on fails

Found by @06kellyjac, possibly triggered by bug in linux-6.18.4
https://lore.kernel.org/linux-iommu/870872aa-28e9-412a-bac6-8020bf560e4f@amd.com/t/

Resolving this *will not fix the underlying issue*, but does transmute a segfault into an actual error result.

ROCm error: out of memory
  current device: 0, in function stream at /build/source/ggml/src/ggml-hip/../ggml-cuda/common.cuh:1345
  hipStreamCreateWithFlags(&streams[device][stream], 0x01)
ab49426 to 6713c5e
# Fix error: redefinition of 'struct drm_color_ctm_3x4'
# https://github.com/ROCm/amdsmi/pull/165
./drm-struct-redefinition-fix.patch

Rebased, fixed a conflict, fixed a new issue with amdsmi. Planning to merge later today if this review run goes well.

@elis1000 I think the following table might clarify a few solutions: [...] However, 6 months ago, when I tried to upgrade ROCm packages, I experienced many issues that I had no idea how to solve. Thus I am thankful for the work done by @LunNova, and I will patiently wait for version 7.2 to become stable.

@LunNova is it just me or is [...]

Makes sense; any chance patching in this old draft PR of yours might fix the sporadic OOMing / build failures?

If I'm understanding correctly, a large part of these changes are already applied for ROCm 7.2.

Closes #462424.
nixpkgs-review result
Generated using nixpkgs-review.
Command: nixpkgs-review pr 481349
Commit: d81ece2a6fa5d30929ecfe7aba631b1f06520031
x86_64-linux
✅ 113 packages built: