FEAT:at least one of ROCM_HOME or CUDA_HOME must be None. #1809

hj-wei · 2024-12-30T02:22:42Z

Hi all, I manually generate nvcc to bypass NVIDIA component checks (Megatron-LM); see https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57

However, this can lead to an incorrect CUDA_HOME configuration, which in turn can cause initialization anomalies in downstream libraries such as DeepSpeed.
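The invariant in the PR title can be illustrated with a minimal sketch. This is not the actual diff from this PR. The function names `cuda_home_from_nvcc` and `resolve_toolkit_homes` are hypothetical, and inferring a CUDA root from an nvcc path (`<root>/bin/nvcc` maps to `<root>`) is a simplified assumption about how a stub nvcc on a ROCm machine can end up producing a non-None CUDA_HOME.

```python
import os
from typing import Optional, Tuple


def cuda_home_from_nvcc(nvcc_path: Optional[str]) -> Optional[str]:
    """Hypothetical helper: guess a CUDA root from an nvcc location.

    Illustrative only: a stub such as /opt/fake/bin/nvcc yields /opt/fake,
    even on a ROCm machine with no CUDA toolkit installed.
    """
    if nvcc_path is None:
        return None
    # <root>/bin/nvcc -> <root>
    return os.path.dirname(os.path.dirname(nvcc_path))


def resolve_toolkit_homes(
    rocm_home: Optional[str],
    cuda_home: Optional[str],
    is_hip_build: bool,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (rocm_home, cuda_home) with at least one None."""
    if rocm_home is not None and cuda_home is not None:
        # Both were detected (e.g. a stub nvcc on a ROCm machine):
        # keep only the toolkit that matches the current build.
        return (rocm_home, None) if is_hip_build else (None, cuda_home)
    return rocm_home, cuda_home


if __name__ == "__main__":
    cuda = cuda_home_from_nvcc("/opt/fake/bin/nvcc")
    print(resolve_toolkit_homes("/opt/rocm", cuda, is_hip_build=True))
    # ('/opt/rocm', None)
```

With this kind of guard, a ROCm build keeps ROCM_HOME and drops the CUDA_HOME inferred from the stub, so downstream code that branches on CUDA_HOME does not treat the machine as having a CUDA toolkit.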

rocm-repo-management-api · 2024-12-30T02:25:48Z

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2025-01-02T20:25:40Z

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@hj-wei

This PR is a release/2.5-based version of #1809.

Copied description by @hj-wei from #1809:

> Hi all, I manually generating nvcc to bypass NVIDIA component checks(Megatron-LM), see https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57
>
> but it can lead to incorrect CUDA_HOME configurations. This can cause initialization anomalies in downstream libraries like DeepSpeed

jithunnair-amd · 2025-01-06T16:30:40Z

@hj-wei Sorry, we do not merge PRs into the main branch of our ROCm fork, so as to keep it an exact replica of upstream. I have added this change to our release/2.5 branch (#1814), but I'd ask you to file this PR against upstream (pytorch/pytorch) main, since that would allow community users to benefit from this change as well.


FEAT:at least one of ROCM_HOME or CUDA_HOME must be None.

cb064ae

jithunnair-amd changed the base branch from main to release/2.5 January 6, 2025 16:14

jithunnair-amd requested review from jeffdaily, jithunnair-amd, jataylo and pruthvistony as code owners January 6, 2025 16:14

jithunnair-amd changed the base branch from release/2.5 to main January 6, 2025 16:20

jithunnair-amd mentioned this pull request Jan 6, 2025

[release/2.5] FEAT:at least one of ROCM_HOME or CUDA_HOME must be None. #1814

Merged

jithunnair-amd closed this Jan 6, 2025
