FEAT:at least one of ROCM_HOME or CUDA_HOME must be None. #1809

Closed
wants to merge 1 commit into from

@hj-wei hj-wei commented Dec 30, 2024

Hi all, I manually generate an nvcc to bypass the NVIDIA component checks in Megatron-LM; see
https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57

However, this can lead to an incorrect CUDA_HOME configuration, which in turn can cause initialization anomalies in downstream libraries such as DeepSpeed.
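
The invariant named in the PR title (at least one of ROCM_HOME or CUDA_HOME must be None) can be illustrated with a minimal sketch. This is not the diff from this PR: the function name resolve_toolkit_homes and the is_hip_build flag are hypothetical, chosen only to show one way a ROCm build could ignore a CUDA toolkit path that was picked up from a stand-in nvcc.

```python
from typing import Optional, Tuple


def resolve_toolkit_homes(
    is_hip_build: bool,
    cuda_home: Optional[str],
    rocm_home: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (cuda_home, rocm_home) with at least one of the two set to None.

    Hypothetical helper, not a PyTorch API. The inputs stand for whatever
    paths toolkit detection found, for example via an nvcc or hipcc on PATH.
    """
    if is_hip_build:
        # On a ROCm build, drop any detected CUDA path so that downstream
        # libraries do not mistake the environment for a CUDA one.
        return None, rocm_home
    # On a non-ROCm build, the ROCm path is dropped instead.
    return cuda_home, None


if __name__ == "__main__":
    # A stand-in nvcc on PATH made detection report a CUDA home on a ROCm build.
    print(resolve_toolkit_homes(True, "/usr/local/cuda", "/opt/rocm"))
```

With a rule like this, a downstream library that inspects both variables sees a CUDA path only on builds that are actually CUDA builds.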

rocm-repo-management-api bot commented Dec 30, 2024

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot commented Jan 2, 2025

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jithunnair-amd jithunnair-amd changed the base branch from main to release/2.5 January 6, 2025 16:14
@jithunnair-amd jithunnair-amd changed the base branch from release/2.5 to main January 6, 2025 16:20
jithunnair-amd added a commit that referenced this pull request Jan 6, 2025
This PR is a release/2.5-based version of #1809

Copied description by @hj-wei from #1809

> Hi all, I manually generating nvcc to bypass NVIDIA component checks(Megatron-LM), see
> https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57

> but it can lead to incorrect CUDA_HOME configurations. This can cause initialization anomalies in downstream libraries like DeepSpeed

@jithunnair-amd (Collaborator) commented

@hj-wei Sorry, we do not merge PRs into the main branch of our ROCm fork, so that it stays an exact replica of upstream. I have added this change to our release/2.5 branch in #1814, but I'd ask you to file this PR against upstream (pytorch/pytorch) main, since that would let community users benefit from the change as well.

rocm-mici pushed a commit that referenced this pull request Jan 6, 2025