[Reland2] Update NVTX to NVTX3#109843

Closed
cyyever wants to merge 13 commits into
pytorch:mainfrom
cyyever:nvtx3_fix2

@cyyever
Collaborator

@cyyever cyyever commented Sep 22, 2023

Another attempt to update NVTX to NVTX3. This time we avoid changing the NVTX header inclusion in existing code. The advantage of NVTX3 over NVTX is that it is a header-only library, so switching to it greatly simplifies our CMake and other build scripts, which no longer need to find an NVTX library in the user's environment. NVTX is still shipped with the latest CUDA versions, but no longer as a compiled library: it is now header-only, which is why there is no .lib file anymore.

cc @malfet @seemethere @izaitsevfb @peterbell10

@pytorch-bot

pytorch-bot Bot commented Sep 22, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109843

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Unrelated Failures

As of commit 8b303bd with merge base 0a25666 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following jobs failed, likely due to flakiness present on trunk, and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cyyever cyyever marked this pull request as draft September 22, 2023 00:47
@cyyever cyyever force-pushed the nvtx3_fix2 branch 4 times, most recently from b2eaaf7 to d260ed6 Compare September 22, 2023 00:57
@cyyever cyyever changed the title Nvtx3 fix2 [Reland2] Upgrade NVTX to NVTX3 Sep 22, 2023
@cyyever
Collaborator Author

cyyever commented Sep 22, 2023

@izaitsevfb This PR combines your changes in #107497 with further cleanups. I chose to include the headers via the 'nvtx3/XXX' prefix to avoid confusion and linking issues between the old and newer versions of NVTX.

@cyyever cyyever changed the title [Reland2] Upgrade NVTX to NVTX3 [Reland2] Update NVTX to NVTX3 Sep 22, 2023
@cyyever
Collaborator Author

cyyever commented Sep 22, 2023

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot Bot added the topic: not user facing topic category label Sep 22, 2023
@cyyever cyyever marked this pull request as ready for review September 22, 2023 01:05
@cyyever cyyever force-pushed the nvtx3_fix2 branch 2 times, most recently from f9d1d49 to eebcc4b Compare September 22, 2023 01:12
@cyyever
Collaborator Author

cyyever commented Sep 22, 2023

@pytorchbot label ciflow/binaries

@pytorch-bot pytorch-bot Bot added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Sep 22, 2023
@cyyever cyyever marked this pull request as draft September 22, 2023 01:57
@cyyever cyyever force-pushed the nvtx3_fix2 branch 4 times, most recently from 65f5805 to 52c1190 Compare September 22, 2023 08:25
@cyyever
Collaborator Author

cyyever commented Sep 22, 2023

Windows test hosts must install NVTX3 before we can continue. I have proposed a PR for that at pytorch/builder#1547.

@cyyever cyyever marked this pull request as ready for review September 22, 2023 14:23
@cyyever
Collaborator Author

cyyever commented Sep 22, 2023

@pytorchbot rebase

@facebook-github-bot
Contributor

@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@izaitsevfb
Contributor

> @izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Looks like the testing is passing internally in the current form!

@cyyever
Collaborator Author

cyyever commented Jul 19, 2024

@izaitsevfb Good news..

@cyyever
Collaborator Author

cyyever commented Jul 20, 2024

@izaitsevfb Should I merge it?

@izaitsevfb
Contributor

> @izaitsevfb Should I merge it?

Sorry, I ran a more comprehensive set of builds, and the issue is still there:

xplat/caffe2/torch/csrc/profiler/stubs/cuda.cpp:3:10: fatal error: 'nvtx3/nvToolsExt.h' file not found
#include <nvtx3/nvToolsExt.h>
         ^~~~~~~~~~~~~~~~~~~~

Is the old NVTX still used? Why can't we just update it and use the include without the prefix, as it was before (#include <nvToolsExt.h>)?

@cyyever
Collaborator Author

cyyever commented Jul 25, 2024

> > @izaitsevfb Should I merge it?
>
> Sorry, I ran a more comprehensive set of builds, and the issue is still there:
>
> xplat/caffe2/torch/csrc/profiler/stubs/cuda.cpp:3:10: fatal error: 'nvtx3/nvToolsExt.h' file not found
> #include <nvtx3/nvToolsExt.h>
>          ^~~~~~~~~~~~~~~~~~~~
>
> Is the old NVTX still used? Why can't we just update it and use the include without the prefix, as it was before (#include <nvToolsExt.h>)?

The PR now falls back to the old NVTX when necessary.
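
For illustration only, here is a minimal sketch of how a fallback between the two header layouts could look at the source level. It is not the mechanism used in this PR, which this conversation does not show. It assumes a compiler that supports `__has_include` (standard since C++17), and the `SKETCH_NVTX_FLAVOR` macro, the `sketch` namespace, `ScopedRange`, and `traced_sum` are hypothetical names introduced here. Only `nvtxRangePushA` and `nvtxRangePop` come from NVTX itself.

```cpp
// Sketch: choose an NVTX header at compile time, preferring the NVTX3 layout.
// All SKETCH_* macros and sketch:: names are hypothetical, not PyTorch or NVTX API.
#include <cassert>
#include <string>

#if defined(__has_include)
#  if __has_include(<nvtx3/nvToolsExt.h>)
#    include <nvtx3/nvToolsExt.h>  // header-only NVTX3
#    define SKETCH_NVTX_FLAVOR 3
#  elif __has_include(<nvToolsExt.h>)
#    include <nvToolsExt.h>        // legacy NVTX; needs the nvToolsExt library at link time
#    define SKETCH_NVTX_FLAVOR 2
#  endif
#endif
#ifndef SKETCH_NVTX_FLAVOR
#  define SKETCH_NVTX_FLAVOR 0     // no NVTX header found: ranges become no-ops
#endif

namespace sketch {

// Reports which header the preprocessor selected above.
inline std::string nvtx_header() {
#if SKETCH_NVTX_FLAVOR == 3
  return "nvtx3/nvToolsExt.h";
#elif SKETCH_NVTX_FLAVOR == 2
  return "nvToolsExt.h";
#else
  return "none";
#endif
}

// RAII range: pushes a named range on construction and pops it on destruction.
class ScopedRange {
 public:
  explicit ScopedRange(const char* name) {
#if SKETCH_NVTX_FLAVOR != 0
    nvtxRangePushA(name);
#else
    (void)name;  // nothing to annotate without NVTX
#endif
  }
  ~ScopedRange() {
#if SKETCH_NVTX_FLAVOR != 0
    nvtxRangePop();
#endif
  }
  ScopedRange(const ScopedRange&) = delete;
  ScopedRange& operator=(const ScopedRange&) = delete;
};

// Example workload wrapped in a range: sums the integers 1..n.
inline long traced_sum(int n) {
  assert(n >= 0);
  ScopedRange range("traced_sum");
  long total = 0;
  for (int i = 1; i <= n; ++i) {
    total += i;
  }
  return total;
}

}  // namespace sketch
```

Calling `sketch::traced_sum(4)` returns 10 whether or not an NVTX header is found; only the profiler annotation differs. A build-system check (for example in CMake) is the other common place to make this choice, and the two approaches can be combined.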

@facebook-github-bot
Contributor

@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment thread torch/utils/hipify/cuda_to_hip_mappings.py Outdated
@cyyever
Collaborator Author

cyyever commented Jul 26, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased nvtx3_fix2 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout nvtx3_fix2 && git pull --rebase)

@facebook-github-bot
Contributor

@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@izaitsevfb
Contributor

> @izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

The latest version passes internal tests.

@cyyever
Collaborator Author

cyyever commented Aug 20, 2024

@pytorchmergebot merge -f "Rocm 6.0 was removed"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels

ciflow/binaries - Trigger all binary build and upload jobs on the PR
ciflow/periodic - Trigger jobs run periodically on master (periodic.yml) on the PR
ciflow/trunk - Trigger trunk jobs on your pull request
Merged
module: bazel
module: build - Build system issues
open source
Reverted
topic: not user facing - topic category
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
