@PeixuanZuo
Collaborator

Description: Describe your changes.
Update the performance documentation and add ROCm-specific details.
Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

@PeixuanZuo PeixuanZuo requested a review from ytaous September 13, 2022 10:14
* Load the generated JSON file

- To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`. Performance numbers from the device will then be attached to those from the host. For example:
+ To profile CUDA or ROCm kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling` or `--enable_rocm_profiling`. Performance numbers from the device will then be attached to those from the host. For example:
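
Once a profile has been generated, the "load the generated JSON file" step can also be done programmatically. The following is a minimal sketch, not part of this PR: it assumes the profile is a JSON array of Chrome-trace-style events with `cat`, `name`, and `dur` (microseconds) fields, and that device-side kernel events appear under a category such as `Kernel`. The helper names `load_profile` and `summarize_profile`, the file name, and the sample events are all hypothetical.

```python
import json
from collections import defaultdict


def load_profile(path):
    """Read a profile JSON file and return its list of events."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_profile(events):
    """Sum the `dur` field (assumed microseconds) per event name, grouped by category."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        # Skip metadata or malformed entries that carry no duration.
        if not isinstance(event, dict) or "dur" not in event:
            continue
        category = event.get("cat", "unknown")
        name = event.get("name", "unknown")
        totals[category][name] += event["dur"]
    return {category: dict(names) for category, names in totals.items()}


if __name__ == "__main__":
    # Hypothetical events for illustration only; a real run would use
    # load_profile("your_profile.json") on the file written by the profiler.
    sample_events = [
        {"cat": "Node", "name": "MatMul_0_kernel_time", "dur": 120},
        {"cat": "Kernel", "name": "gemm_kernel", "dur": 80},
        {"cat": "Kernel", "name": "gemm_kernel", "dur": 75},
    ]
    for category, names in summarize_profile(sample_events).items():
        for name, total in sorted(names.items(), key=lambda kv: -kv[1]):
            print(f"{category:8s} {name:30s} {total} us")
```

Grouping by category keeps host-side node timings and device-side kernel timings separate, which makes it easier to check whether device numbers were actually attached to the host events.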
Owner

cupti

is this valid for ROCm?

@PeixuanZuo PeixuanZuo merged commit a305bcb into rocm-ep Sep 14, 2022