[PROTON] Migrate Proton ROCm backend from roctracer to rocprofiler-sdk #9704

Merged: Jokeren merged 60 commits into triton-lang:main from ZelboK:feat/rocprofiler_sdk_late_start on May 14, 2026

ZelboK (Contributor) commented Mar 12, 2026

Does not need LD_PRELOAD. rocprofiler-sdk is initialized when the Triton profiler is imported. This is a lightweight, one-time initialization that replaces the hsa_queue_create_fn pointer in the HSA table and creates and registers our two SDK contexts. It must happen before any GPU operation creates an HSA queue, because the SDK can only intercept queues created after the replacement. Queues that already exist are plain HSA queues and are invisible to the profiler.
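
The ordering constraint can be illustrated with a toy model. This is only a sketch: `Queue`, `api_table`, and `install_hook` are hypothetical stand-ins, not HSA or rocprofiler-sdk APIs. It shows that swapping a function pointer in a table affects only the objects created afterwards.

```python
# Toy model only: none of these names exist in HSA or rocprofiler-sdk.
class Queue:
    def __init__(self, name, intercepted=False):
        self.name = name
        self.intercepted = intercepted


def plain_queue_create(name):
    """Stand-in for the original queue-creation entry point."""
    return Queue(name)


# Stand-in for the HSA API table: a mutable mapping of entry points.
api_table = {"hsa_queue_create_fn": plain_queue_create}


def install_hook(table):
    """Replace the table entry with a wrapper that marks new queues as intercepted."""
    original = table["hsa_queue_create_fn"]

    def hooked_queue_create(name):
        queue = original(name)
        queue.intercepted = True
        return queue

    table["hsa_queue_create_fn"] = hooked_queue_create


early = api_table["hsa_queue_create_fn"]("early")  # created before the hook
install_hook(api_table)
late = api_table["hsa_queue_create_fn"]("late")    # created after the hook

print(early.intercepted, late.intercepted)  # prints: False True
```

The `early` queue stays unhooked no matter when profiling starts later, which is why the replacement has to run before the first GPU operation.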

Context 1: codeObjectContext, used to register kernel names.
Context 2: profilingContext, which does the heavy work and is active only between doStart() and doStop().

Between Proton start() and end(), the SDK WriteInterceptor intercepts the dispatches issued in between, using barrier packets.
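
A minimal sketch of the two-context lifecycle follows, assuming a simplified model in which a context is just an on/off switch. The class and method names (`Context`, `TwoContextProfiler`, `on_code_object_load`, `on_dispatch`) are hypothetical and do not mirror the real RocprofSDKProfiler code.

```python
class Context:
    """Hypothetical stand-in for an SDK context that can be started and stopped."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def start(self):
        self.active = True

    def stop(self):
        self.active = False


class TwoContextProfiler:
    def __init__(self):
        self.code_object_ctx = Context("codeObjectContext")
        self.profiling_ctx = Context("profilingContext")
        self.kernel_names = {}  # kernel_id -> kernel name
        self.records = []       # names of dispatches seen while profiling
        self.code_object_ctx.start()  # lightweight context, always active

    def on_code_object_load(self, kernel_id, name):
        if self.code_object_ctx.active:
            self.kernel_names[kernel_id] = name

    def on_dispatch(self, kernel_id):
        # Dispatches are recorded only while the profiling context is active.
        if self.profiling_ctx.active:
            self.records.append(self.kernel_names.get(kernel_id, "<unknown>"))

    def do_start(self):
        self.profiling_ctx.start()

    def do_stop(self):
        self.profiling_ctx.stop()


p = TwoContextProfiler()
p.on_code_object_load(1, "add_kernel")  # registered even before profiling starts
p.on_dispatch(1)                        # ignored: profiling not started
p.do_start()
p.on_dispatch(1)                        # recorded
p.do_stop()
p.on_dispatch(1)                        # ignored: profiling stopped
print(p.records)  # prints: ['add_kernel']
```

Keeping name registration in an always-active context means a kernel loaded before profiling starts can still be resolved by name once its dispatch is recorded.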

Rewrote _select_backend() to avoid calling get_current_target(), which triggers HIP runtime init before force_configure can run.
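
As a rough illustration of the idea, the sketch below picks a profiling backend by inspecting a dictionary of registered backends, without touching any GPU runtime. The function name `select_backend`, the keys `"amd"` and `"nvidia"`, and the returned labels are assumptions for illustration, not the actual Proton code.

```python
def select_backend(registered_backends):
    """Infer the profiling backend from the registered backend names only.

    No driver or runtime call is made here, so nothing can initialize HIP
    as a side effect of the lookup.
    """
    if "amd" in registered_backends:
        return "rocprofiler-sdk"  # hypothetical label
    if "nvidia" in registered_backends:
        return "cupti"            # hypothetical label
    raise ValueError("no supported GPU backend is registered")


print(select_backend({"amd": object()}))     # prints: rocprofiler-sdk
print(select_backend({"nvidia": object()}))  # prints: cupti
```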

Tested locally by building rocm-systems from main with the TRITON_ROCPROFILER_SDK.. environment variables set.

Replace the deprecated roctracer-based profiling backend with a new
implementation built on rocprofiler-sdk, using late-start via
rocprofiler_force_configure so no LD_PRELOAD or tool-library preloading
is required.

Key changes:

- Add RocprofSDKProfiler with a two-context architecture:
  * codeObjectContext (always active): lightweight callback for
    kernel_id -> name registration as code objects are loaded.
  * profilingContext (on-demand): HIP runtime API callback tracing
    and buffer-based kernel dispatch tracing, started in doStart()
    and stopped in doStop() to match Proton's start/stop semantics.

- Eagerly call force_configure at profiler import time on AMD
  so interception hooks are installed before any HSA queues are created.
  Both contexts are registered at this point, causing the SDK to install
  queue hooks. Only the lightweight codeObjectContext is activated
  immediately.

- Rewrite _select_backend() to infer the backend from the registered
  backends dict rather than calling get_current_target(), which would
  trigger HIP runtime init before force_configure can run.

- Wire up ROCTx marker tracing via libroctx64's native callback API
  (roctxRegisterTracerCallback) since rocprofiler-sdk's marker service
  requires its replacement roctx library, unavailable with late-start.

- Add RocprofApi dispatch layer (ExternLibRocprofiler) for runtime
  dlopen/dlsym of librocprofiler-sdk.so, with optional path override
  via TRITON_ROCPROFILER_SDK_LIB_PATH.

- Update CMake to discover rocprofiler-sdk headers and plumb
  ROCPROFILER_SDK_INCLUDE_DIR into the build.
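
The runtime-loading bullet above can be sketched in Python with `ctypes`. The real dispatch layer is C++ using dlopen/dlsym. This sketch assumes that TRITON_ROCPROFILER_SDK_LIB_PATH names a directory containing librocprofiler-sdk.so, which the description does not state. The helpers `resolve_library_path` and `try_load` are hypothetical.

```python
import ctypes
import os

DEFAULT_LIB_NAME = "librocprofiler-sdk.so"


def resolve_library_path(environ):
    """Return the path to load, honoring the optional override.

    Assumption: the override variable holds a directory, not a file path.
    """
    override = environ.get("TRITON_ROCPROFILER_SDK_LIB_PATH")
    if override:
        return os.path.join(override, DEFAULT_LIB_NAME)
    # Without an override, let the dynamic loader search its default paths.
    return DEFAULT_LIB_NAME


def try_load(environ=os.environ):
    """Load the library lazily; return None if it is not available."""
    try:
        return ctypes.CDLL(resolve_library_path(environ))
    except OSError:
        return None


print(resolve_library_path({}))  # prints: librocprofiler-sdk.so
```

Returning `None` instead of raising lets a caller fall back gracefully on machines without ROCm installed.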
ZelboK marked this pull request as draft on March 12, 2026 19:02
ZelboK added 3 commits March 12, 2026 19:14
…), getKernelName Fix using shared lock instead of two lock acquis, simplified no correlation path, missing capture counting api. changes to see if nvidia CI runner works
Jokeren (Contributor) commented Mar 13, 2026

@ZelboK Feel free to let me know if it's ready for review!

Comment thread third_party/proton/test/test_api.py
ZelboK marked this pull request as ready for review on March 13, 2026 16:52
ZelboK (Contributor, Author) commented Mar 13, 2026

> @ZelboK Feel free to let me know if it's ready for review!

Feel free to review :)

Comment thread third_party/proton/proton/profile.py Outdated
Comment thread third_party/proton/csrc/lib/Driver/CMakeLists.txt Outdated
Comment thread third_party/proton/csrc/lib/Profiler/RocprofSDK/RocprofSDKProfilerStub.cpp Outdated
Comment thread third_party/proton/proton/__init__.py Outdated
Comment thread third_party/proton/csrc/Proton.cpp Outdated
Comment thread third_party/proton/csrc/lib/Profiler/RocprofSDK/RocprofSDKProfiler.cpp Outdated
Comment thread third_party/proton/csrc/lib/Profiler/RocprofSDK/RocprofSDKProfiler.cpp Outdated
Comment thread third_party/proton/csrc/lib/Profiler/RocprofSDK/RocprofSDKProfiler.cpp Outdated
Jokeren changed the title from "Migrate Proton ROCm backend from roctracer to rocprofiler-sdk" to "[PROTON] Migrate Proton ROCm backend from roctracer to rocprofiler-sdk" on Apr 13, 2026
Comment thread third_party/proton/csrc/include/Driver/Dispatch.h Outdated
Comment thread third_party/proton/csrc/lib/Profiler/RocprofSDK/RocprofSDKProfiler.cpp Outdated
Comment thread third_party/proton/proton/profile.py Outdated
Comment thread third_party/proton/csrc/lib/Profiler/RocprofSDK/RocprofSDKProfiler.cpp Outdated
Comment thread .github/workflows/integration-tests-amd.yml Outdated
Jokeren (Contributor) commented May 12, 2026

Hi @ZelboK, can you please address the last comment and fix the CI problem? We may want to merge the PR this week.

Jokeren merged commit 5073b1f into triton-lang:main on May 14, 2026
25 of 30 checks passed
4 participants