
Jzhou/3dconv#3919

Closed
jfactory07 wants to merge 11 commits into develop from jzhou/3dconv

Conversation


@jfactory07 commented Jul 28, 2025

Proposed changes

Improve 3D convolution performance in immediate mode, which dynamic shapes rely on to avoid kernel search overhead.

  1. Smarter kernel selection: improves fp16/bf16 performance by choosing a CK kernel suited to the problem context instead of defaulting to the first kernel.

  2. Runtime control: adds the CK_CONV3D_IDX environment variable so a specific kernel can be selected explicitly.
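The PR text does not show how CK_CONV3D_IDX is parsed or how it interacts with the contextual heuristic, so the following is only a hypothetical sketch of one way an environment override can sit on top of a heuristic kernel choice. The function name SelectConv3dKernel, the zero-based index semantics of CK_CONV3D_IDX, and the fall-back-to-heuristic behavior for invalid values are all assumptions for illustration, not MIOpen's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the CK kernel instance to launch.
// Assumption: CK_CONV3D_IDX holds a zero-based index into `candidates`.
// Unset, non-numeric, or out-of-range values fall back to the heuristic pick.
std::size_t SelectConv3dKernel(const std::vector<std::string>& candidates,
                               std::size_t heuristic_idx)
{
    // Precondition: at least one applicable kernel instance exists.
    assert(!candidates.empty());

    if(const char* env = std::getenv("CK_CONV3D_IDX"))
    {
        char* end       = nullptr;
        const long idx  = std::strtol(env, &end, 10);
        // Accept only a fully numeric, in-range value.
        if(end != env && *end == '\0' && idx >= 0 &&
           static_cast<std::size_t>(idx) < candidates.size())
        {
            return static_cast<std::size_t>(idx);
        }
    }

    // No valid override: use the contextual heuristic, else the first kernel.
    return heuristic_idx < candidates.size() ? heuristic_idx : 0;
}
```

In this sketch the override wins only when it names a valid instance, so a mistyped value silently falls back to the heuristic rather than failing the convolution.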

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added automated tests relevant to the introduced functionality
  • I have sufficient test coverage for the changes, and code coverage hasn't decreased as a result of my PR
  • I have run the tests, and they are all passing locally
  • I have added relevant documentation for the changes
  • I have removed the stale documentation which is no longer relevant after this pull request
  • I have run make format & make check_format to ensure the changes have been formatted

@jfactory07 marked this pull request as ready for review July 28, 2025 10:02
@ammallya reopened this Jul 28, 2025
@amd-hsivasun

Imported to ROCm/rocm-libraries


3 participants