fmha v3 API only generate for supported targeting GPU arch #1134
Closed
HollowMan6 wants to merge 1 commit into ROCm:main from HollowMan6:fmha_v3
Pull Request Overview
Restrict fmha v3 code generation to known-supported GPU architectures to avoid compile errors on unsupported arch targets (e.g., gfx90a/gfx940).
- Introduce a supported-arch allowlist for v3 API generation.
- Gate single-target v3 generation on the allowlist; otherwise fall back to multi-target path.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| csrc/cpp_itfs/mha_fwd_generate.py | Adds V3_SUPPORTED_ARCH and updates get_v3_api to only emit single-target v3 calls for supported arches. |
| csrc/cpp_itfs/mha_bwd_generate.py | Mirrors the fwd changes for the backward path. |
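The gating described in this overview can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the contents of `V3_SUPPORTED_ARCH` are an assumption (the PR description only implies gfx942 and gfx950), and the `get_v3_api` signature, the `arch` variable in the emitted C++, and the shape of the multi-target fallback are all hypothetical.

```python
# Assumed allowlist: the PR text implies gfx942 and gfx950, but the real value is not shown.
V3_SUPPORTED_ARCH = ("gfx942", "gfx950")


def get_v3_api(targets, func="fmha_bwd_v3", call_args="(traits, args, stream_config)"):
    """Return the C++ text to emit for the v3 call (hypothetical signature)."""
    supported = [t for t in targets if t in V3_SUPPORTED_ARCH]

    # Single supported target: call the arch namespace directly.
    if len(targets) == 1 and supported:
        return f"t = {targets[0]}::{func}{call_args};"

    # Fallback (assumed shape): runtime dispatch over supported arches only,
    # so an unsupported arch such as gfx90a never appears as a namespace.
    lines = []
    for i, arch in enumerate(supported):
        keyword = "if" if i == 0 else "else if"
        lines.append(f'{keyword} (arch == "{arch}") {{ t = {arch}::{func}{call_args}; }}')
    return "\n".join(lines)
```

With this sketch, a build that targets only gfx90a emits no v3 call at all, which avoids the undeclared `gfx90a` namespace instead of referencing it.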
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
Refactor to resolve the build failure on gfx940, and also fix the symbol-not-found error that occurs when the GPU arch being compiled for is not "gfx942" or "gfx950":
```log
In file included from TransformerEngine/build/cmake/ck_fused_attn/gen_src/mha_bwd.cpp:1:
In file included from TransformerEngine/transformer_engine/common/ck_fused_attn/../../../3rdparty/aiter/csrc/include/mha_bwd.h:7:
TransformerEngine/transformer_engine/common/ck_fused_attn/../../../3rdparty/aiter/csrc/include/aiter_hip_common.h:147:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
147 | hipGetDeviceProperties(&prop, 0);
| ^~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~
/appl/lumi/SW/CrayEnv/EB/rocm/6.2.2/include/hip/hip_runtime_api.h:91:32: note: expanded from macro 'hipGetDeviceProperties'
91 | #define hipGetDeviceProperties hipGetDevicePropertiesR0600
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
TransformerEngine/build/cmake/ck_fused_attn/gen_src/mha_bwd.cpp:69:9: error: use of undeclared identifier 'gfx90a'
69 | t = gfx90a::fmha_bwd_v3(traits, args, stream_config, seqlen_q_padded, seqlen_k_padded, is_v3_api_check);
| ^
1 warning and 1 error generated when compiling for gfx90a.
```
Signed-off-by: Hollow Man <hollowman@opensuse.org>
HollowMan6 (Contributor, Author) commented:
Looks like this has already been solved by #1318, so I'm closing this.
Motivation
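Refactor to resolve the build failure on gfx940, and also fix the symbol-not-found error that occurs when the GPU arch being compiled for is not "gfx942" or "gfx950" (for example, compiling for gfx90a fails with "use of undeclared identifier 'gfx90a'").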
Technical Details
Check the target GPU arch against a list of supported arches before generating the v3 API code.
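One way to sanity-check the generated source is to scan it for v3 calls that live in a namespace outside the allowlist. The sketch below is illustrative only: `unsupported_v3_refs` is a hypothetical helper, the allowlist contents are assumed, and `fmha_fwd_v3` is assumed by analogy with the `fmha_bwd_v3` call seen in the build log.

```python
import re

# Assumed allowlist: the PR text implies gfx942 and gfx950, but the real value is not shown.
V3_SUPPORTED_ARCH = ("gfx942", "gfx950")


def unsupported_v3_refs(generated_cpp, supported=V3_SUPPORTED_ARCH):
    """Return arch namespaces used for v3 calls that are not in the allowlist."""
    # Matches e.g. "gfx90a::fmha_bwd_v3"; the fwd name is an assumption.
    refs = re.findall(r"\b(gfx[0-9a-f]+)::fmha_(?:fwd|bwd)_v3\b", generated_cpp)
    return sorted({arch for arch in refs if arch not in supported})
```

Applied to the failing line from the log above, the helper reports `gfx90a`; applied to a call in a supported namespace, it returns an empty list.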
Test Plan
Test Result
Submission Checklist