
fmha v3 API only generate for supported targeting GPU arch #1134

Closed
HollowMan6 wants to merge 1 commit into ROCm:main from HollowMan6:fmha_v3

HollowMan6 (Contributor) commented Oct 7, 2025

Motivation

Refactor to resolve the build issue for gfx940, and also fix the undeclared-symbol error that occurs when the target GPU architecture being compiled for is neither "gfx942" nor "gfx950":

```log
  In file included from TransformerEngine/build/cmake/ck_fused_attn/gen_src/mha_bwd.cpp:1:
  In file included from TransformerEngine/transformer_engine/common/ck_fused_attn/../../../3rdparty/aiter/csrc/include/mha_bwd.h:7:
  TransformerEngine/transformer_engine/common/ck_fused_attn/../../../3rdparty/aiter/csrc/include/aiter_hip_common.h:147:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    147 |     hipGetDeviceProperties(&prop, 0);
        |     ^~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~
  /appl/lumi/SW/CrayEnv/EB/rocm/6.2.2/include/hip/hip_runtime_api.h:91:32: note: expanded from macro 'hipGetDeviceProperties'
     91 | #define hipGetDeviceProperties hipGetDevicePropertiesR0600
        |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  TransformerEngine/build/cmake/ck_fused_attn/gen_src/mha_bwd.cpp:69:9: error: use of undeclared identifier 'gfx90a'
     69 |     t = gfx90a::fmha_bwd_v3(traits, args, stream_config, seqlen_q_padded, seqlen_k_padded, is_v3_api_check);
        |         ^
  1 warning and 1 error generated when compiling for gfx90a.
```

Technical Details

Check the target GPU architecture against a list of supported architectures before generating the v3 API calls.
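A minimal sketch of that check, assuming the generator receives the targets as a semicolon-separated string of clang offload-arch names (for example `gfx90a;gfx942:xnack-`). The allowlist contents follow the motivation (gfx942 and gfx950); `partition_targets` and the input format are hypothetical, not the code in this PR.

```python
# Hypothetical allowlist: contents taken from the motivation (gfx942, gfx950).
V3_SUPPORTED_ARCH = ("gfx942", "gfx950")


def partition_targets(arch_list):
    """Split a semicolon-separated target list into (v3-capable, other) archs.

    Hypothetical helper. Target-feature suffixes such as ":xnack-" are
    dropped before the allowlist lookup, and empty entries are ignored.
    """
    supported, unsupported = [], []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        arch = entry.split(":", 1)[0]  # "gfx942:xnack-" -> "gfx942"
        if arch in V3_SUPPORTED_ARCH:
            supported.append(arch)
        else:
            unsupported.append(arch)
    return supported, unsupported
```

For a gfx90a-only or gfx940-only build the supported list is empty, so no v3 call needs to be generated at all.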

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings October 7, 2025 08:29

Copilot AI left a comment


Pull Request Overview

Restrict fmha v3 code generation to known-supported GPU architectures to avoid compile errors on unsupported arch targets (e.g., gfx90a/gfx940).

  • Introduce a supported-arch allowlist for v3 API generation.
  • Gate single-target v3 generation on the allowlist; otherwise fall back to multi-target path.
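The gating in the overview can be sketched as a small code-emitting function. This is a hypothetical illustration, not the PR's `get_v3_api`: only the name `V3_SUPPORTED_ARCH` comes from the review summary, the call shape is copied from the error log, and the runtime `arch_id` comparison standing in for the multi-target path is an assumption.

```python
# Assumed allowlist; only the name V3_SUPPORTED_ARCH comes from the review.
V3_SUPPORTED_ARCH = frozenset({"gfx942", "gfx950"})

# Call shape copied from the compiler log in the PR description.
V3_BWD_CALL = (
    "t = {arch}::fmha_bwd_v3(traits, args, stream_config, "
    "seqlen_q_padded, seqlen_k_padded, is_v3_api_check);"
)


def render_v3_dispatch(target_archs):
    """Return generated C++ for the v3 dispatch (hypothetical helper)."""
    archs = set(target_archs)
    if len(archs) == 1 and archs <= V3_SUPPORTED_ARCH:
        # Single supported target: call that arch's namespace directly.
        (arch,) = archs
        return V3_BWD_CALL.format(arch=arch)
    # Everything else takes the multi-target path. `arch_id` is a hypothetical
    # runtime string; unsupported archs get no branch, so no undeclared
    # namespace such as gfx90a:: is ever referenced.
    branches = []
    for arch in sorted(archs & V3_SUPPORTED_ARCH):
        call = V3_BWD_CALL.format(arch=arch)
        branches.append(f'if (arch_id == "{arch}") {{ {call} }}')
    return "\n".join(branches)
```

With a single gfx942 target this emits the direct `gfx942::fmha_bwd_v3(...)` call; with only gfx90a (or gfx940) it emits nothing, which is the case that previously produced `use of undeclared identifier 'gfx90a'`.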

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| `csrc/cpp_itfs/mha_fwd_generate.py` | Adds `V3_SUPPORTED_ARCH` and updates `get_v3_api` to only emit single-target v3 calls for supported arches. |
| `csrc/cpp_itfs/mha_bwd_generate.py` | Mirrors the fwd changes for the backward path. |


valarLip requested a review from slippedJim October 7, 2025 08:52
HollowMan6 changed the title from "fmha v3 API only generate for supported GPU arch" to "fmha v3 API only generate for supported targeting GPU arch" Oct 7, 2025
HollowMan6 requested a review from Copilot October 7, 2025 13:58

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.




Signed-off-by: Hollow Man <hollowman@opensuse.org>
HollowMan6 (Contributor Author)

Looks like this has already been solved by #1318, so I'm closing this.

HollowMan6 closed this Nov 12, 2025