Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COMgr] Prepare for COMgr 3.x #3107

Merged
merged 9 commits into from
Jul 22, 2024
Merged

[COMgr] Prepare for COMgr 3.x #3107

merged 9 commits into from
Jul 22, 2024

Conversation

atamazov
Copy link
Contributor

@atamazov atamazov commented Jul 10, 2024

This is intended to resolve #3075. By-products are some small fixes and removal of useless code (e.g HIP builds via COMgr; from now on online HIP builds are supported only via HIPRTC). Details:

  • [comgr] Added support for AMD COMgr 3.0.
  • By-products:
    • [comgr] Removed support for AMD COMgr older than 1.7.
    • [comgr] Removed comgr::BuildHip().
      • Why: The compilation of HIP kernels via COMgr is broken for a long time. Fixing it requires a lot of efforts (because we must guarantee performance of kernels to be on par with kernels built via HIPRTC) but useless because HIPRTC exists on all systems where we have COMgr.
    • [cmake] Do not allow different values of MIOPEN_USE_COMGR and MIOPEN_USE_HIPRTC.
    • [comgr][NFC][quality] Removed support of unused enum members from to_string()
    • [wingorad fury] Fix precompiled binary cache miss for Winograd Fury. Resolves [gfx11][Solvers][Winograd] ConvWinoFuryRxS v2.4 #2778 (comment).
    • [cmake][NFC] Remove leftovers of support of ROCm older than 5.0.

🌀 Performance testing results

Preconditions:

  • Navi21 (gfx1030/36)
  • ROCm 6.0.2 docker miopen:ci_a9eebf, Ubuntu 20.04.5 LTS
  • BUILD_DEV=On, Release build
  • branch amdcomgr-3 2f38063 vs. develop fea5579
  • 93 known FP32 convolution configs
  • 93 known FP16 convolution configs

Tested modes:

  • (0) Immediate mode with default (system) find-db
  • (1) Find Mode (DYNAMIC_HYBRID)
  • (2) Find Mode (NORMAL)
  • (3) Immediate mode with user find-db filled by step (2).

No performance or correctness regressions. Detailed logs & csv files are available upon request.


[Attribution] @junliume @JehandadKhan

@lamb-j lamb-j self-requested a review July 10, 2024 21:22
@lamb-j
Copy link

lamb-j commented Jul 10, 2024

Left one comment, but otherwise Comgr changes LGTM, thanks!

If you're interested, there are some new actions, languages, and data types that have been added that aren't currently included the MIOpen Enums. See here:

https://github.com/ROCm/llvm-project/blob/amd-staging/amd/comgr/docs/ReleaseNotes.md

I assume they could also be added later, as-needed

@lamb-j
Copy link

lamb-j commented Jul 10, 2024

Apparently I can't comment on unchanged lines, but for Line 370 of comgr.cpp:

We may also remove the AMD_COMGR_DATA_KIND_FATBIN language type for 3.0, as we've removed the associated FATBIN action.

It may be convenient to remove it here to avoid a separate patch

@atamazov
Copy link
Contributor Author

@lamb-j Yep, so it seems that you will likely remove AMD_COMGR_DATA_KIND_FATBIN from amd_comgr_data_kind_t. Which means that we'll have a build error and therefore this data kind needs to be removed for COMgr 3.x. Thanks!

@atamazov atamazov marked this pull request as draft July 10, 2024 21:57
@atamazov
Copy link
Contributor Author

@lamb-j Done in a5a7606

@atamazov atamazov marked this pull request as ready for review July 11, 2024 14:55
@atamazov
Copy link
Contributor Author

@junliume @CAHEK7 @lamb-j Ready for CI testing and another review round. Someone please launch the CI job, thanks!

@averinevg
Copy link
Collaborator

CI job launched

@junliume
Copy link
Collaborator

junliume commented Jul 11, 2024

@atamazov can you access the Windows build logs? We need to protect it from breaking too.

C:\home\jenkins\agent\workspace\UIF2_MIOpen_PR-3107\MIOpen\src\comgr.cpp:589:20: error: unused function 'IsWave64Enforced' [-Werror,-Wunused-function]
  | 14824 | 08:40:01 AM | 589 \| static inline bool IsWave64Enforced(const OptionList& opts)
  | 14825 | 08:40:01 AM | \|                    ^~~~~~~~~~~~~~~~

@atamazov
Copy link
Contributor Author

@junliume

can you access the Windows build logs?

Unfortunately, no access

@atamazov
Copy link
Contributor Author

@junliume @apwojcik

C:\home\jenkins\agent\workspace\UIF2_MIOpen_PR-3107\MIOpen\src\comgr.cpp:589:20: error: unused function 'IsWave64Enforced' [-Werror,-Wunused-function]

FYI this may happen only if HIP runtime does not support precompiled headers (PCH), which seems suspicious.

@atamazov
Copy link
Contributor Author

@junliume #3107 (comment) is fixed in 9f73a5b

@shurale-nkn
Copy link
Contributor

@atamazov please merge develop into this branch. Otherwise, it will not pass the CI

@atamazov
Copy link
Contributor Author

@averinevg thanks for e711774!

@junliume junliume merged commit 36ab779 into ROCm:develop Jul 22, 2024
140 checks passed
@lamb-j
Copy link

lamb-j commented Jul 23, 2024

@atamazov @junliume thanks for implementing and merging this!

What's the typical turnaround time for promotion to amd-master for MIOpen?

@atamazov
Copy link
Contributor Author

@lamb-j

What's the typical turnaround time for promotion to amd-master for MIOpen?

It depends on many things. At https://github.com/ROCm/MIOpen/commits/amd-master I see that amd-master has been updated several times in the last month. But on the other hand, before that it remained unchanged for a month or so. I guess that @junliume maintains some mailing lists dedicated to MIOpen promotions, so you can ask him to add your email there.

jangdan added a commit to jangdan/MIOpen that referenced this pull request Aug 1, 2024
commit da4ba7f
Author: Artur Wojcik <[email protected]>
Date:   Tue Jul 30 07:37:57 2024 +0200

    [Windows] Temporary workaround on rocMLIR issue (ROCm#3164)

commit bd993eb
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Fri Jul 26 23:38:36 2024 -0700

    Bump rocm-docs-core[api_reference] from 1.5.0 to 1.6.1 in /docs/sphinx (ROCm#3160)

    Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.5.0 to 1.6.1.
    - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
    - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
    - [Commits](ROCm/rocm-docs-core@v1.5.0...v1.6.1)

    ---
    updated-dependencies:
    - dependency-name: rocm-docs-core[api_reference]
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit e158084
Author: BrianHarrisonAMD <[email protected]>
Date:   Wed Jul 24 16:29:22 2024 -0600

    [gtest] Fix log test error (ROCm#3150)

commit 3e983ed
Author: Vasilii Filippov <[email protected]>
Date:   Wed Jul 24 19:31:00 2024 +0200

    Fixed redundant find on failed fallback (ROCm#3131)

    Co-authored-by: Jun Liu <[email protected]>

commit 2fb01fc
Author: who who who <[email protected]>
Date:   Thu Jul 25 01:29:08 2024 +0800

    Fix bnorm args log info (ROCm#3127)

    Co-authored-by: Alex Eremin <[email protected]>

commit 57f4b95
Author: Seungman Han <[email protected]>
Date:   Thu Jul 25 02:21:13 2024 +0900

    Implement getitem backward (ROCm#2883)

commit 2fa300d
Author: Artem Tamazov <[email protected]>
Date:   Wed Jul 24 03:50:55 2024 +0300

    Added logging of paths to HIP, HIPRTC, COMgr, rocBlas, rocMLIR, rocRand, frugally-deep, Eigen3. Added logging of versions of frugally-deep, Eigen3. (ROCm#3110)

commit 61dd569
Author: Vasilii Filippov <[email protected]>
Date:   Wed Jul 24 02:43:14 2024 +0200

    Changed Fusion to have a separate find mode env variable and default to fast (ROCm#3121)

commit ac3d4bd
Author: Evgenii Averin <[email protected]>
Date:   Wed Jul 24 02:22:29 2024 +0200

    [bug][tests] Fix broken tests (ROCm#3123)

    * Fix broken gtests

    * Fix formatting

    * Fix ctests

    ---------

    Co-authored-by: Jun Liu <[email protected]>

commit 56464ff
Author: Vsevolod Golovko <[email protected]>
Date:   Wed Jul 24 02:22:03 2024 +0200

    GTests renaming: filename starting letters A-B (ROCm#3124)

    * Tests with starting letters of the filenames "a"-"b" are renamed

    * "b" test change added

    ---------

    Co-authored-by: Jun Liu <[email protected]>

commit 9685f42
Author: Vasilii Filippov <[email protected]>
Date:   Wed Jul 24 02:21:29 2024 +0200

    Added missing weight sorting as WTI calculation in fusion (ROCm#3105)

commit 97ec508
Author: BrianHarrisonAMD <[email protected]>
Date:   Mon Jul 22 00:55:18 2024 -0600

    Fix GCC8 linking issues with stdc++fs (ROCm#3126)

commit 3991184
Author: Artem Tamazov <[email protected]>
Date:   Mon Jul 22 08:40:53 2024 +0300

    [debugging] Make naive conv solvers obey MIOPEN_DEBUG_HIP_KERNELS (ROCm#3111)

commit 36ab779
Author: Artem Tamazov <[email protected]>
Date:   Mon Jul 22 08:32:38 2024 +0300

    [COMgr] Prepare for COMgr 3.x (ROCm#3107)

    * amdcomgr-3(01) [cmake] Do not allow different values of MIOPEN_USE_COMGR and MIOPEN_USE_HIPRTC. Remove leftover of support of ROCm older than 5.0.

    * amdcomgr-3(02) [importance_normal] Fix precompiled binary cache miss for Winograd Fury. Resolves ROCm#2778 (comment)

    * amdcomgr-3(03) [winograd fury] Add comment.

    * amdcomgr-3(04) [comgr] Removed comgr::BuildHip()

    * amdcomgr-3(05) [comgr] Added support for AMD COMgr 3.0. Removed support for AMD COMgr older than 1.7.

    * amdcomgr-3(06) [comgr] Removed support of unused enum members from to_string().

    * amdcomgr-3(08) [comgr] Fix tidy error

    ---------

    Co-authored-by: Evgenii Averin <[email protected]>

commit f972ecc
Author: BrianHarrisonAMD <[email protected]>
Date:   Fri Jul 19 11:56:47 2024 -0600

    Update GraphAPI Tensor to inherit from TensorDescriptor (ROCm#3119)

commit 9311d8d
Author: Jun Liu <[email protected]>
Date:   Fri Jul 19 10:55:33 2024 -0700

    Revert "[Staging] Update CK commit hash in requirements.txt (ROCm#3122)" (ROCm#3125)

    This reverts commit 8449363.

commit 8449363
Author: Jun Liu <[email protected]>
Date:   Wed Jul 17 17:12:46 2024 -0700

    [Staging] Update CK commit hash in requirements.txt (ROCm#3122)

commit 845c1f4
Author: Chris Erb <[email protected]>
Date:   Wed Jul 17 17:32:59 2024 -0500

    [gfx12] add support of gfx12 platforms (ROCm#3109)

commit aa175c7
Author: BrianHarrisonAMD <[email protected]>
Date:   Tue Jul 16 23:21:06 2024 -0600

    Add tests for ConvBiasResAddActivation forward using Graph API (ROCm#3112)

    * Add tests for ConvBiasResAddActivation forward using Graph API

    * Apply formatting

commit 88e2f5e
Author: BrianHarrisonAMD <[email protected]>
Date:   Tue Jul 16 17:38:36 2024 -0600

    [WA] Remove exception checking tests for unsupported hardware (ROCm#3117)

    * Remove exception checking tests for unsupported hardware

    * Add workaround define and re-add tests

commit 9e8a10c
Author: mentat <[email protected]>
Date:   Fri Jul 12 12:57:23 2024 -0500

    [ROCm 6.2] add change log (ROCm#3113)

commit bd0054b
Author: Artem Tamazov <[email protected]>
Date:   Fri Jul 12 20:44:59 2024 +0300

    Find: Optimize benchmarking for long execution times. (ROCm#3103)

commit b2d0707
Author: Vasilii Filippov <[email protected]>
Date:   Fri Jul 12 19:43:56 2024 +0200

    [Fixes CI] Fixed tidy hang (ROCm#3114)

    * Fixed tidy hang by disabling a check on bugged versions

    * Added a comment on why this is necessary

commit 5019eb6
Author: Evgenii Averin <[email protected]>
Date:   Wed Jul 10 23:07:29 2024 +0200

    [NFC] Move convolution solvers to solver/conv directory (part 2) (ROCm#3042)

commit 75195fa
Author: Vasilii Filippov <[email protected]>
Date:   Wed Jul 10 00:48:40 2024 +0200

    Support for invoker caching and find modes to fusion find (ROCm#3095)

    * implemented support for invoker caching and find modes to fusion find

    * Fixed some errors

commit 4a7ba2f
Author: Chris Erb <[email protected]>
Date:   Tue Jul 9 17:43:45 2024 -0500

    Unblock CI by regenerating gfx908.kdb (ROCm#3092)

commit 1141e42
Author: BrianHarrisonAMD <[email protected]>
Date:   Tue Jul 9 12:02:36 2024 -0600

    Enable hipBLASLt backend for GEMM, and make it the default option for RNN's under specific conditions (ROCm#3030)

commit 80a28a2
Author: Jun Liu <[email protected]>
Date:   Tue Jul 9 11:02:07 2024 -0700

    Update CK commit hash in requirements.txt (ROCm#3087)

    * Update CK commit hash in requirements.txt

    * update CK commit to include fix

    * use CK version with cmake fixes

    * update CK commit hash to include the fix

    ---------

    Co-authored-by: illsilin <[email protected]>

commit fea5579
Author: Artem Tamazov <[email protected]>
Date:   Mon Jul 8 20:57:51 2024 +0300

    Fix ROCm#3076 - issues related to in_offset overflow in Col2Im (2D and 3D) and Im2Col (2D only) (ROCm#3099)

    * fix-overflow-col2im2d(01) Switch to unsigned math in 2D and 3D col2im kernels.

    * fix-overflow-col2im2d(02) Add "dry-run" mode to Col2ImGPU() and use it.

    * fix-overflow-col2im2d(03) Revert "fix-overflow-col2im2d(02) Add "dry-run" mode to Col2ImGPU() and use it."

    This reverts commit c8f9d40.

    * fix-overflow-col2im2d(04) Add 64-bit index mode

    * fix-overflow-col2im2d(06) Tidy fix

    * fix-overflow-col2im2d(07) Minimal changes to avoid trimming of im_offset in 2D Im2Col

commit e112f18
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jul 8 11:49:05 2024 -0600

    Bump certifi from 2024.2.2 to 2024.7.4 in /docs/sphinx (ROCm#3102)

    Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4.
    - [Commits](certifi/python-certifi@2024.02.02...2024.07.04)

    ---
    updated-dependencies:
    - dependency-name: certifi
      dependency-type: indirect
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit ac167b9
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Thu Jul 4 15:35:05 2024 -0600

    Bump rocm-docs-core[api_reference] from 1.4.1 to 1.5.0 in /docs/sphinx (ROCm#3101)

    Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
    - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
    - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
    - [Commits](ROCm/rocm-docs-core@v1.4.1...v1.5.0)

    ---
    updated-dependencies:
    - dependency-name: rocm-docs-core[api_reference]
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit da88aa2
Author: Sergei Grigoriev <[email protected]>
Date:   Wed Jul 3 16:09:53 2024 +0200

    [MHA] Implement MIOPEN_BACKEND_OPERATION_RESHAPE_DESCRIPTOR (ROCm#3082)

    * Implement MIOPEN_BACKEND_OPERATION_RESHAPE_DESCRIPTOR

    * Introduce an enum to indicate transpose op

    * Fix possible throws in the test

commit 0fcbe71
Author: sgundabo <[email protected]>
Date:   Wed Jul 3 09:08:10 2024 -0500

    [Windows] Debug vector add Windows CI failure. (ROCm#3090)

    * fixed missing <CR> in the kernel

    * replaced uint with size_t to fix Windows CI

    * clang format fix

    * Fixed duplicating info, narrowing conversion.

commit 31d9b87
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Sun Jun 30 20:42:25 2024 -0700

    Bump rocm-docs-core[api_reference] from 1.4.0 to 1.4.1 in /docs/sphinx (ROCm#3089)

    Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.4.0 to 1.4.1.
    - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
    - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
    - [Commits](ROCm/rocm-docs-core@v1.4.0...v1.4.1)

    ---
    updated-dependencies:
    - dependency-name: rocm-docs-core[api_reference]
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 072b492
Author: Daming Feng <[email protected]>
Date:   Sun Jun 30 11:05:25 2024 -0500

    Merge Fwd Wrw KTN with refactored Adding split_k Wrw for OOTB CVT Performance Improvement (ROCm#3091)

commit 419848f
Author: sgundabo <[email protected]>
Date:   Fri Jun 28 00:57:05 2024 -0500

    standalone kernels test + gtest (ROCm#3039)

commit b97341d
Author: BrianHarrisonAMD <[email protected]>
Date:   Thu Jun 27 18:33:55 2024 -0600

    Fix MIOpenDriver gemm with column ordering (ROCm#3068)

commit 3552626
Author: xinlipn <[email protected]>
Date:   Thu Jun 27 17:32:57 2024 -0700

    [HotFix] Additional fix for build errors on SLES due to incomplete C++17 support (ROCm#3085)

commit 7a5458e
Author: Vsevolod Golovko <[email protected]>
Date:   Fri Jun 28 02:30:02 2024 +0200

    C API Mha Backward Test (raw code) (ROCm#3073)

commit 96e971f
Author: Jungkeun Kim <[email protected]>
Date:   Fri Jun 28 03:38:46 2024 +0900

    Impl transformers adam w (ROCm#2956)

commit 911e671
Author: Kamil Nasyrov <[email protected]>
Date:   Thu Jun 27 06:44:00 2024 +0200

    [RNN] MS algorithm update  (ROCm#3080)

commit 19988a7
Author: Jungkeun Kim <[email protected]>
Date:   Wed Jun 26 13:55:33 2024 +0900

    Impl adam_w: Add AdamW to Adam optimizer (ROCm#2957)

commit 7cf8180
Author: Evgenii Averin <[email protected]>
Date:   Tue Jun 25 06:08:45 2024 +0200

    Fix: Crunching the latest data, just for you. Hang tight… (ROCm#3069)

commit edc8e85
Author: mentat <[email protected]>
Date:   Mon Jun 24 23:08:04 2024 -0500

    fix kdb issue caused by naive alpha and beta  support (ROCm#3061)

    * fix kdb issue caused by naive alpha and beta  support

    * rename naive kernel for both packed and unpacked

commit 3eed111
Author: Evgenii Averin <[email protected]>
Date:   Tue Jun 25 05:21:32 2024 +0200

    [HotFix] Fix compilation on Windows after ROCm#3060 (ROCm#3070)

    * Fix windows build

    * Attempt moreh-dev#2

    * Uncomment explicit instantiation

    * Add include guard for explicit template instantiation declaration

    * Remove template exporting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove deprecated Comgr actions
6 participants