Skip to content

[W/A] disable group conv in asm igemm solver#1978

Merged
junliume merged 4 commits into
developfrom
disable_group_conv_for_asm_igemm
Feb 13, 2023
Merged

[W/A] disable group conv in asm igemm solver#1978
junliume merged 4 commits into
developfrom
disable_group_conv_for_asm_igemm

Conversation

@junliume
Copy link
Copy Markdown
Contributor

@junliume junliume commented Feb 11, 2023

disable group conv in asm igemm solver.

Workaround for Issue #1979

P.S. removal of WORKAROUND_ISSUE_1053 as a by-product (6 months after WA is removed, eliminate related code)

carlushuang
carlushuang previously approved these changes Feb 11, 2023
Copy link
Copy Markdown
Contributor

@carlushuang carlushuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

carlushuang
carlushuang previously approved these changes Feb 11, 2023
Copy link
Copy Markdown
Contributor

@carlushuang carlushuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • PR description is missing the reasoning
  • W/A does not comply #1685

Please do not hesitate to ask for assistance, if you need it (or if do not have time).

@junliume
Copy link
Copy Markdown
Contributor Author

@carlushuang @atamazov : updated this PR to comply with MIOpen lifecycle of workarounds

@junliume junliume requested review from atamazov and carlushuang and removed request for atamazov February 12, 2023 05:57
Copy link
Copy Markdown
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +37 to +38
/// W/A for issue 1979: igemm solver does not support group conv. See:
/// https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1979
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notice] Basically, WORKAROUND_ISSUE_1979 says everything so the comment is not required.

Comment thread test/CMakeLists.txt
option( MIOPEN_TEST_WITH_MIOPENDRIVER "Use MIOpenDriver in tests" ${MIOPEN_TEST_WITH_MIOPENDRIVER_DEFAULT})

option( WORKAROUND_ISSUE_936 "" ON)
option( WORKAROUND_ISSUE_1053 "" OFF) # TODO: Remove this W/A after ~6 months (in January 2023)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notice] The removal of WORKAROUND_ISSUE_1053 is a by-product which is not mentioned in the description.

Copy link
Copy Markdown
Contributor

@carlushuang carlushuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@junliume junliume merged commit 591ab79 into develop Feb 13, 2023
junliume added a commit that referenced this pull request Feb 13, 2023
* [W/A] disable group conv in asm igemm solver

* fix for resolver refactor changes

* remove test related to the disabled solver in previous commit

* Comply with MIOpen lifecycle of workarounds

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>
@junliume junliume deleted the disable_group_conv_for_asm_igemm branch March 23, 2023 20:39
assistant-librarian Bot pushed a commit that referenced this pull request Oct 8, 2025
[MIOpen] bugfix: GenericSearch warm-up bias: apply warm-up to
 all kernel configurations (#1978)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

# Fix GenericSearch warm-up bias: apply warm-up to all configurations

>** 📝 Note**: This PR has follow up PR #1993
## Motivation

MIOpen's generic search algorithm suffers from a **race condition** that
causes optimal kernels to be randomly rejected, leading to 3-4x
performance degradation in some cases.

### Problem Description

When running the same convolution workload multiple times as sample
below:
```bash
MIOpenDriver convbfp16 -n 8 -c 5 -H 225 -W 225 -k 64 -y 3 -x 3 -p 1 -q 1 \
  -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC \
  -m conv -g 1 -t 1 -F 2
```

**Observed behavior:**
- **Lucky case**: Selected optimal kernel → **0.099 ms** per operation
- **Unlucky case**: Selected suboptimal kernel → **0.332 ms** per
operation (**3.35x slower**)

[Lucky_Joe_20250930.log](https://github.com/user-attachments/files/22714922/Lucky_Joe_20250930.log)

[Normal_Joe_20250930.log](https://github.com/user-attachments/files/22714923/Normal_Joe_20250930.log)

### Root Cause

**Cold-start bias in warm-up logic** (`generic_search.hpp`, lines
559-564):

```cpp
// Original buggy code
if(n_current == 0)  // ❌ Only first config gets warm-up
{
    invoker(profile_h, invoke_ctx);
    profile_h.ResetKernelTime();
}
```

This condition creates an **unfair advantage** for the first
configuration tested:
- **First kernel** (n_current == 0): Gets warm-up → Fair performance
measurement
- **Subsequent kernels** (n_current > 0): No warm-up → Cold-start
penalty (up to **100x slower** in extreme cases)

### Impact

- **High false negative rate**: Up to 40% chance of rejecting the
optimal kernel
- **Performance degradation**: 4x slower execution when suboptimal
kernel is selected
- **Non-deterministic behavior**: Kernel selection depends on which
configuration is tested first

### Example from Production Logs

**Environment**: MI355X (gfx950), ROCm 7.0.2

```
AI generated 4 kernel configurations for testing:

Kernel #0 (128,128,32,32,8,8...): 10 samples → avg 0.343166 ms → selected as "best"
Kernel #1 (64,64,64,32,8,8...):   1 sample  → 1.219 ms         → rejected (cold-start!)
Kernel #2 (64,64,16,32,8,8...):   1 sample  → 3.0267 ms        → rejected (cold-start!)
Kernel #3 (64,16,64,32,8,8...):   1 sample  → 0.482 ms         → rejected (cold-start!)

Final execution: 0.332 ms (using Kernel #0)

Issue: Kernel #2 suffered from cold-start bias (3.0267 ms first sample)
       With proper warm-up, its true performance is ~0.099 ms (3.4x faster than selected kernel)
```

**Detailed timing from Normal_Joe_20250930.log:**

Optimal kernel (incorrectly rejected due to cold-start):
-
`DeviceGroupedConvBwdData_Xdl_CShuffle_v1<64,64,16,32,8,8,Default,16,16,4,1,8,1,1,1>+1`
- Sample 1: **3.027 ms** ← Cold start! (30x slower than true
performance)
- Samples 2-11: 0.369, 0.349, 0.366, 0.352, 0.353, 0.365, 0.352, 0.359,
0.347, 0.352 ms
- **True mean**: 0.354 ms (excluding cold-start outlier)
- **Decision**: Rejected by early-stop (3.027 > 0.377 × 1.1)
- **Wrong outcome**: Best kernel discarded due to unfair cold-start
penalty
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants