refactor(triton): reorganize conv modules and unify gated FP8 quant path by hellozhuo-amd · Pull Request #3048 · ROCm/aiter

hellozhuo-amd · 2026-05-06T08:51:02Z

Summary

This PR reorganizes Triton conv1d code into dedicated conv modules and folds gated RMS+FP8 quantization into the existing fused FP8 kernel path.
Related comments: #3005 (comment)

What changed

Conv module reorganization
- Moved conv1d Triton kernels under:
  - aiter/ops/triton/_triton_kernels/conv/
- Moved conv1d Python wrappers under:
  - aiter/ops/triton/conv/
- Added __init__.py exports for the new conv packages.
FP8 quant integration
- Unified gated RMS+FP8 flow as a feature path in the existing fused FP8 implementation instead of maintaining a separate standalone wrapper path.
- Kept both paths in the same fused kernel flow:
  - classic path (GATED_RMS_FP8=False)
  - gated path (GATED_RMS_FP8=True)
- Preserved the gated path's launch behavior: ROWS_PER_BLOCK is still chosen dynamically by calc_rows_per_block.
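To make the classic/gated split concrete, here is a minimal pure-Python reference for a single row. It was written for this description and is not taken from the kernel. Its assumptions are:

- The gate is applied after normalization: out = rmsnorm(x) * weight * silu(z).
- Scales are computed per group of `group_size` columns as amax / FP8 max.
- Values are only clamped to the FP8 range, not rounded to real FP8 values.

`rms_fp8_group_quant_ref` and its parameter names are hypothetical. Passing `z=None` gives the classic path, which mirrors the idea of gating as an extra feature on one shared code path.

```python
import math

FP8_E4M3FN_MAX = 448.0    # max finite value of float8_e4m3fn (OCP FP8)
FP8_E4M3FNUZ_MAX = 240.0  # max finite value of float8_e4m3fnuz (e.g. MI300-class GPUs)


def silu(v):
    # Numerically stable SiLU: v * sigmoid(v).
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def rms_fp8_group_quant_ref(x, weight, eps, group_size, fp8_max, z=None):
    """Hypothetical reference for one row.

    z is None -> classic RMSNorm + FP8 group quant.
    z given   -> gated variant (assumed: rmsnorm(x) * weight * silu(z)).
    Returns (q, scales) with q clamped to [-fp8_max, fp8_max]; no FP8 rounding.
    """
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    y = [v / rms * w for v, w in zip(x, weight)]
    if z is not None:
        # The gate is the only difference between the two paths.
        y = [v * silu(g) for v, g in zip(y, z)]

    q, scales = [], []
    for start in range(0, n, group_size):
        group = y[start:start + group_size]  # the last group may be shorter
        amax = max(abs(v) for v in group)
        scale = amax / fp8_max if amax > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        q.extend(max(-fp8_max, min(fp8_max, v / scale)) for v in group)
    return q, scales
```

Dequantizing with `q[i] * scales[i // group_size]` should recover the normalized (and, if gated, gated) values up to float rounding. That makes this a convenient oracle to compare kernel output against in tests for either path.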
Test layout + script updates
- Moved conv tests to:
  - op_tests/triton_tests/conv/test_causal_conv1d.py
  - op_tests/triton_tests/conv/test_causal_conv1d_update_single_token.py
- Updated split test mapping:
  - .github/scripts/split_tests.sh

Motivation

Align conv1d structure with existing Triton package organization patterns (attention, quant, etc.) for easier discovery and maintenance.
Reduce duplicate code paths in FP8 quantization by treating gating as an additional feature on top of the existing fused kernel path.
Keep test coverage explicit for both gated and classic quant paths after refactor.

Technical Details

Introduced new conv package roots in both kernel and wrapper layers.
Updated conv test paths accordingly.
Consolidated gated/classic quant control flow under the fused FP8 kernel path with feature flags.
Retained behavior-sensitive launch configuration for gated quant via dynamic rows-per-block heuristic.
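The PR does not show the body of `calc_rows_per_block`, so what follows is only an illustrative sketch of what a dynamic rows-per-block heuristic can look like. `rows_per_block_sketch`, its thresholds, and the `max_rows_per_block=8` cap are assumptions, not aiter's logic. The sketch returns a power of two, because Triton's `tl.arange` requires power-of-two extents. It also keeps the launch grid at least as large as the number of compute units whenever there are more rows than CUs.

```python
import math


def prev_power_of_2(n: int) -> int:
    """Largest power of two <= n, for n >= 1."""
    return 1 << (n.bit_length() - 1)


def rows_per_block_sketch(num_rows: int, num_compute_units: int,
                          max_rows_per_block: int = 8) -> int:
    # Hypothetical heuristic, not aiter's calc_rows_per_block.
    # Few rows: one row per program so the grid spreads over as many CUs as possible.
    if num_rows <= num_compute_units:
        return 1
    # Many rows: pack several rows per program to amortize per-program overhead.
    # rows <= num_rows // num_compute_units keeps the grid at >= num_compute_units programs.
    rows = prev_power_of_2(num_rows // num_compute_units)
    # The cap is a power of two, so the result stays a power of two.
    return min(rows, max_rows_per_block)
```

For example, with M = 1216 rows on a 304-CU GPU such as MI300X, the sketch packs 4 rows per program and launches `math.ceil(1216 / 4) = 304` programs.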

Test Plan

Ran:

```bash
python3 -m pytest \
  op_tests/triton_tests/test_fused_rearrange_sigmoid_gdr.py \
  op_tests/triton_tests/conv/test_causal_conv1d_update_single_token.py \
  op_tests/triton_tests/quant/test_fused_rms_gated_fp8_group_quant.py \
  op_tests/triton_tests/quant/test_fused_fp8_quant.py::test_fused_rms_fp8_group_quant \
  -v
```

Additional validation:

black --check and ruff check on touched Triton conv/quant files.

Test Result

Targeted test bundle passed.
Gated quant sweep + classic fused FP8 quant tests passed.
Formatting/lint checks on touched files passed.

On the vLLM side, the gated FP8 kernel showed no speed regression.

Notes

No functional regressions observed in covered paths.
Includes path/layout refactor, so downstream imports should use the new conv module paths.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Move causal conv1d Triton code into dedicated conv subpackages by relocating kernels and wrappers under aiter.ops.triton._triton_kernels.conv and aiter.ops.triton.conv, and move related Triton tests into op_tests/triton_tests/conv for consistent structure. Integrate gated RMS+FP8 quantization as an additional feature path in the existing fused FP8 kernel flow instead of maintaining a separate standalone gated kernel wrapper path. Keep both gated and classic test coverage in the test commands and update split-tests mapping to the new conv test path.

github-actions · 2026-05-06T08:51:51Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
|---|---|
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests |
| `ci:atom` | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| `ci:vllm` | vLLM benchmark |
| `ci:all` | All of the above |

Add labels via the sidebar or with `gh pr edit 3048 --add-label <label>`.

Copilot

Pull request overview

This PR reorganizes the Triton causal conv1d implementation into dedicated conv/ subpackages and folds the previously separate gated RMSNorm+FP8 quantization path into the existing fused FP8 kernel flow.

Changes:

Relocate causal conv1d wrappers/kernels under aiter.ops.triton.conv and aiter.ops.triton._triton_kernels.conv, updating tests and split-test mapping accordingly.
Unify gated RMSNorm+FP8 group quantization by adding a gated mode to _fused_rms_fp8_group_quant_kernel and routing the gated wrapper through it.
Minor repo hygiene updates (ignore vim swap files; normalize .ipynb_checkpoints ignore pattern).

Reviewed changes

Copilot reviewed 9 out of 12 changed files in this pull request and generated 2 comments.


| File | Description |
|---|---|
| `op_tests/triton_tests/conv/test_causal_conv1d.py` | Update imports to new `aiter.ops.triton.conv` / kernel conv subpackage paths. |
| `op_tests/triton_tests/conv/test_causal_conv1d_update_single_token.py` | Update imports to new conv subpackage paths. |
| `aiter/ops/triton/quant/fused_fp8_quant.py` | Route gated RMS+FP8 quant through unified fused kernel path; minor compatibility tweaks for fp8 dtype bounds + heuristics. |
| `aiter/ops/triton/conv/causal_conv1d.py` | Update kernel import path to `_triton_kernels.conv`. |
| `aiter/ops/triton/conv/causal_conv1d_update_single_token.py` | Update kernel import paths to `_triton_kernels.conv`. |
| `aiter/ops/triton/conv/__init__.py` | New conv package exports for causal conv1d APIs. |
| `aiter/ops/triton/_triton_kernels/quant/fused_fp8_quant.py` | Extend fused RMS+FP8 group quant kernel to support gated mode; remove standalone gated kernel. |
| `aiter/ops/triton/_triton_kernels/conv/causal_conv1d.py` | New conv kernel module location for causal conv1d fwd/update kernels. |
| `aiter/ops/triton/_triton_kernels/conv/causal_conv1d_update_single_token.py` | New conv kernel module location for single-token update kernels. |
| `aiter/ops/triton/_triton_kernels/conv/__init__.py` | New conv kernel subpackage init/export. |
| `.gitignore` | Add vim swap ignores; fix `.ipynb_checkpoints` glob indentation. |
| `.github/scripts/split_tests.sh` | Update split-test mapping to new conv test path. |


…ant.py

Agent-Logs-Url: https://github.com/ROCm/aiter/sessions/68e13a64-8717-4b09-b862-c8bb8f4eb642 Co-authored-by: hellozhuo-amd <225919697+hellozhuo-amd@users.noreply.github.com>

…d FP8 launch Agent-Logs-Url: https://github.com/ROCm/aiter/sessions/85686171-3024-472f-818f-6ed9d52ee761 Co-authored-by: hellozhuo-amd <225919697+hellozhuo-amd@users.noreply.github.com>

- Map causal_conv1d and causal_conv1d_update_single_token in _BACKWARD_COMPAT_MAP for legacy flat imports (e.g. vLLM).
- Import conv APIs in op_tests via flat aiter.ops.triton.* paths.
- Silence ruff F401 on intentional comms re-exports in triton __init__.

Co-authored-by: Cursor <cursoragent@cursor.com>
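One common way to keep legacy flat imports such as `from aiter.ops.triton.causal_conv1d import ...` working after a move is to register the relocated module under its old name in `sys.modules`. The sketch below assumes `_BACKWARD_COMPAT_MAP` maps old module paths to new ones. Its real structure is not shown on this page, so `LEGACY_TO_NEW` and `install_legacy_aliases` are hypothetical names.

```python
import importlib
import sys

# Hypothetical map shape (old module path -> new module path). The real
# _BACKWARD_COMPAT_MAP in aiter may be structured differently.
LEGACY_TO_NEW = {
    "aiter.ops.triton.causal_conv1d": "aiter.ops.triton.conv.causal_conv1d",
    "aiter.ops.triton.causal_conv1d_update_single_token": (
        "aiter.ops.triton.conv.causal_conv1d_update_single_token"
    ),
}


def install_legacy_aliases(compat_map):
    """Register each relocated module under its legacy dotted name.

    Once the parent package is importable, both `import old.path` and
    `from old.path import fn` resolve to the same module object as the new
    path, so no code is duplicated.
    """
    for legacy_name, new_name in compat_map.items():
        module = importlib.import_module(new_name)  # import from the new location
        sys.modules[legacy_name] = module
        parent_name, _, child_name = legacy_name.rpartition(".")
        if parent_name and parent_name in sys.modules:
            # Also expose the alias as an attribute of the already-imported parent package.
            setattr(sys.modules[parent_name], child_name, module)
```

Eager aliasing like this imports the target modules as soon as the package is imported. A lazy alternative is a module-level `__getattr__` (PEP 562), but that only covers attribute-style imports such as `from aiter.ops.triton import causal_conv1d`, not `import aiter.ops.triton.causal_conv1d`.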

Boss2002n · 2026-05-08T02:20:08Z

Can we keep this file empty?

Boss2002n · 2026-05-08T02:20:16Z

We can empty this file too, unless there's a reason we need it.
It would be consistent with the rest of the repo if we empty it :)

Boss2002n · 2026-05-08T02:20:26Z

```diff
         x,
         weight,
-        bias,
-        z,
+        dummy,
+        dummy,
+        dummy,
         x_quant,
         scales,
+        dummy,
+        dummy,
+        dummy,
+        eps,
+        0.0,
+        M,
+        N,
+        0,
         x.stride(0),
-        z.stride(0),
+        1,
+        x.stride(1),
+        1,
+        1,
+        1,
         x_quant.stride(0),
+        x_quant.stride(1),
         stride_s_row,
         stride_s_g,
-        M,
-        N,
-        eps,
+        1,
+        1,
+        1,
+        1,
+        1,
+        1,
+        z,
+        bias_ptr,
+        z.stride(0),
```


Can we avoid positional args here? Keyword args make this much easier to read; with so many arguments being passed, it is very hard to follow.

Boss2002n · 2026-05-08T02:22:00Z

I am wondering if we can split these into two different kernels instead of having an if-else.
That would make it easier to read.

hellozhuo-amd requested review from a team and Copilot May 6, 2026 08:51

Copilot started reviewing on behalf of hellozhuo-amd May 6, 2026 08:53

Copilot AI reviewed May 6, 2026

Comment thread aiter/ops/triton/_triton_kernels/quant/fused_fp8_quant.py

Comment thread aiter/ops/triton/quant/fused_fp8_quant.py Outdated

Updated op_tests/triton_tests/quant/test_fused_rms_gated_fp8_group_quant.py

f7616c4

hellozhuo-amd self-assigned this May 6, 2026

hellozhuo-amd requested review from azaidy and juuso-oskari May 6, 2026 09:29

Copilot started work on behalf of hellozhuo-amd May 6, 2026 09:43

fix: add group boundary masking in gated FP8 quant kernel loop

658a217

Agent-Logs-Url: https://github.com/ROCm/aiter/sessions/68e13a64-8717-4b09-b862-c8bb8f4eb642 Co-authored-by: hellozhuo-amd <225919697+hellozhuo-amd@users.noreply.github.com>

Copilot finished work on behalf of hellozhuo-amd May 6, 2026 09:47

Copilot started work on behalf of hellozhuo-amd May 6, 2026 09:49

Copilot finished work on behalf of hellozhuo-amd May 6, 2026 09:52

hellozhuo-amd and others added 3 commits May 6, 2026 12:57

Merge branch 'main' into zhuo/triton-pr2423-reorg

076ab91

Merge branch 'main' into zhuo/triton-pr2423-reorg

e1687f3

azaidy requested review from Boss2002n and k50112113 May 7, 2026 14:39

Merge branch 'main' into zhuo/triton-pr2423-reorg

2e74ecd

Boss2002n requested changes May 8, 2026

brunomazzottiamd mentioned this pull request May 8, 2026

[TRITON] Conv Kernels First Commit to AITER #2886

Draft

hellozhuo-amd wants to merge 8 commits into main from zhuo/triton-pr2423-reorg

4 participants
