refactor(triton): reorganize conv modules and unify gated FP8 quant path #3048
hellozhuo-amd wants to merge 8 commits into main
Move causal conv1d Triton code into dedicated conv subpackages by relocating kernels and wrappers under aiter.ops.triton._triton_kernels.conv and aiter.ops.triton.conv, and move related Triton tests into op_tests/triton_tests/conv for consistent structure. Integrate gated RMS+FP8 quantization as an additional feature path in the existing fused FP8 kernel flow instead of maintaining a separate standalone gated kernel wrapper path. Keep both gated and classic test coverage in the test commands and update split-tests mapping to the new conv test path.
Pull request overview
This PR reorganizes the Triton causal conv1d implementation into dedicated conv/ subpackages and folds the previously separate gated RMSNorm+FP8 quantization path into the existing fused FP8 kernel flow.
Changes:
- Relocate causal conv1d wrappers/kernels under `aiter.ops.triton.conv` and `aiter.ops.triton._triton_kernels.conv`, updating tests and split-test mapping accordingly.
- Unify gated RMSNorm+FP8 group quantization by adding a gated mode to `_fused_rms_fp8_group_quant_kernel` and routing the gated wrapper through it.
- Minor repo hygiene updates (ignore vim swap files; normalize `.ipynb_checkpoints` ignore pattern).
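As a rough illustration of what a gated mode adds to a fused RMSNorm+FP8 group quantization flow, the pure-Python reference below normalizes one row, optionally applies a gate, and then scales each group into an FP8-style range. It is a sketch under assumptions, not the kernel's actual math: the gate formula (normalized output multiplied by `silu(z)`), the name `rms_group_quant_ref`, and the bound `FP8_MAX = 448.0` (the float8 e4m3fn maximum; other FP8 variants use different bounds) are chosen for illustration, and values are only scaled and clamped, not rounded to real FP8.

```python
import math

FP8_MAX = 448.0  # assumption: float8 e4m3fn maximum; other FP8 formats differ


def silu(v):
    """SiLU activation, v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def rms_group_quant_ref(x, weight, group_size, eps=1e-6, z=None):
    """Reference for one row: RMSNorm, optional gate, per-group scaling.

    Returns (scaled_values, scales) with one scale per group. The gate
    (multiplying by silu(z)) is an assumed formulation for illustration.
    """
    n = len(x)
    assert n % group_size == 0, "row length must be a multiple of group_size"

    # RMSNorm: divide by the root mean square, then apply the weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / n + eps)
    y = [v * inv_rms * w for v, w in zip(x, weight)]

    # Gated mode: only active when a gate tensor z is supplied.
    if z is not None:
        y = [v * silu(g) for v, g in zip(y, z)]

    # Group quantization: one scale per contiguous group of group_size values.
    scaled, scales = [], []
    for start in range(0, n, group_size):
        group = y[start:start + group_size]
        amax = max(abs(v) for v in group)
        scale = amax / FP8_MAX if amax > 0 else 1.0
        scales.append(scale)
        scaled.extend(max(-FP8_MAX, min(FP8_MAX, v / scale)) for v in group)
    return scaled, scales


row = [1.0, -2.0, 3.0, -4.0]
classic = rms_group_quant_ref(row, [1.0] * 4, group_size=2)
gated = rms_group_quant_ref(row, [1.0] * 4, group_size=2, z=[0.5, 0.5, -1.0, 2.0])
```

Multiplying each scaled value by its group's scale recovers the normalized (and, if gated, gated) value up to floating-point error, which is the property a test against a fused kernel would typically check.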
Reviewed changes
Copilot reviewed 9 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| op_tests/triton_tests/conv/test_causal_conv1d.py | Update imports to new aiter.ops.triton.conv / kernel conv subpackage paths. |
| op_tests/triton_tests/conv/test_causal_conv1d_update_single_token.py | Update imports to new conv subpackage paths. |
| aiter/ops/triton/quant/fused_fp8_quant.py | Route gated RMS+FP8 quant through unified fused kernel path; minor compatibility tweaks for fp8 dtype bounds + heuristics. |
| aiter/ops/triton/conv/causal_conv1d.py | Update kernel import path to _triton_kernels.conv. |
| aiter/ops/triton/conv/causal_conv1d_update_single_token.py | Update kernel import paths to _triton_kernels.conv. |
| `aiter/ops/triton/conv/__init__.py` | New conv package exports for causal conv1d APIs. |
| aiter/ops/triton/_triton_kernels/quant/fused_fp8_quant.py | Extend fused RMS+FP8 group quant kernel to support gated mode; remove standalone gated kernel. |
| aiter/ops/triton/_triton_kernels/conv/causal_conv1d.py | New conv kernel module location for causal conv1d fwd/update kernels. |
| aiter/ops/triton/_triton_kernels/conv/causal_conv1d_update_single_token.py | New conv kernel module location for single-token update kernels. |
| `aiter/ops/triton/_triton_kernels/conv/__init__.py` | New conv kernel subpackage init/export. |
| .gitignore | Add vim swap ignores; fix .ipynb_checkpoints glob indentation. |
| .github/scripts/split_tests.sh | Update split-test mapping to new conv test path. |
- Map `causal_conv1d` and `causal_conv1d_update_single_token` in `_BACKWARD_COMPAT_MAP` for legacy flat imports (e.g. vLLM).
- Import conv APIs in op_tests via flat `aiter.ops.triton.*` paths.
- Silence ruff F401 on intentional comms re-exports in triton `__init__`.

Co-authored-by: Cursor <cursoragent@cursor.com>
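One way a compatibility map can keep legacy flat imports working after modules move into a subpackage is to alias each old module path to the relocated module in `sys.modules`. The sketch below shows that idea on a throwaway in-memory package; `demo_pkg`, `COMPAT_MAP`, and `register_legacy_aliases` are hypothetical names, and this is not a description of how `_BACKWARD_COMPAT_MAP` is actually consumed in aiter.

```python
import importlib
import sys
import types


def _make_module(name, is_package=False):
    """Create an in-memory module so the sketch needs no files on disk."""
    module = types.ModuleType(name)
    if is_package:
        module.__path__ = []  # an empty __path__ marks the module as a package
    sys.modules[name] = module
    return module


# Hypothetical package tree standing in for the relocated layout.
pkg = _make_module("demo_pkg", is_package=True)
_make_module("demo_pkg.conv", is_package=True)
impl = _make_module("demo_pkg.conv.causal_conv1d")
impl.causal_conv1d_fn = lambda x: f"conv({x})"

# Legacy flat module name -> new nested module path.
COMPAT_MAP = {"causal_conv1d": "demo_pkg.conv.causal_conv1d"}


def register_legacy_aliases(package, compat_map):
    """Expose each relocated module under its old flat import path."""
    for old_name, new_path in compat_map.items():
        module = importlib.import_module(new_path)
        # Alias the old dotted path so `from pkg.old_name import ...` resolves.
        sys.modules[f"{package.__name__}.{old_name}"] = module
        # Also expose it as an attribute so `pkg.old_name` works.
        setattr(package, old_name, module)


register_legacy_aliases(pkg, COMPAT_MAP)

# The legacy import style now resolves to the relocated module.
from demo_pkg.causal_conv1d import causal_conv1d_fn
```

Eager aliasing imports every mapped module up front; a lazy variant would defer that cost but needs more machinery to support the `from pkg.old_name import ...` form.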
Can we keep this file empty?
We can empty this file too, unless there's a reason why we need it.
It will be consistent with the rest of the repo if we empty it :)
Diff excerpt under review (arguments passed positionally at the kernel launch, in the order extracted from the diff view):

    x, weight, bias, z, dummy, dummy, dummy, x_quant, scales, dummy, dummy, dummy,
    eps, 0.0, M, N, 0, x.stride(0), z.stride(0), 1, x.stride(1), 1, 1, 1,
    x_quant.stride(0), x_quant.stride(1), stride_s_row, stride_s_g,
    M, N, eps, 1, 1, 1, 1, 1, 1, z, bias_ptr, z.stride(0),
Can we avoid positional args here? Keyword args are easier to read, and with this many arguments being passed the call is very hard to follow.
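A minimal sketch of the keyword-argument style the comment asks for, using a plain Python stand-in rather than the real kernel. The function name `launch_fused_quant_sketch` and its parameter names are hypothetical; the point is only that a keyword-only signature makes each value self-describing and turns an accidental positional call into an immediate error.

```python
def launch_fused_quant_sketch(
    *,  # everything after this marker must be passed by keyword
    x,
    weight,
    eps,
    n_rows,
    n_cols,
    x_row_stride,
    z=None,
    z_row_stride=0,
):
    """Hypothetical stand-in for a kernel launch; echoes how arguments were bound."""
    return {
        "gated": z is not None,
        "eps": eps,
        "shape": (n_rows, n_cols),
        "row_strides": (x_row_stride, z_row_stride),
    }


# Each argument is labeled at the call site, so swapping two integers
# (for example a stride and a dimension) is visible in review.
bound = launch_fused_quant_sketch(
    x=[[0.0] * 4] * 2,
    weight=[1.0] * 4,
    eps=1e-6,
    n_rows=2,
    n_cols=4,
    x_row_stride=4,
)
```

The same idea would apply to the launch in the diff, assuming the launch call accepts keywords for those parameters.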
I am wondering if we can split these into two different kernels instead of having an if-else.
That would make it easier to read.
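The two options in this comment can be compared with a small pure-Python sketch: two thin entry points that share a helper, versus one entry point with a flag. The function names and the gate formula (output multiplied by `silu(z)`) are assumptions for illustration, not the PR's kernel code.

```python
import math


def _rms_norm_row(x, weight, eps):
    """Shared core: RMS-normalize one row and apply the weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def _silu(v):
    return v / (1.0 + math.exp(-v))


# Option 1: two entry points, no branching inside either one.
def rms_norm_classic(x, weight, eps=1e-6):
    return _rms_norm_row(x, weight, eps)


def rms_norm_gated(x, weight, z, eps=1e-6):
    y = _rms_norm_row(x, weight, eps)
    return [v * _silu(g) for v, g in zip(y, z)]


# Option 2: one entry point with a flag selecting the path.
def rms_norm_unified(x, weight, eps=1e-6, z=None, gated=False):
    y = _rms_norm_row(x, weight, eps)
    if gated:
        y = [v * _silu(g) for v, g in zip(y, z)]
    return y
```

In Triton, a flag declared as `tl.constexpr` is resolved at compile time, so a single kernel with such a flag is specialized per flag value anyway; under that assumption the split is mainly a readability and maintenance choice rather than a performance one.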
Summary
This PR reorganizes Triton conv1d code into dedicated `conv` modules and folds gated RMS+FP8 quantization into the existing fused FP8 kernel path.

Related comments: #3005 (comment)
What changed
Conv module reorganization
- `aiter/ops/triton/_triton_kernels/conv/`
- `aiter/ops/triton/conv/`
- `__init__.py` exports for the new conv packages.

FP8 quant integration
- The fused kernel covers both the non-gated mode (`GATED_RMS_FP8=False`) and the gated mode (`GATED_RMS_FP8=True`) in the same fused kernel flow.
- `ROWS_PER_BLOCK` (`calc_rows_per_block`) for gated execution.

Test layout + script updates
- `op_tests/triton_tests/conv/test_causal_conv1d.py`
- `op_tests/triton_tests/conv/test_causal_conv1d_update_single_token.py`
- `.github/scripts/split_tests.sh`

Motivation
- … (`attention`, `quant`, etc.) for easier discovery and maintenance.

Technical Details
Test Plan
Ran:
Additional validation:
- `black --check` and `ruff check` on touched Triton conv/quant files.

Test Result
From the vllm side, the speed of the gated fp8 kernel was maintained.
Notes
- … `conv` module paths.

Submission Checklist