supertonic: fused Metal kernels + layout-flexible activations #8
Merged
Lands the supertonic custom op family onto qvac-ext-ggml@speech so the
tts-cpp Supertonic 2 path can consume them via the upstream ggml-speech
port without a local overlay-port patch in qvac-ext-lib-whisper.cpp.
Five fused Metal kernels (each with a CPU forward as the parity
backstop) collapse the multi-op sub-graphs that the Supertonic graph
builders emit per ConvNeXt / attention block. Each kernel uses a
stride-parameterised body, so a single compiled kernel handles both
[T, C] and [C, T] activations via a layout flag in op_params:
- SUPERTONIC_DEPTHWISE_1D (depthwise K∈{3,5,7}; layout flag;
symmetric edge-clamp + causal-left modes)
- SUPERTONIC_LAYER_NORM_CHANNEL (fuses permute+norm+mul+add+permute)
- SUPERTONIC_PW2_RESIDUAL (fuses add(bias)+mul(gamma)+add(residual))
- SUPERTONIC_BIAS_GELU (fuses add(bias)+gelu_erf)
- SUPERTONIC_EDGE_PAD_1D (edge-replicate causal/symmetric pad)
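The stride-parameterised body described above can be illustrated with a scalar C reference for the depthwise case. This is a minimal sketch, not the PR's CPU forward: the function name `dw1d_ref`, the `dw_layout` / `dw_pad` enums and the weight layout `w[c*K + k]` are hypothetical, and which of the two layouts is time-contiguous is an assumption. The point it shows is that [T, C] and [C, T] differ only in the pair of element strides, and that symmetric edge-clamp and causal-left modes differ only in where the K-tap window starts, with out-of-range taps replicated from the edge frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout / padding selectors (illustrative only; the real
 * op_params encoding used by the PR is not reproduced here). */
typedef enum { DW_LAYOUT_TC, DW_LAYOUT_CT } dw_layout;
typedef enum { DW_PAD_SYMMETRIC, DW_PAD_CAUSAL_LEFT } dw_pad;

/* Clamp a time index into [0, T-1]: edge-replicate ("edge-clamp") padding. */
static int clamp_t(int t, int T) {
    return t < 0 ? 0 : (t >= T ? T - 1 : t);
}

/* Depthwise 1D convolution over T frames and C channels.
 * x and y share one layout; w holds K taps per channel at w[c*K + k].
 * The same loop body serves both layouts: only the strides change. */
static void dw1d_ref(const float *x, const float *w, float *y,
                     int T, int C, int K,
                     dw_layout layout, dw_pad pad) {
    assert(K % 2 == 1); /* the PR uses K in {3,5,7}; odd K centres the window */

    /* Assumed: [T, C] is channel-contiguous, [C, T] is time-contiguous. */
    const ptrdiff_t st = (layout == DW_LAYOUT_TC) ? C : 1; /* step between frames   */
    const ptrdiff_t sc = (layout == DW_LAYOUT_TC) ? 1 : T; /* step between channels */

    /* Offset of tap 0 relative to the output frame:
     * symmetric -> centred window, causal-left -> current + (K-1) past frames. */
    const int off = (pad == DW_PAD_SYMMETRIC) ? -(K / 2) : -(K - 1);

    for (int c = 0; c < C; ++c) {
        for (int t = 0; t < T; ++t) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                const int ts = clamp_t(t + off + k, T); /* replicate edge frames */
                acc += w[c * K + k] * x[ts * st + c * sc];
            }
            y[t * st + c * sc] = acc;
        }
    }
}
```

Feeding the same data in both layouts yields the same values, just stored with swapped strides. A real Metal kernel would additionally parallelise over (t, c), which this scalar loop does not attempt.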
Public ctors in ggml.h:
- ggml_supertonic_depthwise_1d (default [T,C], symmetric)
- ggml_supertonic_depthwise_1d_ct ([C,T], symmetric)
- ggml_supertonic_depthwise_1d_causal_ct ([C,T], causal-left)
- ggml_supertonic_layer_norm_channel{,_ct}
- ggml_supertonic_pw2_residual{,_ct}
- ggml_supertonic_bias_gelu{,_ct}
- ggml_supertonic_edge_pad_1d{,_ct}
GGML_OP_COUNT bumps 96 → 101.
Authored against ggml-org/ggml master commit a8db410; rebased onto
qvac-ext-ggml@speech HEAD 91676f0 (a few additive-only context shifts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Lands the Supertonic 2 custom op family onto qvac-ext-ggml@speech so the
tts-cpp Supertonic Metal path can consume them via the upstream ggml-speech
vcpkg port without carrying a local overlay-port patch in
qvac-ext-lib-whisper.cpp.
What's added
Five fused Metal kernels (each with a CPU forward as the parity backstop)
collapse the multi-op sub-graphs that the Supertonic graph builders emit per
ConvNeXt / attention block. Every kernel uses a stride-parameterised body,
so the same compiled kernel handles both [T, C] and [C, T] activations via
a layout flag in op_params:
- SUPERTONIC_DEPTHWISE_1D
- SUPERTONIC_LAYER_NORM_CHANNEL
- SUPERTONIC_PW2_RESIDUAL
- SUPERTONIC_BIAS_GELU
- SUPERTONIC_EDGE_PAD_1D
Public ctors in include/ggml.h:
- ggml_supertonic_depthwise_1d{,_ct,_causal_ct}: [T,C] symmetric / [C,T] symmetric / [C,T] causal
- ggml_supertonic_layer_norm_channel{,_ct}
- ggml_supertonic_pw2_residual{,_ct}
- ggml_supertonic_bias_gelu{,_ct}
- ggml_supertonic_edge_pad_1d{,_ct}
GGML_OP_COUNT bumps 96 → 101.
Provenance
Authored against ggml-org/ggml master commit a8db410a, rebased onto
qvac-ext-ggml@speech HEAD 91676f0. Patches applied cleanly modulo a few
additive-only context shifts (no semantic conflicts).
Test plan
End-to-end verification on Apple M2:
- feat/supertonic-ops via vcpkg with Metal feature ON (vcpkg_from_git file://)
- vocoder 14 ms / vec_est 71 ms / total 108 ms median (all _ct paths engaged, including the causal depthwise on the 10-block vocoder ConvNeXt chain)
Downstream
Once this lands on speech, qvac-ext-lib-whisper.cpp PR #15's overlay-port
can be deleted entirely and tts-cpp consumes patched ggml directly from the
ggml-speech vcpkg port.
🤖 Generated with Claude Code