supertonic: fused Metal kernels + layout-flexible activations #8

Merged

GustavoA1604 merged 1 commit into speech from feat/supertonic-ops on May 12, 2026

Conversation

ogad-tether commented May 12, 2026

Summary

Lands the Supertonic 2 custom op family on qvac-ext-ggml@speech so that the
tts-cpp Supertonic Metal path can consume the ops via the upstream
ggml-speech vcpkg port without carrying a local overlay-port patch in
qvac-ext-lib-whisper.cpp.

What's added

Five fused Metal kernels (each with a CPU forward as the parity backstop)
collapse multi-op sub-graphs that the Supertonic graph builders emit per
ConvNeXt / attention block. Every kernel uses a stride-parameterised
body so the same compiled kernel handles both [T, C] and [C, T]
activations via a layout flag in op_params:

| Op | Replaces |
| --- | --- |
| SUPERTONIC_DEPTHWISE_1D | edge-clamp/causal pad + im2col + mul_mat + add (K ∈ {3, 5, 7}; symmetric or causal-left padding) |
| SUPERTONIC_LAYER_NORM_CHANNEL | permute + cont + norm + mul + add + permute + cont |
| SUPERTONIC_PW2_RESIDUAL | add(bias) + mul(gamma) + add(residual) |
| SUPERTONIC_BIAS_GELU | add(bias) + gelu_erf |
| SUPERTONIC_EDGE_PAD_1D | edge-replicate causal/symmetric pad |
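
To make the stride-parameterised idea concrete, here is a minimal CPU-side sketch of a depthwise 1D convolution with edge-clamp padding. It is not the PR's kernel: the function name, signature, weight layout (`w[c*K + k]`), and the `pad_mode_t` enum are all hypothetical, and the mapping of the two stride choices onto ggml's `[T, C]` / `[C, T]` naming is deliberately left unstated. The point is only that one loop body serves both layouts once the strides are passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch; names and signature are NOT from the PR. */
typedef enum { PAD_SYMMETRIC = 0, PAD_CAUSAL_LEFT = 1 } pad_mode_t;

/* x, y: T*C floats addressed as base[t*stride_t + c*stride_c] (strides in elements).
 * w:    C*K taps, assumed stored as w[c*K + k].
 * b:    optional per-channel bias, may be NULL. */
static void depthwise_1d_ref(const float *x, const float *w, const float *b,
                             float *y, int T, int C, int K,
                             size_t stride_t, size_t stride_c, pad_mode_t mode)
{
    assert(T > 0 && C > 0 && K > 0);
    /* Symmetric: K/2 taps look back. Causal-left: all K-1 extra taps look back. */
    const int left = (mode == PAD_CAUSAL_LEFT) ? K - 1 : K / 2;

    for (int c = 0; c < C; ++c) {
        for (int t = 0; t < T; ++t) {
            float acc = b ? b[c] : 0.0f;
            for (int k = 0; k < K; ++k) {
                int src = t + k - left;
                /* Edge clamp: out-of-range taps replicate the first/last sample. */
                if (src < 0)     src = 0;
                if (src > T - 1) src = T - 1;
                acc += w[c * K + k] * x[(size_t)src * stride_t + (size_t)c * stride_c];
            }
            y[(size_t)t * stride_t + (size_t)c * stride_c] = acc;
        }
    }
}
```

Calling it with `stride_t = 1, stride_c = T` walks a time-contiguous buffer, while `stride_t = C, stride_c = 1` walks a channel-contiguous one; neither case needs a permute or copy beforehand, which is the saving the fused ops are after.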

Public ctors in include/ggml.h:

  • ggml_supertonic_depthwise_1d{,_ct,_causal_ct}: [T,C] symmetric / [C,T] symmetric / [C,T] causal
  • ggml_supertonic_layer_norm_channel{,_ct}
  • ggml_supertonic_pw2_residual{,_ct}
  • ggml_supertonic_bias_gelu{,_ct}
  • ggml_supertonic_edge_pad_1d{,_ct}

GGML_OP_COUNT bumps 96 → 101.

Provenance

Authored against ggml-org/ggml master commit a8db410a, rebased onto
qvac-ext-ggml@speech HEAD 91676f0. Patches applied cleanly modulo
a few additive-only context shifts (no semantic conflicts).

Test plan

End-to-end verification on Apple M2:

  • Build qvac-ext-ggml feat/supertonic-ops via vcpkg with Metal feature ON
  • Build qvac-ext-lib-whisper.cpp PR #15 tts-cpp against this branch (overlay-port redirected via vcpkg_from_git file://)
  • Synthesize through supertonic-cli — writes valid 2.6 s WAV at 44.1 kHz
  • Run supertonic-bench — vocoder 14 ms / vec_est 71 ms / total 108 ms median (all _ct paths engaged including the causal depthwise on the 10-block vocoder ConvNeXt chain)
  • Reviewer to confirm no regression on non-Supertonic ggml consumers (all additions are net-new; no existing code paths modified)

Downstream

Once this lands on speech, qvac-ext-lib-whisper.cpp PR #15's
overlay-port can be deleted entirely, and tts-cpp can consume the patched
ggml directly from the ggml-speech vcpkg port.

🤖 Generated with Claude Code


Lands the supertonic custom op family onto qvac-ext-ggml@speech so the
tts-cpp Supertonic 2 path can consume them via the upstream ggml-speech
port without a local overlay-port patch in qvac-ext-lib-whisper.cpp.

Five fused Metal kernels (each with a CPU forward as the parity
backstop) collapse multi-op sub-graphs that the Supertonic graph
builders emit per ConvNeXt / attention block, plus a stride-
parameterised body so a single compiled kernel handles both
[T, C] and [C, T] activations via a layout flag in op_params:

  - SUPERTONIC_DEPTHWISE_1D       (depthwise K∈{3,5,7}; layout flag;
                                   symmetric edge-clamp + causal-left modes)
  - SUPERTONIC_LAYER_NORM_CHANNEL (fuses permute+norm+mul+add+permute)
  - SUPERTONIC_PW2_RESIDUAL       (fuses add(bias)+mul(gamma)+add(residual))
  - SUPERTONIC_BIAS_GELU          (fuses add(bias)+gelu_erf)
  - SUPERTONIC_EDGE_PAD_1D        (edge-replicate causal/symmetric pad)

Public ctors in ggml.h:

  - ggml_supertonic_depthwise_1d           (default [T,C], symmetric)
  - ggml_supertonic_depthwise_1d_ct        ([C,T], symmetric)
  - ggml_supertonic_depthwise_1d_causal_ct ([C,T], causal-left)
  - ggml_supertonic_layer_norm_channel{,_ct}
  - ggml_supertonic_pw2_residual{,_ct}
  - ggml_supertonic_bias_gelu{,_ct}
  - ggml_supertonic_edge_pad_1d{,_ct}

GGML_OP_COUNT bumps 96 → 101.

Authored against ggml-org/ggml master commit a8db410; rebased onto
qvac-ext-ggml@speech HEAD 91676f0 (a few additive-only context shifts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>