
QVAC-19254 ggml-opencl: Adreno elementwise kernels (sin/cos/abs/elu/leaky_relu) for Chatterbox S3Gen#15

Closed
pratiknarola-t wants to merge 1 commit into speech from QVAC-19213-adreno-opencl-kernels


pratiknarola-t commented May 28, 2026


QVAC-19254 — ggml-opencl: Adreno elementwise kernels for Chatterbox

Adds five elementwise OpenCL kernels that Chatterbox's S3Gen / HiFiGAN vocoder needs and that ggml-opencl was missing:

| Kernel | Used by |
|---|---|
| kernel_leaky_relu_f32 / _f32_4 | HiFiGAN vocoder |
| kernel_sin_f32 / _f32_4 | SineGen + iSTFT |
| kernel_cos_f32 / _f32_4 | iSTFT |
| kernel_abs_f32 / _f32_4 | vocoder |
| kernel_elu_f32 / _f32_4 | S3Gen + mel→wav vocoder |

Files: src/ggml-opencl/kernels/{leaky_relu,sin,cos,abs,elu}.cl, plus the matching clCreateKernel, supports_op, and dispatch wiring in src/ggml-opencl/ggml-opencl.cpp and the corresponding CMakeLists.txt entries.
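Each kernel ships in an f32 and an f32_4 (float4) variant, both bounds-checked. A plausible host-side selection rule is sketched below in C. It assumes the float4 variant is chosen only when the element count is divisible by 4, and that the global size is rounded up to a multiple of the work-group size, with the in-kernel bounds check discarding the padding work items. launch_plan and plan_unary are hypothetical; the actual dispatch in ggml-opencl.cpp may decide differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical launch description; real wiring would set kernel args
   and call clEnqueueNDRangeKernel instead of returning a struct. */
typedef struct {
    const char *kernel_name; /* .cl entry point to launch */
    size_t      global_size; /* total work items, padded */
    size_t      local_size;  /* work-group size */
} launch_plan;

/* Use the float4 variant only when every work item gets a full float4,
   so no tail handling is needed; otherwise use one item per float. */
static launch_plan plan_unary(const char *op, size_t n_elements,
                              size_t local_size, char *name_buf, size_t buf_len) {
    launch_plan p;
    int vec4 = (n_elements % 4 == 0);
    size_t n_items = vec4 ? n_elements / 4 : n_elements;
    snprintf(name_buf, buf_len, "kernel_%s_f32%s", op, vec4 ? "_4" : "");
    /* Round up to a whole number of work-groups; the kernels' bounds
       check makes the extra work items no-ops. */
    p.global_size = ((n_items + local_size - 1) / local_size) * local_size;
    p.local_size = local_size;
    p.kernel_name = name_buf;
    return p;
}
```

Rounding the global size up keeps the launch valid on OpenCL 1.x devices, which require the global size to be a multiple of the local size when one is given; that is why the kernels need their bounds check.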

Adreno-optimized-kernel gate (M%4 / K%32)

Also adds a final ne[0] % 32 == 0 && ne[1] % 4 == 0 constraint to use_adreno_kernels(...), i.e. K (the row length) must be a multiple of 32 and M (the row count) a multiple of 4. The Adreno-optimized transpose and GEMM/GEMV kernels assert these alignments in the weight-repack path; without the gate, unaligned weights (e.g. some S3Gen projections) hit a GGML_ASSERT instead of falling back to the generic q4_0 path. The gate is shared by load-time repack and compute-time kernel selection, so the two paths stay consistent.
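The gate can be pictured as one shared predicate consulted from both call sites. The C sketch below assumes the usual ggml convention that ne[0] is the row length (K) and ne[1] the row count (M). weight_shape, adreno_shape_ok, and the two call-site helpers are hypothetical, and other_ok stands in for whatever else the real use_adreno_kernels(...) checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a weight tensor's shape:
   ne[0] = row length (K), ne[1] = number of rows (M). */
typedef struct { int64_t ne[4]; } weight_shape;

typedef enum { PATH_GENERIC_Q4_0, PATH_ADRENO } matmul_path;

/* Alignment part of the gate: K % 32 == 0 and M % 4 == 0. */
static bool adreno_shape_ok(const weight_shape *w) {
    return w->ne[0] % 32 == 0 && w->ne[1] % 4 == 0;
}

/* Single predicate; other_ok models the gate's remaining conditions. */
static bool use_adreno_path(const weight_shape *w, bool other_ok) {
    return other_ok && adreno_shape_ok(w);
}

/* Load time: decide whether to repack the weight for Adreno kernels. */
static matmul_path load_time_layout(const weight_shape *w, bool other_ok) {
    return use_adreno_path(w, other_ok) ? PATH_ADRENO : PATH_GENERIC_Q4_0;
}

/* Compute time: pick the matmul kernel using the same predicate. */
static matmul_path compute_time_kernel(const weight_shape *w, bool other_ok) {
    return use_adreno_path(w, other_ok) ? PATH_ADRENO : PATH_GENERIC_Q4_0;
}
```

Routing both decisions through one function is what keeps them from disagreeing: a weight that was never repacked cannot be handed to the Adreno GEMM/GEMV kernels, and an unaligned weight takes the generic q4_0 path instead of tripping the assert.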

Verification

  • Compile-verified against the just-synced ggml v0.10.2 (origin/speech): clean arm64-android build, no warnings on any of the 5 kernels or the use_adreno_kernels change.
  • On-device smoke test (iQOO 11 / Snapdragon 8 Gen 2 / Adreno 740): Chatterbox and Supertonic both run on OpenCL and finish with EXIT=0. The five new kernels are exercised by S3Gen's SineGen, STFT, and CFM stages on Adreno.

…eaky_relu) for Chatterbox S3Gen

Add 5 elementwise OpenCL kernels + dispatch/supports_op wiring so Chatterbox's
S3Gen graph runs on the Adreno OpenCL backend: sin/cos (iSTFT), abs + elu
(f0_predictor), leaky_relu (encoder/HiFT). Modeled on the existing neg/scale
elementwise ops (f32 + f32_4 variants, bounds-checked). CONV_TRANSPOSE_1D is
intentionally left unsupported so ggml_backend_sched routes it to CPU.

P0: with a Q8_0 S3Gen this lets Chatterbox run end-to-end on OpenCL (Adreno
740); P1 output parity vs CPU is still WIP.
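The CPU fallback for CONV_TRANSPOSE_1D works because the OpenCL backend reports the op as unsupported. A simplified C model of that interaction follows; the enums and both functions are hypothetical, and the real ggml_backend_sched assignment also considers where input tensors live rather than checking support alone.

```c
#include <stdbool.h>

/* Hypothetical node kinds; real code switches on GGML_OP_* and
   GGML_UNARY_OP_* values inside the backend's supports_op callback. */
typedef enum {
    NODE_SIN, NODE_COS, NODE_ABS, NODE_ELU, NODE_LEAKY_RELU,
    NODE_CONV_TRANSPOSE_1D
} node_kind;

typedef enum { BACKEND_OPENCL, BACKEND_CPU } backend_id;

/* OpenCL accepts the five new elementwise ops and declines the rest. */
static bool opencl_supports_op(node_kind op) {
    switch (op) {
    case NODE_SIN: case NODE_COS: case NODE_ABS:
    case NODE_ELU: case NODE_LEAKY_RELU:
        return true;
    case NODE_CONV_TRANSPOSE_1D: /* intentionally left unsupported */
    default:
        return false;
    }
}

/* Simplified scheduling rule: prefer OpenCL, fall back to CPU when
   the OpenCL backend declines the node. */
static backend_id assign_backend(node_kind op) {
    return opencl_supports_op(op) ? BACKEND_OPENCL : BACKEND_CPU;
}
```

Declining an op costs a device-to-host copy at the split point, but it avoids writing and validating a dedicated kernel for an op that appears only a few times per graph.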
pratiknarola-t force-pushed the QVAC-19213-adreno-opencl-kernels branch from 839fc3b to 2b3cc06 on May 28, 2026 08:38
pratiknarola-t changed the title from "QVAC-19213 ggml-opencl: Adreno elementwise kernels (sin/cos/abs/elu/leaky_relu) for Chatterbox S3Gen" to "QVAC-19254 ggml-opencl: Adreno elementwise kernels (sin/cos/abs/elu/leaky_relu) for Chatterbox S3Gen" on May 28, 2026
pratiknarola-t force-pushed the QVAC-19213-adreno-opencl-kernels branch from 2b3cc06 to 347e87f on June 1, 2026 06:24
pratiknarola-t deleted the QVAC-19213-adreno-opencl-kernels branch on June 1, 2026 06:27
pratiknarola-t (Author) commented:

Superseded by #17. This PR was auto-closed when its branch was renamed to match the correct ticket (QVAC-19213-adreno-opencl-kernels → QVAC-19254-adreno-opencl-kernels). It carries the same signed commit with identical content.
