QVAC-19254 ggml-opencl: Adreno elementwise kernels (sin/cos/abs/elu/leaky_relu) for Chatterbox S3Gen #15
Closed
pratiknarola-t wants to merge 1 commit into
…eaky_relu) for Chatterbox S3Gen

Add 5 elementwise OpenCL kernels + dispatch/supports_op wiring so Chatterbox's S3Gen graph runs on the Adreno OpenCL backend: sin/cos (iSTFT), abs + elu (f0_predictor), leaky_relu (encoder/HiFT). Modeled on the existing neg/scale elementwise ops (f32 + f32_4 variants, bounds-checked). CONV_TRANSPOSE_1D is intentionally left unsupported so ggml_backend_sched routes it to CPU.

P0: with a Q8_0 S3Gen this lets Chatterbox run end-to-end on OpenCL (Adreno 740); P1 output parity vs CPU is still WIP.
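For checking the new kernels against a CPU result, a host-side reference of the five ops can be useful. The following is a minimal sketch, not the code from this PR: `UnaryOp`, `apply_unary`, and `apply_unary_vec4` are hypothetical names, ELU is assumed to use alpha = 1 (`expm1(x)` for non-positive inputs), and the leaky_relu slope is passed in explicitly. The grouped loop only imitates the idea of an f32_4 variant with a bounds-checked tail; the actual kernel layout may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enum naming the five elementwise ops from this PR.
enum class UnaryOp { Sin, Cos, Abs, Elu, LeakyRelu };

// Scalar reference for one element. `negative_slope` is only used by LeakyRelu.
// Assumption: ELU with alpha = 1, i.e. expm1(x) for x <= 0.
inline float apply_unary(UnaryOp op, float x, float negative_slope = 0.0f) {
    switch (op) {
        case UnaryOp::Sin:       return std::sin(x);
        case UnaryOp::Cos:       return std::cos(x);
        case UnaryOp::Abs:       return std::fabs(x);
        case UnaryOp::Elu:       return x > 0.0f ? x : std::expm1(x);
        case UnaryOp::LeakyRelu: return x > 0.0f ? x : negative_slope * x;
    }
    return x; // unreachable for valid enum values
}

// Imitates a 4-wide variant: each "work-item" handles a group of 4 elements,
// and the bounds check keeps the last group from running past the buffer.
inline std::vector<float> apply_unary_vec4(UnaryOp op,
                                           const std::vector<float>& src,
                                           float negative_slope = 0.0f) {
    const std::size_t n = src.size();
    std::vector<float> dst(n);
    const std::size_t n_groups = (n + 3) / 4; // round up to whole groups of 4
    for (std::size_t gid = 0; gid < n_groups; ++gid) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            const std::size_t i = gid * 4 + lane;
            if (i >= n) break; // bounds check for the partial tail group
            dst[i] = apply_unary(op, src[i], negative_slope);
        }
    }
    return dst;
}
```

Comparing a device result against `apply_unary_vec4` element by element (with a small tolerance for sin/cos/elu) is one way to approach the output-parity check that the commit message lists as still in progress.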
839fc3b to 2b3cc06
2b3cc06 to 347e87f
Superseded by #17. This PR was auto-closed when its branch was renamed to the correct ticket (
QVAC-19254 — ggml-opencl: Adreno elementwise kernels for Chatterbox
Adds the 5 elementwise OpenCL kernels that Chatterbox's S3Gen / HiFiGAN vocoder needs but ggml-opencl was missing:
- kernel_leaky_relu_f32/_f32_4
- kernel_sin_f32/_f32_4
- kernel_cos_f32/_f32_4
- kernel_abs_f32/_f32_4
- kernel_elu_f32/_f32_4

Files: src/ggml-opencl/kernels/{leaky_relu,sin,cos,abs,elu}.cl + matching clCreateKernel / supports_op / dispatch wiring in src/ggml-opencl/ggml-opencl.cpp + CMakeLists.txt entries.

Adreno-optimized-kernel gate (M%4 / K%32)
Also adds a final ne[0] % 32 == 0 && ne[1] % 4 == 0 constraint to use_adreno_kernels(...). The Adreno-optimized transpose + GEMM/GEMV kernels assert these alignments in the weight-repack path; without the gate, unaligned weights (e.g. some S3Gen projections) hit a GGML_ASSERT instead of falling back to the generic q4_0 path. The gate is shared by load-time repack + compute-time kernel selection so the two paths stay consistent.

Verification
origin/speech): clean arm64-android build, no warnings on any of the 5 kernels or the use_adreno_kernels change.
EXIT=0. The 5 new kernels are exercised through S3Gen's SineGen / STFT / CFM stages on Adreno.
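The alignment gate described under "Adreno-optimized-kernel gate" can be illustrated with a small standalone predicate. This is a sketch under stated assumptions, not the body of use_adreno_kernels(...): `Shape2D`, `adreno_alignment_ok`, `MatmulPath`, and `select_path` are hypothetical names, and the mapping of ne[0] to K and ne[1] to M is read off the section heading. The point it shows is that a single predicate feeding both decisions keeps repack and kernel selection in agreement.

```cpp
#include <cstdint>

// Hypothetical stand-in for a weight tensor's first two dimensions.
struct Shape2D {
    std::int64_t ne0; // K: innermost dimension (assumed from the heading)
    std::int64_t ne1; // M: number of rows (assumed from the heading)
};

// True when the shape satisfies the alignment stated in the PR:
// ne[0] % 32 == 0 && ne[1] % 4 == 0.
inline bool adreno_alignment_ok(const Shape2D& s) {
    return s.ne0 % 32 == 0 && s.ne1 % 4 == 0;
}

enum class MatmulPath { AdrenoOptimized, Generic };

// One decision point intended to be called from both the load-time repack
// and the compute-time kernel selection, so the two cannot disagree.
// `other_checks_pass` stands for whatever earlier conditions the real
// gate applies; those are not described in this PR text.
inline MatmulPath select_path(const Shape2D& s, bool other_checks_pass) {
    return (other_checks_pass && adreno_alignment_ok(s))
               ? MatmulPath::AdrenoOptimized
               : MatmulPath::Generic;
}
```

With this shape of check, an unaligned weight such as K = 80 falls through to the generic path rather than reaching an assertion in the optimized one.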