Fixing PR Workflow by kapildev421 · Pull Request #3 · tetherto/qvac-ext-lib-whisper.cpp

kapildev421 · 2025-09-10T15:14:23Z

This PR fixes the workflow for Tier-based Approvals.

kapildev421 · 2025-09-17T06:41:03Z

/review

… / vector graph caches

QVAC-18607 follow-up tetherto#3. Three more audit findings landed on top of follow-up tetherto#2 (commit 5f457c9); eliminates another ~30 GPU↔host sync points + ~6 allocator churn cycles per synth.

F17 Duration scalar-continuation `read_f32` cache. Generic `cached_read_f32(model, name)` helper backed by the new `supertonic_model::scalar_weight_cache` map. Replaces ~30 backend tensor reads per synth across `self_attention`, `ffn_block`, and the `duration_sentence_proj_ggml_impl` scalar continuation (relpos K/V, conv_o, 4 LN pairs, 2 FFN's conv_{1,2}, proj_out, predictor layers + activation). Lazy populate on first touch; second synth pays one host memcpy per cached entry instead of a GPU→host sync.

F18 Text-encoder convnext-front graph cached across synths. `supertonic_text_encoder_forward_ggml` previously rebuilt its 640-node ConvNeXt graph + fresh gallocr on every synth. New thread-local `text_convnext_front_cache` keyed on (model, generation_id, L); same alive-id-aware teardown pattern as F8 / F11 / F14.

F19 Vector-estimator front-block graph cached across denoise steps. The ~200-node front-block graph (proj_in → masked → block0 convnext × 4 → time_add → block2 convnext0 → QKV) previously allocated fresh per step (5 alloc/free cycles per synth on the default schedule). Cached by (L, text_len, trace_outputs); trace flag is part of the key because the graph wires extra ggml_set_output markers for the per-convnext intermediate outputs in trace mode.

New TDD harness (fixture-bound): test-supertonic-audit3-caches (279 lines)
- F17 (structural): asserts the scalar_weight_cache map contains the expected entries after the first duration call and does NOT grow on the second; duration scalar is bit-exact across the two calls.
- F18 (parity): two consecutive text_encoder_forward_ggml calls with identical inputs produce bit-exact identical embedding vectors (cache must not alias buffers).
- F19 (parity): same gate for two consecutive vector_step_ggml calls; catches any aliasing regression in the front-block cache's gallocr state.

Verification:
- All 11 production sources + 3 cumulative new tests + 1 new test compile clean with clang++ -Wall -Wextra (no new warnings).
- Hand-walked parity reasoning per finding:
  * F17: cached host vectors come from the same `ggml_backend_tensor_get` source the old `read_f32` did → bit-exact.
  * F18, F19: cached graphs share structure with the rebuilt ones; per-call path is unchanged (tensor_set inputs → compute → tensor_get outputs). Bit-exact across calls.
- Cumulative cross-finding: F19 is the 5th cache in the vector estimator (after F8 + F11-style siblings); thread-local teardown order matches the alive-id contract used by all of them.

Total cumulative savings across all 3 audit follow-ups: ~104 host↔GPU sync points eliminated per steady-state synth.

Diff: 6 sources changed, 1 new test, 1 CMakeLists update. +327 / -172 in src/ + CMakeLists + internal header. +279 new test.

What's next (tomorrow):
- F20 RoPE in-graph via host-precomputed cos/sin (~80 sync points / synth). Needs device parity gate.
- Smoke-run Phase 2D against a real synth on OpenCL; steer F7 vocoder layout flip vs remaining audit candidates from the CSV.

Co-authored-by: Cursor <cursoragent@cursor.com>
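The lazy-populate idea behind the F17 cache can be sketched as follows. This is a minimal illustration and not the code from the commit: `toy_model`, `backend_read`, and `backend_reads` are hypothetical stand-ins, and only the names `cached_read_f32` and `scalar_weight_cache` are taken from the message above. The real helper presumably reads through `ggml_backend_tensor_get`; here a plain callback plays that role so the sketch compiles without ggml.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a model whose weights live on a backend device.
// `backend_read` plays the role of a device-to-host read (a sync point);
// `backend_reads` counts how often that read was actually performed.
struct toy_model {
    std::function<std::vector<float>(const std::string &)> backend_read;
    std::unordered_map<std::string, std::vector<float>> scalar_weight_cache;
    std::size_t backend_reads = 0;
};

// Lazy populate on first touch: the first call for a given name performs the
// backend read and stores the host copy; later calls only copy the cached
// host vector (one host memcpy, no backend read).
std::vector<float> cached_read_f32(toy_model & model, const std::string & name) {
    auto it = model.scalar_weight_cache.find(name);
    if (it == model.scalar_weight_cache.end()) {
        assert(model.backend_read && "backend_read must be set before use");
        ++model.backend_reads;
        it = model.scalar_weight_cache.emplace(name, model.backend_read(name)).first;
    }
    return it->second;  // returned by value: a host-side copy
}
```

Calling `cached_read_f32` twice with the same name triggers the backend read once and leaves the map size unchanged on the second call, which is the property the F17 structural test is described as asserting.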

Resolves 37 add/add conflicts that accumulated since the last master merge (May 7). Master moved 326 commits forward, mainly landing parakeet-cpp (TDT/EOU/Sortformer/AOSC), the ggml-backend registry refactor (`backend_selection.{h,cpp}`, registry-only device walk replacing the per-backend `#ifdef GGML_USE_<X>` cascades), Android `GGML_BACKEND_DL=ON` plumbing, and the `backends_dir` / `opencl_cache_dir` Engine knobs.

Resolution strategy:
- parakeet-cpp/ (19 files): taken from master verbatim. The PR branch only carried the original port (commits d7ab516 / c6c3fd7 / 761eca0, all <= May 7); master has 13 newer commits including TDT/EOU/Sortformer v2.1 + AOSC and the word-start signal already integrated. Nothing of the PR was lost on this side.
- .github/CODEOWNERS: taken from master (team reorg to `qvac-internal-dev` / `qvac-internal-merge`).
- tts-cpp/ stale-from-initial-drop (7 files: voice_encoder, t3_mtl, s3tokenizer, mel_extract_stft, main, campplus, campplus_forward.inc): taken from master. Their only PR commit is the original `ef840d5c Add tts-cpp files` drop; master has since rewritten them for the registry refactor.
- tts-cpp/ mirror-only (4 files: supertonic/engine.h, supertonic_engine, supertonic_gguf, chatterbox_tts): taken from master. The PR's only authored commits on these mirror pre-existing fixes from chatterbox.cpp that are already on master.
- tts-cpp/CMakeLists.txt: hybrid merge. Master's Android dynamic-backend stack, registry-only backend-defs interface (with `src/backend_selection.cpp` in the source list), and `target_compile_definitions(test-metal-ops PRIVATE GGML_USE_METAL)` retained. PR's `src/text_preprocess.cpp` source entry, MeCab/Cangjie find_library block (PRIVATE include per gianni-cor review), and 23-language multilingual test matrix retained.
- tts-cpp/include/tts-cpp/chatterbox/engine.h: master's updated `n_gpu_layers` doc (Adreno-tier policy) and new `backends_dir` / `opencl_cache_dir` fields retained. PR's `mecab_dict_path` / `cangjie_tsv_path` fields retained.
- tts-cpp/src/mtl_tokenizer.{cpp,h}: PR's `<mutex>` + `text_preprocess.h` includes, 23 supported_languages, preprocess_japanese / preprocess_chinese helpers with call_once-cached MeCab tagger + Cangjie table, apply_language_preprocessing dispatch, and `set_mecab_dict_path` / `set_cangjie_tsv_path` setters (with already-initialised warn) retained. Master's `// ---- Encode ----` divider kept.
- tts-cpp/src/chatterbox_engine.cpp: master's `#include "backend_selection.h"` and `backends_dir` / `opencl_cache_dir` wiring retained. PR's per-Engine `mtl_tokenizer::set_mecab_dict_path` / `set_cangjie_tsv_path` calls retained.
- tts-cpp/src/chatterbox_cli.cpp: master's removal of the per-backend `#include "ggml-{cuda,metal,vulkan}.h"` cascade (registry-only refactor) and the new voice-cloning backend comment retained. PR's `--mecab-dict` / `--cangjie-tsv` flags (declaration, help, parsing, and per-Engine setter call) retained. PR's RAII `thread_join_guard` on the s3gen preload thread retained (addresses GustavoA1604 review #3: std::terminate hazard during stack unwind). PR's 2-token MTL early-stop with `kMtlMinTokensBeforeCadence = 60` guard and `generated.resize(n - 1)` retained (addresses GustavoA1604 review #2: previous over-aggressive `resize(n - 2)` trimmed a legitimate token); the log line was updated to surface the repeated token id.

PR-only files (no conflict): tts-cpp/src/text_preprocess.{h,cpp}, tts-cpp/scripts/build_mecab_dict.py, tts-cpp/scripts/build_cangjie_tsv.py, tts-cpp/test/test_multilingual_{synth,asr}.cpp are all preserved as-is by the merge.

Co-authored-by: Cursor <cursoragent@cursor.com>
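The `std::terminate` hazard mentioned for the s3gen preload thread comes from a general C++ rule: destroying a `std::thread` that is still joinable calls `std::terminate`, which is what happens if an exception unwinds the stack before `join()` is reached. A minimal sketch of an RAII guard that avoids this is shown below. Only the name `thread_join_guard` comes from the message above; the class body and the `preload_then_fail` helper are assumptions written for illustration, not the code from the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical RAII guard: owns a thread and joins it when the guard is
// destroyed, including during stack unwinding after an exception.
class thread_join_guard {
public:
    explicit thread_join_guard(std::thread t) : t_(std::move(t)) {
        assert(t_.joinable() && "guard expects a running thread");
    }
    ~thread_join_guard() {
        if (t_.joinable()) {
            t_.join();
        }
    }
    thread_join_guard(const thread_join_guard &) = delete;
    thread_join_guard & operator=(const thread_join_guard &) = delete;

private:
    std::thread t_;
};

// Illustrative helper: starts `work` on a background thread, then throws
// while the thread is still owned by the guard. The guard's destructor joins
// the thread during unwinding, so the exception reaches the catch block
// instead of the process terminating.
template <typename F>
bool preload_then_fail(F work) {
    try {
        thread_join_guard guard{std::thread(work)};
        throw std::runtime_error("simulated failure after preload started");
    } catch (const std::runtime_error &) {
        return true;
    }
}
```

With a bare `std::thread` in place of the guard, the same throw would destroy a joinable thread and end the process. The guard trades that for a blocking join, which is acceptable when the background work is bounded, as a preload is.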

Fixing PR Workflow

43d4b39

kapildev421 mentioned this pull request Sep 10, 2025

Add approval-check-worker workflow #2

Closed

olyasir approved these changes Sep 16, 2025

kartiksain approved these changes Sep 17, 2025

GustavoA1604 closed this Nov 17, 2025

ogad-tether mentioned this pull request May 11, 2026

tts-cpp: Supertonic ggml Metal — full B2 + B1 f16 + causal kernel (88 ms, 36× real-time, fastest on every stage) #15

Merged

9 tasks

Zbig9000 mentioned this pull request May 12, 2026

Qvac 18607 tts ggml add and optimize open cl for supertonic #16

Merged

5 tasks

GustavoA1604 mentioned this pull request May 15, 2026

feat(tts-cpp): add 23-language multilingual support with runtime MeCab/Cangjie paths #19

Merged

kapildev421 wants to merge 1 commit into tetherto:master from kapildev421:master

4 participants
