Merged
16 commits
ef840d5
Add tts-cpp files
GustavoA1604 May 6, 2026
fa0d490
tts-cpp: close the patches/-deleted regression (review #5+#6)
GustavoA1604 May 6, 2026
ae34c58
tts-cpp: README rewrite for integrated/vcpkg context (review #26)
GustavoA1604 May 7, 2026
a2f2dd6
docs: top-level README pointer at QVAC speech-stack subtrees (review …
GustavoA1604 May 7, 2026
4b5d2d7
tts-cpp: mirror review N1-N7 fixes from chatterbox.cpp
GustavoA1604 May 7, 2026
8ba10a6
tts-cpp: drop unreachable TTS_CPP_GGML_LIB_PREFIX block (review N8)
GustavoA1604 May 7, 2026
e673182
tts-cpp: scrub stale patches/ refs in README (review N10)
GustavoA1604 May 7, 2026
28ef67d
tts-cpp: mirror supertonic_gguf.cpp ggml-quants.h fix from chatterbox…
GustavoA1604 May 7, 2026
e8f6065
tts-cpp: mirror tts-cppConfig find_dependency(OpenMP) fix from chatte…
GustavoA1604 May 7, 2026
04b87ea
tts-cpp: mirror tts-cppConfig OpenMP COMPONENTS CXX scope from chatte…
GustavoA1604 May 7, 2026
64abb81
tts-cpp: mirror TTS_CPP_OPENMP gate from chatterbox.cpp
GustavoA1604 May 7, 2026
1963f9f
tts-cpp: mirror Engine::backend_device() public API from chatterbox.cpp
GustavoA1604 May 7, 2026
942686d
tts-cpp: mirror chatterbox synthesize_batch apply_trim_fade gate from…
GustavoA1604 May 7, 2026
761eca0
parakeet-cpp: surface SentencePiece word-start signal on StreamingSeg…
GustavoA1604 May 7, 2026
db87f42
tts-cpp: chatterbox MTL run_t3 wraps text tokens with start/stop_text…
GustavoA1604 May 7, 2026
0b44674
tts-cpp: port chatterbox_cli.cpp 3-identical-token early-stop into En…
GustavoA1604 May 7, 2026
18 changes: 18 additions & 0 deletions README.md
@@ -55,6 +55,24 @@ On Apple Silicon, the inference runs fully on the GPU via Metal:

https://github.com/ggml-org/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225

## QVAC speech-stack ports

This fork carries two in-tree subtrees alongside the upstream whisper.cpp
sources:

- [`tts-cpp/`](tts-cpp/) — text-to-speech via Resemble Chatterbox (Turbo +
Multilingual) and Supertonic. In-tree subtree of
[github.com/gianni-cor/chatterbox.cpp](https://github.com/gianni-cor/chatterbox.cpp);
consumes ggml from the [`qvac-ext-ggml/speech`](https://github.com/tetherto/qvac-ext-ggml/tree/speech)
branch via the `ggml-speech` vcpkg port.
- [`parakeet-cpp/`](parakeet-cpp/) — automatic speech recognition (NVIDIA
Parakeet FastConformer family — CTC, TDT, EOU, Sortformer) and
speaker diarization. In-tree subtree of the standalone parakeet.cpp
repo; same `ggml-speech` consumption pattern.

Each subtree has its own README / build flow / public C++ API. The
upstream whisper.cpp build below is unaffected by either.

## Quick start

First clone the repository:
15 changes: 15 additions & 0 deletions parakeet-cpp/include/parakeet/streaming.h
@@ -94,6 +94,21 @@ struct StreamingSegment {
    int chunk_index = 0;
    bool is_final = true;

    // True when this segment's first token is a SentencePiece word-start
    // (the piece begins with the `▁` U+2581 marker), false when it is a
    // wordpiece continuation of the previous segment's last token.
    //
    // Streaming consumers building a running transcript should insert a
    // separator (e.g. " ") between successive segments only when the
    // *new* segment has `starts_word == true`. Concatenating verbatim
    // when `starts_word == false` rejoins splits such as
    // ["pun", "ctuation"] into "punctuation"; inserting a space
    // there would yield "pun ctuation" instead.
    //
    // Always true on the very first segment of a session and on any
    // segment whose token list is empty (defensive default).
    bool starts_word = true;

    // EOU-only: true when this segment ends on `<EOU>`. For CTC/TDT use StreamEvent
    // EndOfTurn via `on_event` instead; those engines leave this flag false here.
    bool is_eou_boundary = false;
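
The consumer-side rule described in the `starts_word` comment can be sketched roughly as follows. This is an illustration only, not code from the PR: `SegmentView`, its `text` field, `append_segment`, and `join_segments` are hypothetical names, and the sketch assumes each segment's text carries no leading space of its own.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for StreamingSegment with only the two fields this
// sketch needs. The `text` field name is an assumption, not the real API.
struct SegmentView {
    std::string text;
    bool starts_word;
};

// Append one segment to a running transcript. A space is inserted only when
// the transcript already has content and the new segment opens a fresh word.
void append_segment(std::string & transcript, const SegmentView & seg) {
    if (seg.text.empty()) return;  // nothing to add for an empty segment
    if (!transcript.empty() && seg.starts_word) transcript += ' ';
    transcript += seg.text;
}

// Convenience wrapper: fold a whole sequence of segments into one string.
std::string join_segments(const std::vector<SegmentView> & segs) {
    std::string out;
    for (const auto & s : segs) append_segment(out, s);
    return out;
}
```

Under these assumptions, the segments `{"pun", true}` and `{"ctuation", false}` join into "punctuation", while `{"see", true}` followed by `{"if", true}` joins into "see if".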
6 changes: 6 additions & 0 deletions parakeet-cpp/src/parakeet_engine.cpp
@@ -473,6 +473,9 @@ EngineResult Engine::transcribe_samples_stream(const float * samples,
    seg.end_s = seg_end_s;
    seg.chunk_index = chunk_index;
    seg.is_final = true;
    seg.starts_word = win_tokens.empty()
        ? true
        : token_is_word_start(pimpl_->model.vocab, win_tokens.front());
    seg.is_eou_boundary = eou_boundaries_in_chunk > 0;
    seg.encoder_ms = first_segment ? encoder_ms : 0.0;
    seg.decode_ms = win_decode_ms;
@@ -863,6 +866,9 @@ void StreamSession::Impl::process_window(const float * window_samples, int windo
    seg.end_s = chunk_end_s;
    seg.chunk_index = chunk_index;
    seg.is_final = true;
    seg.starts_word = win_tokens.empty()
        ? true
        : token_is_word_start(engine_impl->model.vocab, win_tokens.front());
    seg.is_eou_boundary = eou_boundaries_in_chunk > 0;
    seg.encoder_ms = encoder_ms;
    seg.decode_ms = decode_ms;
12 changes: 12 additions & 0 deletions parakeet-cpp/src/sentencepiece_bpe.cpp
@@ -36,4 +36,16 @@ std::string detokenize(const BpeVocab & vocab,
    return out;
}

bool token_is_word_start(const BpeVocab & vocab, int32_t token_id) {
    if (token_id < 0 || token_id >= static_cast<int32_t>(vocab.pieces.size())) return false;
    if (token_id == vocab.blank_id || token_id == vocab.bos_id ||
        token_id == vocab.eos_id || token_id == vocab.pad_id) return false;
    const std::string & piece = vocab.pieces[token_id];
    if (piece.size() < 3) return false;
    const unsigned char c0 = static_cast<unsigned char>(piece[0]);
    const unsigned char c1 = static_cast<unsigned char>(piece[1]);
    const unsigned char c2 = static_cast<unsigned char>(piece[2]);
    return c0 == 0xE2 && c1 == 0x96 && c2 == 0x81;
}

}
14 changes: 14 additions & 0 deletions parakeet-cpp/src/sentencepiece_bpe.h
@@ -25,4 +25,18 @@ struct BpeVocab {
std::string detokenize(const BpeVocab & vocab,
                       const std::vector<int32_t> & token_ids);

// True when the token's piece (in `vocab.pieces`) begins with the
// SentencePiece word-boundary marker `▁` (U+2581, encoded as the 3-byte
// sequence 0xE2 0x96 0x81 in UTF-8). Used by the streaming sessions to
// stamp `StreamingSegment::starts_word` so consumers can distinguish a
// chunk-boundary wordpiece continuation ("ctuation" after "pun") from
// a fresh word ("if" after "see") without re-implementing the BPE
// detokenizer rules.
//
// Returns false for out-of-range ids, for blank/bos/eos/pad ids, and for
// pieces that do not begin with the U+2581 marker (e.g. punctuation
// pieces like "," or "." that should still be glued onto the previous
// word without an inserted space).
bool token_is_word_start(const BpeVocab & vocab, int32_t token_id);

}
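
The byte-level test behind `token_is_word_start` can be tried in isolation with a sketch along these lines. It is not the PR's implementation: `TinyVocab`, `piece_starts_word`, and `token_starts_word` are hypothetical names, and the special-id checks (blank/bos/eos/pad) are deliberately left out because the stand-in vocabulary has no such fields.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, pared-down vocabulary: only the piece table, no special ids.
struct TinyVocab {
    std::vector<std::string> pieces;
};

// True when `piece` begins with U+2581, which UTF-8 encodes as E2 96 81.
// Pieces shorter than three bytes can never match.
bool piece_starts_word(const std::string & piece) {
    static const std::string marker = "\xE2\x96\x81";
    return piece.compare(0, marker.size(), marker) == 0;
}

// Range-checked lookup: out-of-range ids are treated as "not a word start".
bool token_starts_word(const TinyVocab & vocab, int32_t token_id) {
    if (token_id < 0 || token_id >= static_cast<int32_t>(vocab.pieces.size())) {
        return false;
    }
    return piece_starts_word(vocab.pieces[token_id]);
}
```

With a vocabulary holding the pieces "▁pun", "ctuation", and ",", only the first id counts as a word start; the continuation piece and the punctuation piece both return false, so a consumer would glue them onto the preceding text.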
40 changes: 40 additions & 0 deletions tts-cpp/.gitignore
@@ -0,0 +1,40 @@
# Vendored ggml (cloned separately at setup time; see README)
ggml/

# Build artifacts
build/
build-*/
build_tts_cli/
*.o
*.obj

# Python
__pycache__/
*.pyc
.venv/
venv/

# Model files (too big for git)
models/
*.gguf

# Reference dumps
artifacts/

# Local tokenizer files (downloaded from HF cache)
tokenizer/

# Voice-clone profiles (generated per reference audio)
voices/

# Generated audio
*.wav

# Editor files
.vscode/
.idea/
*.swp
.DS_Store
._*
.cache/
compile_commands.json