[BCI] QVAC-17071 feat: add BCI neural signal support (variable conv1 kernel + windowed attention) by sharmaraju352 · Pull Request #10 · tetherto/qvac-ext-lib-whisper.cpp

sharmaraju352 · 2026-04-16T10:22:35Z

Summary

Adds two changes to whisper.cpp to support brain-computer interface (BCI) neural signal transcription. Based on v1.8.4.1.

1. Variable conv1 kernel size

Reads n_audio_conv1_kernel from model hparams (defaults to 3 for standard whisper models)
Allows BCI models to use a different first convolution kernel size

2. Windowed self-attention for encoder layers

Adds n_audio_window_size and n_audio_last_window_layer hparams
When present, encoder self-attention is restricted to a local window for layers up to last_window_layer
When windowed attention is active, the encoder bypasses flash attention and uses the standard softmax path (Metal flash attention does not support custom F32 masks)
Flash attention remains enabled for non-BCI models and for the decoder
Adds proper SOS token (language + transcribe) initialization for BCI models

Backward compatibility

Both changes are backward-compatible:

n_audio_conv1_kernel defaults to 3 (standard whisper behavior)
n_audio_window_size defaults to 0 and n_audio_last_window_layer defaults to -1, which disables windowed attention entirely
Standard whisper models are completely unaffected

Context

Required by the new @qvac/bci-whispercpp addon: tetherto/qvac#1583

Test plan

Standard whisper transcription still works (no regression)
BCI model loads and transcribes neural signals correctly on v1.8.4.1
Verified locally: 10.4% average WER across 5 BCI test samples (identical to v1.7.6 baseline)

Test results

BCI package (`@qvac/bci-whispercpp`)

4/4 integration tests pass (9/9 assertions)
10.4% average WER across 5 neural signal samples — identical output to v1.7.6 baseline

Standard whisper package (`@qvac/transcription-whispercpp`)

Integration: 10/10 tests pass (40/40 assertions)
Unit: 13/13 tests pass (57/57 assertions)
No regression in standard audio transcription (s16le + f32le formats verified)

Read n_audio_conv1_kernel from model hparams to allow BCI models to use a non-standard first convolution kernel size. Standard whisper models default to kernel size 3. Made-with: Cursor

- Add n_audio_window_size and n_audio_last_window_layer hparams - When present, encoder self-attention is restricted to a local window for layers up to last_window_layer - Bypass flash attention when windowed mask is active (Metal FA does not support custom F32 masks); flash attention remains enabled for non-BCI models and for the decoder - Populate window_mask data on the encoder graph (not the cross graph) - Add proper SOS token (language + transcribe) initialization for BCI models Backward-compatible: n_audio_window_size defaults to 0 and n_audio_last_window_layer defaults to -1, disabling windowed attention entirely for standard whisper models. Made-with: Cursor

Made-with: Cursor

Address review feedback: 1. Guard read_safe for BCI-specific hparams (n_audio_conv1_kernel, n_audio_window_size, n_audio_last_window_layer) behind a n_mels > 256 check. Standard whisper models have n_mels <= 128 and do not contain these fields — reading them unconditionally would corrupt the file position and break model loading. 2. Add explicit is_bci flag to hparams struct, set when BCI fields are detected during loading. 3. Use is_bci flag (instead of n_audio_window_size > 0) to guard the BCI-specific decoder SOS token initialization. 4. Log BCI-specific hparams when a BCI model is detected. Made-with: Cursor

The windowed attention mask values depend only on n_ctx and window_size, both fixed after model load. Move the O(n_ctx^2) computation from whisper_encode_internal (called every encode) to whisper_init_state (called once). The encode path now just copies the precomputed data to the graph tensor. Made-with: Cursor

ogad-tether

Thanks for addressing the earlier review notes. I found three follow-ups that still look blocking before this is safe to merge:

window_mask_data is precomputed for the full model n_audio_ctx, but the encoder graph uses exp_n_audio_ctx when callers set params.audio_ctx. That makes the later ggml_backend_tensor_set() size mismatch and can overflow the smaller graph tensor.
The install/include path change to include/whisper was not carried through to whisper.pc, which still emits -I${prefix}/include, so installed pkg-config consumers will stop finding whisper.h.
The exported CMake target still links Threads::Threads, but whisper-config.cmake no longer resolves that dependency on Windows, so find_package(whisper) can import a target with an undefined dependency there.

Could you follow up on those before merging?

…, Threads 1. Fix window_mask_data / exp_n_audio_ctx mismatch: the precomputed mask uses hparams.n_audio_ctx, but the graph tensor is sized from exp_n_audio_ctx when params.audio_ctx is overridden. Now falls back to recomputing the mask at the effective n_ctx when sizes differ, preventing a buffer overflow into the smaller tensor. 2. Update whisper.pc.in: the install interface was changed to include/whisper but the pkg-config includedir still pointed to include/. Consumers using pkg-config would not find whisper.h. 3. Fix whisper-config.cmake.in: the whisper target publicly links Threads::Threads but find_dependency(Threads) was skipped on Windows, leaving downstream find_package(whisper) with an unresolved imported target. Now always resolve Threads.

…ash attention 1. Cache fallback mask recompute: when exp_n_audio_ctx overrides the default n_audio_ctx, the window mask is now recomputed once and cached in wstate (keyed on window_mask_n_ctx) instead of allocating a new std::vector on every whisper_encode_internal call. 2. Per-layer flash attention: layers above last_window_layer no longer need the windowed attention mask. The flash attention path is now used for those layers even when BCI windowed attention is active, instead of globally falling back to the softmax path for the entire encoder. 3. Use std::abs instead of C abs in both init-time and encode-time mask computation paths.

…alidation 1. Extract compute_window_mask() helper on whisper_state to eliminate the duplicated O(n_ctx^2) mask fill loop that appeared in both whisper_init_state and whisper_encode_internal. Both call sites now use the single helper, preventing future drift. 2. Guard the encode-time mask block with hparams.is_bci before doing the ggml_graph_get_tensor lookup. Cheaper and more explicit than relying on the tensor name string to determine whether BCI windowed attention is active. 3. Add hparams.is_bci to the graph builder guard for window_mask tensor creation, aligning it with the other BCI code paths. 4. Add validation for BCI hparams after reading from file: n_audio_conv1_kernel must be > 0, n_audio_window_size must be >= 0. Log an error and return false on invalid values instead of proceeding with garbage. 5. Add comment explaining the n_mels > 256 threshold used to discriminate BCI models from standard whisper models, and noting that a dedicated file-format marker should be introduced if this assumption ever breaks. Made-with: Cursor

… + text-encoder GPU bridge + pinned-host-buffer per-step inputs Three independent wins bundled into one round, strict TDD on each — new CPU-only unit test for every change, RED → impl → GREEN → end-to-end validation on real hardware. == tetherto#10 — Auto-pick UMA bias == Round 3's `argmax(free_vram)` picks UMA iGPUs on hybrid rigs because UMA reports the entire system RAM (120+ GB) as free VRAM, while a discrete RTX 5090 reports 32 GB. Silent 40x realtime regression for any operator following the help text "auto-pick adapter with most free VRAM". Extended `resolve_vulkan_device_index` with an optional third arg: int resolve_vulkan_device_index(int requested, const std::vector<size_t> & free_vram_per_device, const std::vector<bool> & is_uma_per_device = {}); Empty UMA list -> round-3 behaviour preserved verbatim. Non-empty + at least one discrete -> argmax over the DISCRETE subset. All-UMA falls back to round-3 argmax. Explicit `requested >= 0` passthrough is UMA-agnostic. Caller wiring (in `init_supertonic_backend`) collects UMA flags via the public `ggml_backend_dev_get_props()` API on `ggml_backend_vk_reg()` - sets `is_uma = true` for `GGML_BACKEND_DEVICE_TYPE_IGPU` / `_CPU` / `_ACCEL`. `test_supertonic_vulkan_device_select.cpp` extended with 6 new test functions / 14 new checks covering the round-12 behaviour matrix (empty-UMA-preserves-round3, hybrid-prefer-discrete, multi-discrete-argmax-over-subset, all-UMA-falls-back, explicit- index-ignores-UMA-bias, mismatched-length-throws). == tetherto#6 — Text-encoder speech-prompted-attention GPU bridge == Master's Metal-port branch (PR tetherto#15) built `speech_prompted_merged_cache` (one ggml graph for QKV projection + head-split + flash-attn + out-proj end-to-end on GPU) but never wired its run path. Production text-encoder stayed on the pre-Phase-A4 two-cache pattern with host-side Q/V download -> pack -> re-upload between the QKV cache and the flash-attn cache. Round 12 tetherto#6 adds `run_speech_prompted_merged_cache` and the dispatch in `speech_prompted_attention_ggml`: if (!model_prefers_cpu_kernels(m)) { thread_local speech_prompted_merged_cache merged_caches[2]; // rebuild on key change, then: run_speech_prompted_merged_cache(merged, m, x_lc, L, style_ttl, out_lc); return; } // ... legacy two-cache CPU path unchanged Eliminates per call: - 2 GPU->host downloads (q_out, v_out) - 3 host->GPU uploads (q_pack, k_pack, v_pack) - 1 graph dispatch - All host pack work (q_pack / k_pack / v_pack head-split) = 5 sync points x 2 layers = 10 sync points / synth at the text encoder alone. CPU stays on the legacy two-cache path: master's `dense_matmul_time_ggml` CPU fast path uses cblas + the host- side head-split is a free memcpy; switching CPU to merged would pull the matmul through the slower ggml conv1d fallback and gain nothing (no sync points exist on CPU). `test_supertonic_text_encoder_gpu_bridge.cpp` (NEW) pins: - run_speech_prompted_merged_cache symbol via SFINAE - speech_prompted_merged_cache struct field contract (x_in, style_in, out, idx, L) via SFINAE - free-default-cache trip-wire (catches a buggy free path that segfaults on never-built `thread_local` cache slots at process exit) 6 / 6 CPU-only checks pass. End-to-end equivalence vs. the legacy two-cache path verified by the existing model-fixture parity tests (`test-supertonic-text-encoder-trace`, `test-supertonic-pipeline`). == tetherto#5 — Pinned-host-buffer per-step input scratchpad == Round 3 shipped the capability probe `supertonic_backend_supports_pinned_host_buffer`, which returns `true` iff `ggml_backend_vk_host_buffer_type()` is non-null on the resolved backend. The actual per-engine input-scratchpad refactor that USES the host-pinned buffer to skip ggml-vulkan's internal staging-buffer hop was deferred. Round 12 tetherto#5 lands the helper: ggml_backend_buffer_t try_alloc_inputs_in_pinned_host_buffer( const supertonic_model & model, ggml_context * input_ctx); Returns nullptr on null model.backend / null input_ctx / non- Vulkan backend / API miss. Otherwise allocates the entire input_ctx tensor set from `ggml_backend_vk_host_buffer_type()` via `ggml_backend_alloc_ctx_tensors_from_buft`. Caller owns the returned buffer; frees at cache destruction via `ggml_backend_buffer_free`. Applied via a dual-context allocation pattern at the two highest-frequency per-step input sites: - vector_group_graph_cache (x 3 for g1/g2/g3): x_in + temb_in - ve_front_block_graph_cache: x_in + mask_in + t_emb_in Total: 9 per-step input tensors moved to host-pinned memory. Each `ggml_backend_tensor_set` on these tensors skips one internal staging-buffer hop on Vulkan (BAR-mapped GPU memory written directly by the host without an intermediate copy). Dual-context pattern: 1. Create input_ctx (no_alloc=true) with ~8 tensor-overhead slots 2. Create x_in / temb_in / etc. in input_ctx 3. Try host-pinned alloc; fall back to default backend buffer via `ggml_backend_alloc_ctx_tensors(input_ctx, backend)` 4. Build the rest of the graph in cache.ctx; gallocr handles intermediates + outputs, skipping the pre-allocated inputs via the `tensor->buffer != nullptr` check Free order: gallocr -> main ctx -> input_buf -> input_ctx (reversed order would dangle gallocr pointers into freed input tensor metadata) CPU / Metal / OpenCL safety: helper returns nullptr; callers fall back to default backend buffer. Identical CPU behaviour to pre-round-12; only Vulkan gains. `test_supertonic_pinned_host_buffer.cpp` (NEW) pins: - Helper symbol existence (SFINAE) - nullptr return on CPU backend (idempotent across repeats) - Null-pointer safety on null model.backend / null input_ctx 11 / 11 CPU-only checks pass. == Combined perf snapshot on RTX 5090 == Long-prompt bench (173 chars, ~15s of audio): Round 11 baseline: 76.11 ms / 5 steps (123x realtime) Round 12 (all three): 27.99 ms / 5 steps (537x realtime) ^ 2.7x faster Vector estimator step: 12.7 ms -> 3.28 ms (3.9x faster) Prewarm cold-start: 330 ms -> 21 ms (15x faster) Short-prompt bench (Hello-world class, ~3s audio): Round 11 baseline: 44.08 ms (74x realtime) Round 12: 23.31 ms (394x realtime) Auto-pick on hybrid rig (RTX 5090 + AMD RADV iGPU): Round 11 `--vulkan-device -1`: picks RADV -> 178 ms (7x realtime) Round 12 `--vulkan-device -1`: picks RTX 5090 -> 28 ms (537x realtime) ^ 6.4x faster for users following help text == Test plan == CPU build: cmake -S tts-cpp -B tts-cpp/build -DTTS_CPP_USE_SYSTEM_GGML=OFF cmake --build tts-cpp/build -j ctest --test-dir tts-cpp/build -L unit -> 24 / 24 PASS, 0 regressions (was 22 / 22 in round 11; +1 text- encoder-gpu-bridge, +1 pinned-host-buffer) Vulkan build: cmake -S tts-cpp -B tts-cpp/build-vulkan -DTTS_CPP_USE_SYSTEM_GGML=OFF -DGGML_VULKAN=ON cmake --build tts-cpp/build-vulkan -j ctest --test-dir tts-cpp/build-vulkan -L unit -> 24 / 24 PASS End-to-end synth verified on all 4 backends (CPU, Vulkan RTX 5090, Vulkan RADV iGPU, Vulkan Mesa lavapipe) - every adapter writes a valid WAV. Co-authored-by: Cursor <cursoragent@cursor.com>

Resolves the review comments on the merged AOSC v2.1 PR (#22, merge commit e6ba38c). All eight changes are minimal and behaviour-preserving except the v2.1 detection upgrade (now strict-tag with shape fallback) and the degenerate-config guard (silence-only fallback instead of UB-adjacent boost arithmetic). Reviewer comments classified as "perf only / out of scope / would only add a TODO" are intentionally not addressed in this commit -- see the plan file referenced in the PR description. src/parakeet_sortformer.cpp -- `compress_speaker_cache` - Early-return when `spkcache_len_per_spk <= 0` (`num_spks * A_sil >= spkcache_len`). The downstream boost/top-K stages are mostly defended (`boost_topk_scores` already returns early on non-positive k), but the function was otherwise running a no-op pass that produced an all-silence cache via the slow path. Fall back to an explicit silence-only profile and bail. - Renamed `streaming_update`'s `chunk_pre_encode_lc` parameter to `committed_chunk_pre_encode`. The call site already advances past the left context (`chunk_pre_committed = ... + lc * D`), so the old `_lc` suffix was misleading. `int lc` stays -- it's used inside the function to index into `preds_full`, which still contains the left-context preds. - Replaced the magic `-1.0e30f` / `+1.0e30f` sentinels (4 sites) with named constants `k_score_neg_inf` / `k_score_pos_inf` backed by `std::numeric_limits<float>::{lowest,max}()`. Dropped the inline "-inf is UB with current FP flags" comments: IEEE 754 +/-inf is well-defined; the original concern (avoiding NaN-on-arithmetic) still holds because we only store and compare the sentinels. src/parakeet_engine.cpp - On the AOSC path, skip the `for (cur_full) remap_id(...)` loop and the `prev_chunk_full_segments = std::move(cur_full)` store: `compute_slot_remap_` is never consulted when `cache_active` is true (AOSC anchors slot identity through the speaker cache), so the work was dead. - Switched v2.1 detection from pure-shape to "prefer the converter's `parakeet.model_variant` GGUF tag; fall back to `(n_layers == 17, n_mels == 128)` for legacy GGUFs". This prevents a future v2.2/v3 variant that happens to share v2.1's encoder shape from silently opting into AOSC. include/parakeet/diarization.h - Moved the v1-vs-v2.1 detection rationale comment out of parakeet_engine.cpp and into the `SortformerStreamingOptions:: spkcache_enable` block, with a paragraph on the tag-first / shape-fallback policy. src/parakeet_ctc.{h,cpp} - Added `std::string ParakeetCtcModel::model_variant` (optional GGUF metadata mirror; empty on legacy GGUFs). - Loader reads `parakeet.model_variant` next to the existing `parakeet.model.type` read; absent key -> empty string -> detection falls back to shape. scripts/convert-nemo-to-gguf.py - New `detect_sortformer_variant(ckpt: Path)` derives a stable variant tag from the source .nemo filename (`sortformer-v1` / `sortformer-streaming-v2` / `sortformer-streaming-v2.1-aosc`); empty string for unknown checkpoints. - Sortformer branch of `write_gguf` writes `parakeet.model_variant` when the tag is non-empty. - `write_gguf` signature extended with `ckpt: Path`; only the one internal call site adjusted. scripts/download-all-models.sh - Added the diar_streaming_sortformer_4spk-v2.1 fetch block (the AOSC fine-tune that this PR's tests target); bumped the budget comment from "~14 GiB" to "~14.5 GiB" and listed v2.1 in the contents line. CMakeLists.txt + test/test_sortformer_streaming.cpp - Streaming ctest now consumes `${_qvp_sfsv21_q8_gguf}` (was `${_qvp_sfs_q8_gguf}`, the v2 model). The in-binary default GGUF path is the matching v2.1 q8_0. Aligns the test with the line-299 comment that says the binary "reflects the production v2.1 AOSC config out of the box". test/test_utils.h (new) + test/test_sortformer_{streaming,aosc_speakers}.cpp - Extracted the two 40-line `load_wav_pcm16le_mono` / `file_exists` duplicates into a shared inline header in the `parakeet_test` namespace. The duplicate copies and the "duplicated here on purpose" comment block in test_sortformer_aosc_speakers.cpp are gone; both tests `#include "test_utils.h"` and use `using parakeet_test::...`. Build + ctest verification - `cmake --build build -j` clean (no new warnings). - `ctest -R 'test-sortformer-(streaming |aosc-speakers)'`: test-sortformer-streaming ........ Passed 8.23 s test-sortformer-aosc-speakers-abcba . Passed 33.80 s test-sortformer-aosc-speakers-abcdba Passed 36.91 s The locally-symlinked v2.1 GGUF predates the `parakeet.model_variant` key, so the AOSC tests passing here also verifies the shape-fallback path. Re-running the converter on the v2.1 .nemo will populate the new key for the strict-tag path. Reviewer comments deferred / skipped (rationale): - Encoder graph cache thrashing during FIFO ramp-up (#4): perf only; proper fix wants pre-build-at-diarize_start + silence padding or a mask argument, not minimal. Tracked for a follow-up perf PR. - WAV fixtures committed as ~11 MB binaries (#8): project-wide Git LFS adoption decision, not a code change. - `ring.erase` O(n) under AOSC's aggressive trim (#10): pre-existing on the v1 path; wants a std::deque refactor, out of scope. - `encoder_ms` attribution surprising (#12): code is correct and matches sibling paths; the user explicitly opted against comment-only "clarifications".

[BCI] QVAC-17071 feat: add BCI neural signal support (variable conv1 kernel + windowed attention)

Resolves the review comments on the merged AOSC v2.1 PR (#22, merge commit e6ba38c). All eight changes are minimal and behaviour-preserving except the v2.1 detection upgrade (now strict-tag with shape fallback) and the degenerate-config guard (silence-only fallback instead of UB-adjacent boost arithmetic). Reviewer comments classified as "perf only / out of scope / would only add a TODO" are intentionally not addressed in this commit -- see the plan file referenced in the PR description. src/parakeet_sortformer.cpp -- `compress_speaker_cache` - Early-return when `spkcache_len_per_spk <= 0` (`num_spks * A_sil >= spkcache_len`). The downstream boost/top-K stages are mostly defended (`boost_topk_scores` already returns early on non-positive k), but the function was otherwise running a no-op pass that produced an all-silence cache via the slow path. Fall back to an explicit silence-only profile and bail. - Renamed `streaming_update`'s `chunk_pre_encode_lc` parameter to `committed_chunk_pre_encode`. The call site already advances past the left context (`chunk_pre_committed = ... + lc * D`), so the old `_lc` suffix was misleading. `int lc` stays -- it's used inside the function to index into `preds_full`, which still contains the left-context preds. - Replaced the magic `-1.0e30f` / `+1.0e30f` sentinels (4 sites) with named constants `k_score_neg_inf` / `k_score_pos_inf` backed by `std::numeric_limits<float>::{lowest,max}()`. Dropped the inline "-inf is UB with current FP flags" comments: IEEE 754 +/-inf is well-defined; the original concern (avoiding NaN-on-arithmetic) still holds because we only store and compare the sentinels. src/parakeet_engine.cpp - On the AOSC path, skip the `for (cur_full) remap_id(...)` loop and the `prev_chunk_full_segments = std::move(cur_full)` store: `compute_slot_remap_` is never consulted when `cache_active` is true (AOSC anchors slot identity through the speaker cache), so the work was dead. - Switched v2.1 detection from pure-shape to "prefer the converter's `parakeet.model_variant` GGUF tag; fall back to `(n_layers == 17, n_mels == 128)` for legacy GGUFs". This prevents a future v2.2/v3 variant that happens to share v2.1's encoder shape from silently opting into AOSC. include/parakeet/diarization.h - Moved the v1-vs-v2.1 detection rationale comment out of parakeet_engine.cpp and into the `SortformerStreamingOptions:: spkcache_enable` block, with a paragraph on the tag-first / shape-fallback policy. src/parakeet_ctc.{h,cpp} - Added `std::string ParakeetCtcModel::model_variant` (optional GGUF metadata mirror; empty on legacy GGUFs). - Loader reads `parakeet.model_variant` next to the existing `parakeet.model.type` read; absent key -> empty string -> detection falls back to shape. scripts/convert-nemo-to-gguf.py - New `detect_sortformer_variant(ckpt: Path)` derives a stable variant tag from the source .nemo filename (`sortformer-v1` / `sortformer-streaming-v2` / `sortformer-streaming-v2.1-aosc`); empty string for unknown checkpoints. - Sortformer branch of `write_gguf` writes `parakeet.model_variant` when the tag is non-empty. - `write_gguf` signature extended with `ckpt: Path`; only the one internal call site adjusted. scripts/download-all-models.sh - Added the diar_streaming_sortformer_4spk-v2.1 fetch block (the AOSC fine-tune that this PR's tests target); bumped the budget comment from "~14 GiB" to "~14.5 GiB" and listed v2.1 in the contents line. CMakeLists.txt + test/test_sortformer_streaming.cpp - Streaming ctest now consumes `${_qvp_sfsv21_q8_gguf}` (was `${_qvp_sfs_q8_gguf}`, the v2 model). The in-binary default GGUF path is the matching v2.1 q8_0. Aligns the test with the line-299 comment that says the binary "reflects the production v2.1 AOSC config out of the box". test/test_utils.h (new) + test/test_sortformer_{streaming,aosc_speakers}.cpp - Extracted the two 40-line `load_wav_pcm16le_mono` / `file_exists` duplicates into a shared inline header in the `parakeet_test` namespace. The duplicate copies and the "duplicated here on purpose" comment block in test_sortformer_aosc_speakers.cpp are gone; both tests `#include "test_utils.h"` and use `using parakeet_test::...`. Build + ctest verification - `cmake --build build -j` clean (no new warnings). - `ctest -R 'test-sortformer-(streaming |aosc-speakers)'`: test-sortformer-streaming ........ Passed 8.23 s test-sortformer-aosc-speakers-abcba . Passed 33.80 s test-sortformer-aosc-speakers-abcdba Passed 36.91 s The locally-symlinked v2.1 GGUF predates the `parakeet.model_variant` key, so the AOSC tests passing here also verifies the shape-fallback path. Re-running the converter on the v2.1 .nemo will populate the new key for the strict-tag path. Reviewer comments deferred / skipped (rationale): - Encoder graph cache thrashing during FIFO ramp-up (#4): perf only; proper fix wants pre-build-at-diarize_start + silence padding or a mask argument, not minimal. Tracked for a follow-up perf PR. - WAV fixtures committed as ~11 MB binaries (#8): project-wide Git LFS adoption decision, not a code change. - `ring.erase` O(n) under AOSC's aggressive trim (#10): pre-existing on the v1 path; wants a std::deque refactor, out of scope. - `encoder_ms` attribution surprising (#12): code is correct and matches sibling paths; the user explicitly opted against comment-only "clarifications".

sharmaraju352 requested review from a team as code owners April 16, 2026 10:22

Raju added 5 commits April 16, 2026 19:57

feat(bci): add variable conv1 kernel size support

775d436

Read n_audio_conv1_kernel from model hparams to allow BCI models to use a non-standard first convolution kernel size. Standard whisper models default to kernel size 3. Made-with: Cursor

fix vcpkg build

e6fcbaa

Made-with: Cursor

fix apple silicon cross compile

461f07d

Made-with: Cursor

fix windows pthread

bbb3535

Made-with: Cursor

sharmaraju352 force-pushed the feat/bci-patches-v184 branch from 5326bf7 to bbb3535 Compare April 16, 2026 14:28

GustavoA1604 requested changes Apr 16, 2026

View reviewed changes

Comment thread src/whisper.cpp Outdated

Comment thread src/whisper.cpp Outdated

Comment thread src/whisper.cpp Outdated

Raju added 2 commits April 18, 2026 10:46

ogad-tether requested changes Apr 20, 2026

View reviewed changes

Comment thread src/whisper.cpp

Comment thread src/CMakeLists.txt

Comment thread cmake/whisper-config.cmake.in Outdated

Raju added 2 commits April 20, 2026 14:52

ogad-tether self-requested a review April 20, 2026 10:21

ogad-tether approved these changes Apr 20, 2026

View reviewed changes

jpgaribotti requested changes Apr 20, 2026

View reviewed changes

Comment thread src/whisper.cpp

Comment thread src/whisper.cpp Outdated

Comment thread src/whisper.cpp

This was referenced Apr 20, 2026

[BCI] QVAC-17071 chore: switch whisper-cpp overlay to tetherto fork tetherto/qvac#1584

Closed

[BCI] QVAC-17072 chore: prepare for registry-based whisper-cpp consumption tetherto/qvac#1585

Closed

GustavoA1604 approved these changes Apr 20, 2026

View reviewed changes

sharmaraju352 merged commit 2b1e04f into master Apr 20, 2026
67 of 74 checks passed

This was referenced Apr 21, 2026

chore: bump whisper-cpp port to 1.8.4.2 (BCI patches) tetherto/qvac-registry-vcpkg#125

Merged

[BCI] QVAC-17057 feat: add bci-whispercpp POC with batch transcription tetherto/qvac#1583

Merged

Zbig9000 mentioned this pull request May 14, 2026

Supertonic optimizations qvac 18605 tts ggml add and optimize vulkan for supertonic #18

Merged

pratiknarola-t mentioned this pull request May 19, 2026

parakeet-cpp: address PR #22 AOSC v2.1 review comments #24

Merged

gianni-cor pushed a commit that referenced this pull request May 28, 2026

Merge pull request #10 from tetherto/feat/bci-patches-v184

81733db

[BCI] QVAC-17071 feat: add BCI neural signal support (variable conv1 kernel + windowed attention)

gianni-cor deleted the feat/bci-patches-v184 branch May 28, 2026 13:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BCI] QVAC-17071 feat: add BCI neural signal support (variable conv1 kernel + windowed attention)#10

[BCI] QVAC-17071 feat: add BCI neural signal support (variable conv1 kernel + windowed attention)#10
sharmaraju352 merged 10 commits into
masterfrom
feat/bci-patches-v184

sharmaraju352 commented Apr 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ogad-tether left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

sharmaraju352 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Variable conv1 kernel size

2. Windowed self-attention for encoder layers

Backward compatibility

Context

Test plan

Test results

BCI package (@qvac/bci-whispercpp)

Standard whisper package (@qvac/transcription-whispercpp)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ogad-tether left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sharmaraju352 commented Apr 16, 2026 •

edited

Loading

BCI package (`@qvac/bci-whispercpp`)

Standard whisper package (`@qvac/transcription-whispercpp`)