[Parakeet] QVAC-13814 feat: add automated benchmarks for parakeet ctc, eou and sortformer models #991
Merged
added 2 commits
March 19, 2026 12:06
…odels Add per-model benchmark config files (config-ctc.yaml, config-eou.yaml, config-sortformer.yaml) with appropriate defaults for each model type. Update the CI workflow to support an 'all' option that runs benchmarks for every model type in a single matrix, and add a weekly schedule trigger (Sunday 04:00 UTC) for automated regression benchmarking. Add trigger scripts (trigger-benchmark.sh, trigger-benchmark-all.sh) for convenient local invocation of benchmark workflows via gh CLI. Made-with: Cursor
When CI prebuilds are not available (no successful prebuilds workflow run), fall back to installing @qvac/transcription-parakeet from npm instead of failing the entire benchmark job. Made-with: Cursor
added 4 commits
March 19, 2026 12:31
Python 3.14 changed Pickler._batch_setitems() signature which breaks the datasets library. Pin to 3.13 until upstream compatibility is fixed. Made-with: Cursor
The addon requires model-type-specific named paths (e.g. ctcModelPath, eouEncoderPath, sortformerPath) when activating non-TDT models. Add getNamedPaths() that resolves the correct file paths per model type and spreads them into the parakeetConfig passed to the addon constructor. Made-with: Cursor
The addon reads ctcModelPath/eouEncoderPath/sortformerPath from the top-level config object (this._config), not from parakeetConfig. Made-with: Cursor
The tetherto/sortformer-4spk-v2-onnx HuggingFace repo is gated and returns an invalid file. Use the public cgus community repo that the integration tests already rely on. Made-with: Cursor
trigger-benchmark.sh already supports -t all, making the separate trigger-benchmark-all.sh unnecessary. Made-with: Cursor
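The two named-path commits above can be sketched roughly as follows. This is a minimal illustration, not the benchmark server's actual code: the file names in `NAMED_FILES`, the `buildAddonConfig` helper, and the shape of `parakeetConfig` are assumptions made for the example. Only the key names (`ctcModelPath`, `eouEncoderPath`, `sortformerPath`) and the `getNamedPaths()` name come from the commit messages.

```javascript
'use strict'
const path = require('node:path')

// Hypothetical file names per model type; the real layout may differ.
const NAMED_FILES = {
  ctc: { ctcModelPath: 'model.onnx' },
  eou: { eouEncoderPath: 'encoder.onnx' },
  sortformer: { sortformerPath: 'sortformer.onnx' }
}

// Resolve the named paths for one model type. TDT (or any type without
// an entry) yields an empty object, so spreading it is a no-op.
function getNamedPaths (modelType, modelDir) {
  const files = NAMED_FILES[modelType] || {}
  const resolved = {}
  for (const [key, file] of Object.entries(files)) {
    resolved[key] = path.join(modelDir, file)
  }
  return resolved
}

// Hypothetical helper: the named paths are spread at the TOP level of the
// config, next to parakeetConfig, because that is where the addon is
// described as reading them from.
function buildAddonConfig (modelType, modelDir) {
  return {
    parakeetConfig: { modelType }, // assumed shape, for illustration only
    ...getNamedPaths(modelType, modelDir)
  }
}

console.log(buildAddonConfig('ctc', '/models/ctc'))
```

The first of the two commits placed the spread inside `parakeetConfig`; the follow-up moved it one level up, which is the only difference the sketch is meant to highlight.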
ishanvohra2
previously approved these changes
Mar 19, 2026
Contributor
Tier-based Approval Status |
GustavoA1604
requested changes
Mar 19, 2026
Per review feedback — "automated" means triggered via workflow_dispatch, not periodic autonomous runs. Made-with: Cursor
…r script - Change MODEL_TYPE fallback from 'all' to 'tdt' to match the workflow_dispatch UI default - Replace unreachable $? check (dead code under set -e) with proper if-not construct in trigger-benchmark.sh Made-with: Cursor
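The fallback and the `all` option described in these commits amount to a small piece of selection logic. The sketch below expresses it in JavaScript purely for illustration; the real logic lives in the workflow and shell script, and the function name `resolveModelTypes` and the `MODEL_TYPES` constant are hypothetical. It models only the model-type dimension of the matrix, not the full set of jobs.

```javascript
'use strict'

// Model types named in this PR; order is arbitrary for the sketch.
const MODEL_TYPES = ['tdt', 'ctc', 'eou', 'sortformer']

// Map a workflow_dispatch-style input to the list of model types to run.
// An empty or missing input falls back to 'tdt' (the UI default), and
// 'all' expands to every known model type.
function resolveModelTypes (input) {
  const requested = input || 'tdt'
  if (requested === 'all') return [...MODEL_TYPES]
  if (!MODEL_TYPES.includes(requested)) {
    throw new Error(`Unknown model type: ${requested}`)
  }
  return [requested]
}

console.log(resolveModelTypes(''))
console.log(resolveModelTypes('all'))
```

Falling back to a single type rather than `all` keeps an input-less run as small as a run started from the dispatch form with its default selection.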
GustavoA1604
approved these changes
Mar 23, 2026
ogad-tether
approved these changes
Mar 23, 2026
ogad-tether
left a comment
Contributor
LGTM. Splitting configs per model type, wiring named paths into the benchmark server, and the matrix 'all' option are all clear wins. Python 3.13 pin for the benchmark workflow is a sensible fix for the datasets stack.
Contributor
Sortformer weights now pull from the
Contributor
Author
/review
Contributor
Author
/review
GustavoA1604
added a commit
that referenced
this pull request
Mar 25, 2026
* fix: statically link parakeet prebuilds Made-with: Cursor * fix: restore parakeet linux runtime loading Made-with: Cursor * fix: address parakeet apple prebuild failures Made-with: Cursor * chore: remove parakeet release notes file Made-with: Cursor * fix: use static requires for mobile bare-pack bundling The _resolve() helper used computed require paths that bare-pack could not statically trace, so the addon modules were missing from the mobile bundle. Use static string literals for mobile paths (traced by bare-pack) and variable paths for desktop (skipped by bare-pack since ../../ doesn't exist in the mobile layout). Made-with: Cursor * feat[notask]: add download profiler for registry blob performance diagnostics (#1040) * feat[notask]: add download profiler for registry blob performance diagnostics Made-with: Cursor * fix: move profiler deps from devDependencies to dependencies Made-with: Cursor * doc: add profile command and example to client README Made-with: Cursor * fix: show full peer keys in profiler output for troubleshooting Made-with: Cursor * fix: validate parseInt results for interval and timeout CLI flags Made-with: Cursor --------- Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com> Co-authored-by: Simon Iribarren <simon.ig13@gmail.com> * fix: resolve dependabot alerts for registry-server transitive deps (#1093) * fix(registry-server): PBKDF2 for passphrase-derived keys (CodeQL #9) (#1065) * fix(registry-server): derive passphrase keys with PBKDF2 Replace single-pass SHA-256 with PBKDF2-HMAC-SHA256 (310k iterations) for deterministic test keys; addresses CodeQL js/insufficient-password-hash. 
* chore(registry-server): remove passphrase migration note from guide --------- Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com> * fix[notask]: lazy-load Node builtins in profiler for Bare runtime compatibility (#1096) * fix[notask]: sanitize SSE output to prevent reflected XSS (#1027) Co-authored-by: Marco <1369747+elchiapp@users.noreply.github.com> * [Parakeet] QVAC-13814 feat: add automated benchmarks for parakeet ctc, eou and sortformer models (#991) * feat: add automated benchmarks for parakeet ctc, eou and sortformer models Add per-model benchmark config files (config-ctc.yaml, config-eou.yaml, config-sortformer.yaml) with appropriate defaults for each model type. Update the CI workflow to support an 'all' option that runs benchmarks for every model type in a single matrix, and add a weekly schedule trigger (Sunday 04:00 UTC) for automated regression benchmarking. Add trigger scripts (trigger-benchmark.sh, trigger-benchmark-all.sh) for convenient local invocation of benchmark workflows via gh CLI. Made-with: Cursor * fix: make prebuilds step non-fatal with npm fallback When CI prebuilds are not available (no successful prebuilds workflow run), fall back to installing @qvac/transcription-parakeet from npm instead of failing the entire benchmark job. Made-with: Cursor * fix: use python 3.13 for benchmark client compatibility Python 3.14 changed Pickler._batch_setitems() signature which breaks the datasets library. Pin to 3.13 until upstream compatibility is fixed. Made-with: Cursor * fix: add named model paths in benchmark server for ctc/eou/sortformer The addon requires model-type-specific named paths (e.g. ctcModelPath, eouEncoderPath, sortformerPath) when activating non-TDT models. Add getNamedPaths() that resolves the correct file paths per model type and spreads them into the parakeetConfig passed to the addon constructor. 
Made-with: Cursor * fix: spread named paths at config top level, not inside parakeetConfig The addon reads ctcModelPath/eouEncoderPath/sortformerPath from the top-level config object (this._config), not from parakeetConfig. Made-with: Cursor * fix: use public cgus repo for sortformer model download The tetherto/sortformer-4spk-v2-onnx HuggingFace repo is gated and returns an invalid file. Use the public cgus community repo that the integration tests already rely on. Made-with: Cursor * chore: remove redundant trigger-benchmark-all.sh trigger-benchmark.sh already supports -t all, making the separate trigger-benchmark-all.sh unnecessary. Made-with: Cursor * chore: remove scheduled cron trigger from benchmark workflow Per review feedback — "automated" means triggered via workflow_dispatch, not periodic autonomous runs. Made-with: Cursor * fix: correct workflow fallback default and remove dead code in trigger script - Change MODEL_TYPE fallback from 'all' to 'tdt' to match the workflow_dispatch UI default - Replace unreachable $? check (dead code under set -e) with proper if-not construct in trigger-benchmark.sh Made-with: Cursor --------- Co-authored-by: Raju <raju.sharma> * fix[notask]: replace global streaming state with per-instance map in whispercpp (#1079) The streaming processor used three process-global variables (g_streamingMtx, g_streamingInstance, g_streamingProcessor) which limited the entire process to a single streaming session and risked dangling-pointer access if the owning AddonJs instance was destroyed without cleanup. Replace with an unordered_map keyed by AddonJs* so each addon instance independently owns its streaming session, eliminating the race condition and enabling concurrent streaming across multiple instances. 
Made-with: Cursor Co-authored-by: Raju <raju.sharma> * chore[notask]: replace deprecated istanbul with nyc in decoder-audio (#1082) * chore[notask]: replace deprecated istanbul with nyc in decoder-audio The istanbul package has been deprecated since 2016 and carries known vulnerable transitive dependencies (minimatch ReDoS, uglify-js ReDoS). Replace with nyc ^17.1.0 (the actively maintained successor) and update coverage scripts to use nyc CLI syntax. Made-with: Cursor * fix[notask]: fix nyc coverage report command to use .nyc_output directory The nyc report command expects coverage data in .nyc_output/ rather than reading from --temp-dir directly. Copy brittle's coverage-final.json into .nyc_output/ before running nyc report so the HTML report generates cleanly without format warnings. Made-with: Cursor --------- Co-authored-by: Raju <raju.sharma> * Updated dependencies with android-arm64 fix (#1095) Co-authored-by: gianni <gianfranco.cordella@tether.io> * fix[notask]: sanitize error messages to prevent filesystem path leakage (#1084) Error messages in whispercpp and parakeet validateModelFiles() included full filesystem paths (e.g. "Model file doesn't exist: /home/user/..."). When surfaced via API responses this reveals internal server layout. Log the full path at debug/error level for operators, but throw generic messages without paths to callers. Made-with: Cursor Co-authored-by: Raju <raju.sharma> * fix[notask]: wrap job ID counter at MAX_SAFE_INTEGER to prevent precision loss (#1085) The _nextJobId counter in WhisperInterface and ParakeetInterface was incremented without bounds. After 2^53 increments, JavaScript loses integer precision and job ID collisions become possible. Replace raw += 1 with nextSafeId() that wraps back to 1 at Number.MAX_SAFE_INTEGER, preserving Number type compatibility for existing consumers. 
Made-with: Cursor Co-authored-by: Raju <raju.sharma> * fix: catch unhandled rejections in mobile integration runtime Register Bare.on('unhandledRejection') and Bare.on('uncaughtException') handlers to prevent the runtime from aborting (SIGABRT) when network errors escape the promise chain during model downloads. Made-with: Cursor * fix: bundle audio samples and resolve asset paths for mobile tests Add sample-16k.wav, French.raw, and croatian.raw to testAssets so integration tests can run transcription on mobile without downloading. Update getTestPaths to resolve samplesDir from the bundled asset manifest on mobile instead of a non-existent writableRoot/samples path. Made-with: Cursor * chore: bump parakeet to 0.2.4 Made-with: Cursor * chore: bump parakeet to 0.2.5 Made-with: Cursor --------- Co-authored-by: Raju <raju.sharma> Co-authored-by: Yury Samarin <yuri.a.samarin@gmail.com> Co-authored-by: Proletter <40578159+Proletter@users.noreply.github.com> Co-authored-by: Simon Iribarren <simon.ig13@gmail.com> Co-authored-by: Marco <1369747+elchiapp@users.noreply.github.com> Co-authored-by: Raju Sharma <sharmaraju352@gmail.com> Co-authored-by: Juan Pablo Garibotti Arias <juan.arias@bitfinex.com> Co-authored-by: gianni <gianfranco.cordella@tether.io> Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
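The job ID wrap-around mentioned in the commit message above (#1085) can be illustrated with a short sketch. The name `nextSafeId` and the wrap-to-1 behaviour come from the commit message; the signature (taking the current ID and returning the next one) is an assumption made for this example.

```javascript
'use strict'

// Return the next job ID, wrapping back to 1 once the counter reaches
// Number.MAX_SAFE_INTEGER so the value always stays an exact integer.
function nextSafeId (current) {
  return current >= Number.MAX_SAFE_INTEGER ? 1 : current + 1
}

// Past the safe range, distinct increments collapse to the same double,
// which is how ID collisions would become possible without the wrap.
const collides = Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2

console.log(nextSafeId(41)) // 42
console.log(nextSafeId(Number.MAX_SAFE_INTEGER)) // 1
console.log(collides) // true
```

Keeping the counter a plain `Number` rather than switching to `BigInt` matches the stated goal of preserving type compatibility for existing consumers.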
Proletter
pushed a commit
that referenced
this pull request
May 24, 2026
…, eou and sortformer models (#991) * feat: add automated benchmarks for parakeet ctc, eou and sortformer models Add per-model benchmark config files (config-ctc.yaml, config-eou.yaml, config-sortformer.yaml) with appropriate defaults for each model type. Update the CI workflow to support an 'all' option that runs benchmarks for every model type in a single matrix, and add a weekly schedule trigger (Sunday 04:00 UTC) for automated regression benchmarking. Add trigger scripts (trigger-benchmark.sh, trigger-benchmark-all.sh) for convenient local invocation of benchmark workflows via gh CLI. Made-with: Cursor * fix: make prebuilds step non-fatal with npm fallback When CI prebuilds are not available (no successful prebuilds workflow run), fall back to installing @qvac/transcription-parakeet from npm instead of failing the entire benchmark job. Made-with: Cursor * fix: use python 3.13 for benchmark client compatibility Python 3.14 changed Pickler._batch_setitems() signature which breaks the datasets library. Pin to 3.13 until upstream compatibility is fixed. Made-with: Cursor * fix: add named model paths in benchmark server for ctc/eou/sortformer The addon requires model-type-specific named paths (e.g. ctcModelPath, eouEncoderPath, sortformerPath) when activating non-TDT models. Add getNamedPaths() that resolves the correct file paths per model type and spreads them into the parakeetConfig passed to the addon constructor. Made-with: Cursor * fix: spread named paths at config top level, not inside parakeetConfig The addon reads ctcModelPath/eouEncoderPath/sortformerPath from the top-level config object (this._config), not from parakeetConfig. Made-with: Cursor * fix: use public cgus repo for sortformer model download The tetherto/sortformer-4spk-v2-onnx HuggingFace repo is gated and returns an invalid file. Use the public cgus community repo that the integration tests already rely on. 
Made-with: Cursor * chore: remove redundant trigger-benchmark-all.sh trigger-benchmark.sh already supports -t all, making the separate trigger-benchmark-all.sh unnecessary. Made-with: Cursor * chore: remove scheduled cron trigger from benchmark workflow Per review feedback — "automated" means triggered via workflow_dispatch, not periodic autonomous runs. Made-with: Cursor * fix: correct workflow fallback default and remove dead code in trigger script - Change MODEL_TYPE fallback from 'all' to 'tdt' to match the workflow_dispatch UI default - Replace unreachable $? check (dead code under set -e) with proper if-not construct in trigger-benchmark.sh Made-with: Cursor --------- Co-authored-by: Raju <raju.sharma>
Proletter
added a commit
that referenced
this pull request
May 24, 2026
Summary
- Add per-model benchmark config files (`config-ctc.yaml`, `config-eou.yaml`, `config-sortformer.yaml`) with model-appropriate defaults (timeouts, streaming mode, WER/CER toggles)
- Update the CI workflow to support an `all` option that benchmarks every model type in a single matrix run
- Add `trigger-benchmark.sh` for convenient local invocation of benchmark workflows via `gh` CLI

Successful Workflow Runs
All three new model types verified end-to-end from this branch:

Details
Config files
- `config-ctc.yaml`
- `config-eou.yaml`
- `config-sortformer.yaml`

Workflow changes
- Add `all` to the model_type dropdown (runs TDT + CTC + EOU + Sortformer in one matrix = 14 parallel jobs)
- Pin Python to 3.13 (Python 3.14 breaks the `datasets` library Pickler)

Benchmark server changes
- Add `getNamedPaths()` to resolve model-type-specific file paths (`ctcModelPath`, `eouEncoderPath`, `sortformerPath`, etc.)
- `_hasNamedPaths()`/`activate()` interface
- Use the public `cgus` community HuggingFace repo for the Sortformer model

Trigger script
- `trigger-benchmark.sh`: trigger a benchmark for a single model type or all types (`-t ctc|eou|sortformer|tdt|all`), with `-m` max samples, `-W` watch, and `-b` branch options

Test plan