QVAC-18421 test[skiplog]: add e2e regression tests for Bergamot vocab cache invalidation by Victor-Rodzko · Pull Request #2004 · tetherto/qvac

Victor-Rodzko · 2026-05-12T14:26:43Z

🎯 What problem does this PR solve?

Regression coverage for QVAC-18420 — for bidirectional Bergamot pairs (e.g. fr↔en) the shared vocab file vocab.<pair>.spm was silently re-downloaded on every loadModel call. Root cause: deduplicateModels dropped one of two byte-identical registry entries, then getModelByPath() returned undefined, expectedSize collapsed to 0, and validateCachedFile wiped the cached vocab.
The bug was caught by users rather than by our test suite.
Existing unit tests (update-models-dedup, nmtcpp-resolve-vocab) cover the fix in isolation but don't exercise the end-to-end symptom — silent re-download through the live server path.
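The failure chain described above can be modelled in a few lines. This is a minimal sketch, not the real registry code: the function names come from the description, but their bodies, the `RegistryEntry` shape, the checksum-based dedup key, and the file paths are all assumptions made for illustration.

```typescript
// Hypothetical model of the QVAC-18420 failure chain. Types, function bodies,
// and paths are illustrative assumptions, not the actual implementation.
interface RegistryEntry {
  path: string;         // assumed: one registry path per translation direction
  checksum: string;     // assumed dedup key; identical for the shared vocab
  expectedSize: number; // size in bytes
}

// Buggy dedup: keeps only the first entry per checksum, so the path used by
// the other direction disappears from the registry.
function deduplicateModels(entries: RegistryEntry[]): RegistryEntry[] {
  const seen = new Set<string>();
  return entries.filter((e) => {
    if (seen.has(e.checksum)) return false;
    seen.add(e.checksum);
    return true;
  });
}

function getModelByPath(entries: RegistryEntry[], path: string): RegistryEntry | undefined {
  return entries.find((e) => e.path === path);
}

// Returns true when the cached file can be kept; false means wipe and re-download.
function validateCachedFile(cachedSize: number, expectedSize: number): boolean {
  return cachedSize === expectedSize;
}

// Two byte-identical entries for the shared vocab (made-up paths and sizes).
const registry: RegistryEntry[] = [
  { path: "fr-en/vocab.spm", checksum: "abc", expectedSize: 1000 },
  { path: "en-fr/vocab.spm", checksum: "abc", expectedSize: 1000 },
];

const deduped = deduplicateModels(registry);
const entry = getModelByPath(deduped, "en-fr/vocab.spm");      // lookup misses
const expectedSize = entry?.expectedSize ?? 0;                 // collapses to 0
const keepCachedFile = validateCachedFile(1000, expectedSize); // cache is rejected
```

In this model a perfectly valid 1000-byte cached vocab is rejected because it is compared against an expected size of 0, which is the silent re-download symptom the e2e tests are meant to catch.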

📝 How does it solve it?

Adds 2 e2e tests in tests-qvac covering the shared-vocab branch (vocab.<pair>.spm):
- translation-bergamot-fr-en-cache-reload
- translation-bergamot-en-fr-cache-reload
Each test does load → unload (Round 1, warms the cache), then load with onProgress → unload (Round 2, must be a pure cache hit).
Cache-hit detection is platform-agnostic — counts partial-percentage progress events instead of snapshotting ~/.qvac/models via node:fs. A real re-download emits many downloaded < total events; a true cache hit emits at most one final 100% event per file. The test fails if any partial events are seen on Round 2.
New TranslationBergamotCacheExecutor lives in tests/shared/executors/ and uses dependency: "none" so ResourceManager evicts in-memory models without touching the on-disk cache.
Desktop-only: skipped on mobile via SkipExecutor. The bug is in server-side Bare code that's bit-identical across platforms, so desktop coverage is the source of truth and we save expensive Device Farm cycles for a regression that can't manifest differently on the device.
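The partial-progress heuristic can be sketched as follows. The event shape (`key`, `downloaded`, `total`) and the helper name `keysWithPartialProgress` are assumptions for illustration; the real executor and the actual `onProgress` payload may differ.

```typescript
// Assumed shape of a progress event; the real onProgress payload may differ.
interface LoadProgress {
  key: string;        // hypothetical identifier of the file being fetched
  downloaded: number; // bytes received so far
  total: number;      // total bytes expected
}

// Collects the files that reported at least one partial event
// (downloaded < total), which indicates a real download rather than a cache hit.
function keysWithPartialProgress(events: LoadProgress[]): Set<string> {
  const touchedKeys = new Set<string>();
  for (const e of events) {
    if (e.total > 0 && e.downloaded < e.total) touchedKeys.add(e.key);
  }
  return touchedKeys;
}

// A re-download emits partial events before the final 100% event.
const redownload: LoadProgress[] = [
  { key: "vocab", downloaded: 256, total: 1024 },
  { key: "vocab", downloaded: 1024, total: 1024 },
];

// A cache hit emits at most one final 100% event per file.
const cacheHit: LoadProgress[] = [{ key: "vocab", downloaded: 1024, total: 1024 }];

const touchedOnRedownload = keysWithPartialProgress(redownload);
const touchedOnCacheHit = keysWithPartialProgress(cacheHit);
```

Because the check relies only on events delivered to the callback, it needs no filesystem access, which is what keeps it portable across platforms.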

🧪 How was it tested?

Local desktop run local-local-1778595279738: 19/19 tests passed, both new tests green (128 ms / 129 ms).
Verified via loadModel.ttfb profiler counter (count: 2) that streaming/onProgress was actually wired up on Round 2 — so an empty-events pass reflects a true cache hit, not a silently disabled callback.
Server logs confirm both rounds fully validate all 4 companion files (model + lex + vocab + metadata) without any re-download.

… cache invalidation Adds 2 e2e tests in tests-qvac (translation-bergamot-fr-en-cache-reload, translation-bergamot-en-fr-cache-reload) covering the QVAC-18420 regression where shared vocab files for bidirectional Bergamot pairs were silently re-downloaded on every loadModel call. Each test does load -> unload (Round 1, warm cache) then load with onProgress -> unload (Round 2, must be a pure cache hit). Cache-hit detection is platform-agnostic via partial-percentage progress event counting (no node:fs snapshots). Skipped on mobile via SkipExecutor since the bug lives in server-side Bare code that is bit-identical across platforms. Co-authored-by: Cursor <cursoragent@cursor.com>

simon-iribarren

The shape, placement, and intent are right — tests-only PR, test[skiplog] tag, tier1+verify both set, no production-code creep. The mobile SkipExecutor comment ("Server-side Bare code path, identical across platforms — desktop coverage is source of truth") is exactly the level of rationale that prevents future drift, and cache-hit detection via onProgress rather than node:fs snapshots is the right call for mobile portability.

The only red CI check is CodeQL / Analyze (python) failing on a github/codeql-action download 429 — this PR has 0 Python files, so it's unambiguously an infra flake; a rerun should clear it.

Three non-blocking nits worth folding in:

1. Round 2 can pass even if `onProgress` silently stops firing

translation-bergamot-cache-executor.ts asserts touchedKeys.size === 0, but if a future change drops the onProgress wiring on the cache-hit path, round2 is empty and the test returns passed: true with 0 cache-hit notification(s) — the wrong outcome. Suggest a positive lower bound: assert at least N final percentage === 100 events (one per cached companion file: model + lex + vocab + metadata = 4), which directly mirrors your "Server logs confirm both rounds fully validate all 4 companion files" claim in the description.
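A minimal sketch of the suggested lower bound, assuming each progress event carries a `key` and a `percentage`, and that a cache hit emits exactly one 100% event per file (the description only promises "at most one", so that would need confirming). `checkRound2`, `CacheProgress`, and `RoundVerdict` are hypothetical names, not the executor's actual API.

```typescript
// Assumed event shape; the real onProgress payload may differ.
interface CacheProgress {
  key: string;        // hypothetical per-file identifier
  percentage: number; // 0..100
}

interface RoundVerdict {
  passed: boolean;
  reason: string;
}

// Fails on any partial event (re-download) AND on too few final events
// (callback possibly not wired), instead of treating silence as success.
function checkRound2(events: CacheProgress[], expectedFiles: number): RoundVerdict {
  const partialKeys = new Set<string>();
  const completedKeys = new Set<string>();
  for (const e of events) {
    if (e.percentage < 100) partialKeys.add(e.key);
    else completedKeys.add(e.key);
  }
  if (partialKeys.size > 0) {
    return { passed: false, reason: `re-download detected for ${partialKeys.size} file(s)` };
  }
  if (completedKeys.size < expectedFiles) {
    return {
      passed: false,
      reason: `only ${completedKeys.size}/${expectedFiles} cache-hit notification(s)`,
    };
  }
  return { passed: true, reason: `${completedKeys.size} cache-hit notification(s)` };
}

// Four companion files: model + lex + vocab + metadata.
const allCached: CacheProgress[] = ["model", "lex", "vocab", "metadata"].map(
  (key) => ({ key, percentage: 100 }),
);

const silentVerdict = checkRound2([], 4);        // fails: no evidence of wiring
const cachedVerdict = checkRound2(allCached, 4); // passes
```

With this shape, an empty Round 2 is reported as a failure rather than as a pass with 0 cache-hit notifications.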

2. Round 1 has no `onProgress` callback

There's no positive evidence Round 1 actually exercised the download path. If something later quietly causes Round 1 to be a cache hit (test reordering, a global fixture pre-warms the cache, or someone removes dependency: "none"), Round 2 trivially passes for the wrong reason and we lose the regression coverage. Two options: (a) attach onProgress to Round 1 too and assert round1.length >= round2.length (Round 1 is at least as active as Round 2) — self-validating regardless of cache state at entry; (b) document explicitly that the regression fires regardless of Round 1's cache state.
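Option (a) might look roughly like this. The sketch uses synchronous fake loaders in place of the real `loadModel` (which is presumably asynchronous), and `collectRound`, `round1AtLeastAsActive`, `coldLoad`, and `cachedLoad` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical callback and loader types standing in for loadModel + onProgress.
type OnRoundProgress = (percentage: number) => void;
type LoadFn = (onProgress: OnRoundProgress) => void;

// Runs one round and records every progress event it emits.
function collectRound(load: LoadFn): number[] {
  const events: number[] = [];
  load((percentage) => {
    events.push(percentage);
  });
  return events;
}

// Self-validating check: Round 1 must be at least as active as Round 2,
// whatever the cache state was when the test started.
function round1AtLeastAsActive(round1: number[], round2: number[]): boolean {
  return round1.length >= round2.length;
}

// Fake loaders: a cold load emits several events, a cached load emits one.
const coldLoad: LoadFn = (onProgress) => {
  for (const p of [25, 50, 75, 100]) onProgress(p);
};
const cachedLoad: LoadFn = (onProgress) => {
  onProgress(100);
};

const round1 = collectRound(coldLoad);
const round2 = collectRound(cachedLoad);
const roundsConsistent = round1AtLeastAsActive(round1, round2);
```

If Round 1 were unexpectedly a cache hit and Round 2 re-downloaded, the inequality would flip and the test would fail instead of passing for the wrong reason.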

3. Pattern overlap with `TranslationExecutor` is fragile

/^translation-(indictrans|bergamot|llm|salamandra|afriquegemma)-/ also matches translation-bergamot-fr-en-cache-reload. The fix (register TranslationBergamotCacheExecutor first + first-match-wins + inline comment) is correct, but the next person to alphabetize the handler list silently breaks it. Cheapest hardening: tighten TranslationExecutor.pattern with a negative lookahead — /^translation-(...)-(?!.*cache-reload)/ — so the dispatchers are mutually exclusive at the regex level.
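The overlap and the proposed lookahead can be checked directly. In this sketch the elided `(...)` is filled with the same alternation as the original pattern, which is an assumption about what was meant; `plainId` is a made-up test id used only for contrast.

```typescript
// Current pattern as quoted above.
const loosePattern = /^translation-(indictrans|bergamot|llm|salamandra|afriquegemma)-/;

// Proposed tightening: same prefix, but reject ids whose remainder
// contains "cache-reload".
const tightPattern =
  /^translation-(indictrans|bergamot|llm|salamandra|afriquegemma)-(?!.*cache-reload)/;

const cacheId = "translation-bergamot-fr-en-cache-reload";
const plainId = "translation-bergamot-fr-en"; // hypothetical non-cache test id

const looseMatchesCache = loosePattern.test(cacheId); // true: the overlap
const tightMatchesCache = tightPattern.test(cacheId); // false: now exclusive
const tightMatchesPlain = tightPattern.test(plainId); // true: unaffected
```

With the lookahead in place, handler registration order no longer decides which executor receives the cache-reload tests.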

Micro-nits

estimatedDurationMs: 180000 is 3 min for a test you measured at 128 ms locally — probably fine for a cold Round 1 on a slow runner, but 30–60s is plenty of slack and won't mask a hang.
A one-line comment in translation-bergamot-cache-tests.ts explaining why dependency: "none" (so ConsumerBase doesn't pre-warm the cache before the test even starts) would help a future reader not "fix" it.

Solid regression coverage for QVAC-18420. Approving: CodeQL Python flake aside, none of the above is blocking.

github-actions · 2026-05-13T10:22:31Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

kinsta · 2026-05-13T18:06:27Z

Preview deployments for qvac-docs-staging ⚡️

| Status | Branch preview | Commit preview |
| --- | --- | --- |
| 🔁 Deploying... | N/A | N/A |

Commit: bacf19bb2516b71fc97eaf55161e5c93187bcfac

Deployment ID: 870eef2c-eb1f-4bda-9f53-5e0d896d3c01

Static site name: qvac-docs-staging-fazwv

Victor-Rodzko requested review from a team as code owners May 12, 2026 14:26

Victor-Rodzko had a problem deploying to release May 12, 2026 14:27 — with GitHub Actions Failure

Victor-Rodzko added the verify and tier1 labels May 12, 2026

Victor-Rodzko had a problem deploying to release May 12, 2026 14:30 — with GitHub Actions Failure

simon-iribarren approved these changes May 13, 2026

NamelsKing approved these changes May 13, 2026

Merge branch 'main' into test/qvac-18421-bergamot-cache-reload-e2e

bacf19b

Victor-Rodzko temporarily deployed to release May 13, 2026 18:06 — with GitHub Actions Inactive

Victor-Rodzko merged commit 56690bd into main May 13, 2026
23 checks passed

Victor-Rodzko deleted the test/qvac-18421-bergamot-cache-reload-e2e branch May 13, 2026 18:10

Victor-Rodzko had a problem deploying to release May 13, 2026 18:10 — with GitHub Actions Failure

Victor-Rodzko merged 2 commits into main from test/qvac-18421-bergamot-cache-reload-e2e
