fix(vdsm): refresh DSM search index after creating test data#67
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Fixes the tests/test_integration.py::TestSearch::test_search_keyword_finds_directory flake on vdsm CI. The test was reliably failing on PR #66's CI run after 6 retries (65s budget) because DSM Universal Search hadn't crawled /testshare/Documents/Bambu Studio yet — DSM doesn't index non-indexed shares promptly on a freshly-booted vdsm. Adds two synoindex calls in tests/vdsm/setup_dsm.py after the SSH-driven test data creation: /usr/syno/bin/synoindex -A -d /volume1/testshare/Documents /usr/syno/bin/synoindex -A -d /volume1/testshare/Media `synoindex -A -d <path>` registers a directory subtree with DSM's search index immediately, rather than waiting for the periodic indexer to scan. With the test data indexed at golden-image build time, search calls from the test will reliably find the target without retries. Best-effort: a non-zero return from synoindex logs a warning and continues setup. If a hypothetical DSM build doesn't have the binary at /usr/syno/bin/synoindex, image build still completes and we just fall back to the pre-existing flake (no regression). Modifying setup_dsm.py invalidates the vdsm-workflow's golden-image cache key (keyed on hash of the setup scripts), so the next CI run rebuilds the image with the fix baked in. After that, every cache hit gets the indexed state for free. Verification gate (per playbook): can't run synoindex locally without a vdsm container; the live verification is "watch the next vdsm CI run on this PR or a subsequent one and confirm test_search_keyword_finds_directory passes on the first attempt." If synoindex doesn't behave as expected we'll see another flake and iterate — falling back to the pre-existing flake is a no-regression worst case.
2d989d1 to
596cf63
Compare
cmeans
left a comment
There was a problem hiding this comment.
QA Round 1 — PR #67
CI exercise audit
The vdsm CI on this PR (run 24972241383) is the canonical pre-merge verification of the fix. Investigated whether the cached golden image actually contained the synoindex change:
| Question | Answer |
|---|---|
| Did the cache key change vs. main? | Yes. gh cache list shows two vdsm-golden-7.2.2-* entries: the main-branch cache (81ea58e5…, 2026-04-14) and a new one for this PR (3219ff6b…, 2026-04-27 01:14:19) |
| Was the cache populated by a build that included the synoindex fix? | Yes. Earlier vdsm runs on this branch (head 2d989d1a, ~01:01) had setup_dsm.py already containing the synoindex calls — the cache built then was reused for the run 3 hit at 01:25 |
| Did the test pass? | Yes, on attempt 4 of 6 (at ~01:29:19, ~44s after first attempt at 01:28:40) |
The success-criterion gap
Dev's stated success criterion: "test_search_keyword_finds_directory passes on the first attempt without retries".
Actual result: passes on attempt 4 (after 3 retries with cumulative ~37s wait). That's an improvement vs. PR #66 (failed after 6 attempts), but doesn't meet "first attempt".
Findings
1. [substantive] The fix is incomplete: synoindex at golden-image-build time can't pre-index a directory that doesn't yet exist.
The test (tests/test_integration.py:328-...test_search_keyword_finds_directory) creates ${folder}/${keyword} Studio (i.e. /testshare/Documents/Bambu Studio) dynamically at test run time:
# Ensure a directory with the keyword in its name exists.
# DSM search matches file/directory names, not content.
# On real NAS the search_folder likely already has matching content;
# on vdsm we create it here.
search_dir = f"{folder}/{keyword} Studio"The synoindex calls in tests/vdsm/setup_dsm.py execute during the golden-image build. They can index /testshare/Documents and /testshare/Media because those exist at build time — but the Bambu Studio subdirectory doesn't exist until the test creates it. Whatever priming effect synoindex has on the parent does not eliminate the indexer's async latency for newly-added subdirectories, which is why retries are still needed.
The fix does provide some benefit — likely warming the indexer state for the parent path means it picks up new children faster than from cold. But it doesn't address the actual bug (DSM indexer latency for runtime-created directories).
Two ways forward — pick whichever is preferable:
- A. Move synoindex to test-fixture time (preferred). After the test creates
search_dir, the test (or a fixture wrapping it) callssynoindex -A -d ${search_dir}over SSH from the test runner. That synchronously registers the new subdirectory and should make attempt 1 succeed. Requires SSH plumbing in the test fixture, but the existingsetup_dsm.pyalready has SSH helpers that can be lifted. - B. Accept partial fix; restate the success criterion. Update the PR description and CHANGELOG to claim "reduces flake from 6+-attempt failures to 4-attempt successes" rather than "passes on first attempt", and consider raising the test's max retry count from 6 to (say) 8 for safety margin. Less satisfying but ships the improvement now.
A is the root-cause fix. B is the pragmatic fix.
Verification on 596cf63
| Check | Result |
|---|---|
python ast.parse(setup_dsm.py) |
parses |
uv run pytest |
518 passed (unchanged from #66), 96.15% coverage |
uv run ruff check src/ tests/ scripts/ |
clean |
uv run ruff format --check src/ tests/ scripts/ |
69 files clean |
| Required CI rollup | all green (lint, typecheck, test 3.11/3.12/3.13, version-sync, validate-server-json) |
| vdsm CI overall | passed (47/47), but test_search_keyword_finds_directory needed 4 retries |
| Cache hashFiles invalidation | confirmed working — new key 3219ff6b… exists for this PR |
| No-regression worst case | confirmed — best-effort warning on synoindex failure |
Verdict
QA Failed pending finding 1 — either iterate to A (move synoindex to test-fixture time) or formally accept the partial-fix scope per B.
The change as-is is a real but partial improvement. Worst-case behavior is no regression. Maintainer's call on whether to iterate or ship as-is with updated framing.
QA audit — round 1Branch / SHA: `fix/vdsm-search-flake-via-synoindex` @ `596cf63` Cache invalidation correctness: 3219ff6b... 2026-04-27T01:14:19 refs/pull/67/merge (NEW — populated during this PR)81ea58e5... 2026-04-14T02:40:47 refs/heads/main (pre-existing main cache)``` The hashFiles formula correctly invalidated the cache key when `setup_dsm.py` changed. This PR's branch built a fresh golden image at ~01:14 (run `2d989d1a`) and the current head `596cf63` hit that cache at 01:25. Test result on the synoindex-baked image: Test passed on attempt 4, ~44s after first attempt. Better than PR #66 (failed after 6 attempts) but not "passes on first attempt" per Dev's success criterion. Root-cause analysis: the test (`test_integration.py:328`) creates `/testshare/Documents/Bambu Studio` at runtime via the API. The synoindex calls in `setup_dsm.py` execute at golden-image build time, before this directory exists, so they can't pre-index it. They prime indexer state for the parent (`/testshare/Documents`), which probably explains the improvement vs. PR #66, but the fundamental bug — DSM indexer latency for runtime-created directories — isn't addressed. Outcome: QA Failed — see review comment for finding 1 and the two paths forward (move synoindex to test-fixture time, or formally accept partial fix). `uv run pytest` 518/518 pass; required CI all green. |
QA round-1 finding A on PR #67. The setup_dsm.py synoindex call at golden-image build time helps warm DSM's indexer for static test data, but the test creates `${folder}/${keyword} Studio` (a runtime subdirectory) — synoindex at build time can't index a path that doesn't yet exist. Result on round 1: test improved from 6-attempt fail to 4-attempt success. Still flaky. Root-cause fix: register the runtime-created path with DSM's search index synchronously, right after creation, before the test searches. Three changes: 1. New shared SSH helper at tests/vdsm/ssh.py — lifted from setup_dsm.py's _ssh / _SSH_OPTS so both image-build-time and test-fixture-time SSH share the same plumbing. setup_dsm.py now imports ssh_exec from the shared module. 2. New `refresh_search_index` async fixture in tests/test_integration.py — no-op by default. Real-NAS runs get the no-op (NAS has scheduled indexing); the test calls `await refresh_search_index( search_dir)` after creating the directory. 3. Override in tests/vdsm/conftest.py that calls `/usr/syno/bin/synoindex -A -d <path>` via SSH on the vdsm container. Translates the share-relative path the test uses (e.g. /testshare/Documents/Bambu Studio) into the on-volume path synoindex needs (e.g. /volume1/testshare/Documents/Bambu Studio). Best-effort — non-zero rc is logged but not raised; the test's existing retry loop is the safety net. 518 unit tests still pass. Live verification of the vdsm flake fix has to come from the next CI run on this PR — rebased + cache invalidation flows through cleanly via setup_dsm.py's already-changed hash.
cmeans
left a comment
There was a problem hiding this comment.
QA Round 2 — PR #67
Round-1 finding — verified closed on ff4281d
| Finding | Status | Evidence |
|---|---|---|
| F1 (incomplete fix — synoindex at build time can't index runtime-created dirs) | ✅ closed | Dev took option A. New refresh_search_index async fixture (no-op default in test_integration.py, vdsm override in tests/vdsm/conftest.py) calls synoindex -A -d <path> via SSH after the test creates the directory. Build-time synoindex calls retained for the static test data. SSH plumbing lifted to shared tests/vdsm/ssh.py. |
Live verification on the round-2 vdsm CI run (run 25003587048)
The cache miss path was exercised — golden image rebuilt with the round-2 fixture wiring. From the log:
15:31:07.45 — test_search_keyword_finds_directory STARTED
15:31:07.68 — Created search target directory: /testshare/Documents/Bambu Studio
15:31:07.84 — synoindex registered /volume1/testshare/Documents/Bambu Studio with DSM search index
15:31:10.92 — search_files attempt 1/6: 1 results found (Bambu Studio)
15:31:10.97 — PASSED
Test passed on attempt 1, ~3.5s after directory creation. Success criterion met.
Code-review highlights
tests/vdsm/ssh.py— clean extraction, properly documented, password sanitization viashlex.quote, sudo lecture filtering. Reusable for both image-build and test-fixture contexts.- vdsm fixture — translates share-relative path (
/testshare/...) to on-volume path (/volume1/testshare/...) which is whatsynoindexneeds. Wraps the syncssh_execviaasyncio.to_threadfor the async test. Best-effort: non-zero rc logs a warning, doesn't raise — test's existing retry loop is the safety net. - Default fixture — no-op for real-NAS runs (those NAS units have scheduled indexing; no need to force-register). No-regression for the real-NAS integration test path.
- Test signature — adds
refresh_search_index: Anyparameter; callsawait refresh_search_index(search_dir)after creation. Single call site, minimal blast radius.
Verification on ff4281d
| Check | Result |
|---|---|
uv run pytest |
518 passed, 94 deselected, 96.15% coverage (unchanged — fixture is integration-only, deselected on default runs) |
uv run ruff check src/ tests/ scripts/ |
clean |
uv run ruff format --check src/ tests/ scripts/ |
69 files clean |
uv run mypy src/ scripts/ |
clean (29 source files) |
| Python parses (ssh.py, conftest.py, setup_dsm.py, test_integration.py) | all parse |
| Required CI rollup | all green (lint, typecheck, test 3.11/3.12/3.13, version-sync, validate-server-json) |
| vdsm CI on round 2 | all green — 47/47 vdsm tests pass; test_search_keyword_finds_directory passes on attempt 1 (live evidence above) |
| Cache hashFiles invalidation | confirmed working — round-2 CI was a cache miss, rebuilt the image with new fixtures baked in; new cache saved as c508ffec… |
CHANGELOG entry
Expanded to describe both parts of the two-part fix accurately. Specifically calls out the round-1 → round-2 progression (6-attempt fail → 4-attempt success → attempt-1 success) and notes that the build-time synoindex calls are retained alongside the test-fixture-time addition.
Verdict
Ready for QA Signoff. F1 closed by Dev's option A (root-cause fix). Live CI proved the success criterion on the first run. No findings. QA Approved remains the maintainer's call.
QA audit — round 2Branch / SHA: `fix/vdsm-search-flake-via-synoindex` @ `ff4281d` Live vdsm CI evidence (run 25003587048, cache miss → fresh build): ``` Test passed on attempt 1, ~3.5s after directory creation — success criterion met. 47/47 vdsm tests passed overall. Local stack: Outcome: Ready for QA Signoff. F1 closed by option A (test-fixture-time synoindex). `QA Approved` remains the maintainer's call. |
Status: round 2 — fixes QA's finding A (root-cause fix, not partial)
QA round 1 verdict: round-1 head shipped a partial fix (synoindex at golden-image-build time improved the test from 6-attempt fail to 4-attempt success), but didn't address the actual root cause — the test creates
Bambu Studioat runtime, after the golden image is built, so build-timesynoindexcan't index a path that doesn't yet exist.Round 1 → Round 2 deltas
QA's option A (test-fixture-time synoindex) implemented:
tests/vdsm/ssh.py— lifted fromsetup_dsm.py's private_ssh/_SSH_OPTSso both image-build-time and test-fixture-time SSH go through the same plumbing.setup_dsm.pynow importsssh_execfrom the shared module.refresh_search_indexasync fixture intests/test_integration.py— no-op by default for real-NAS runs (NAS has scheduled indexing). The test callsawait refresh_search_index(search_dir)after creating the directory.tests/vdsm/conftest.py— invokes/usr/syno/bin/synoindex -A -d <path>via SSH on the vdsm container, translating the share-relative path the test uses (e.g./testshare/Documents/Bambu Studio) into the on-volume path synoindex needs (e.g./volume1/testshare/Documents/Bambu Studio). Best-effort — non-zero rc is logged but not raised; the test's retry loop is the safety net.The build-time
synoindexcalls insetup_dsm.pyfrom round 1 are kept — they still help with the static test data (Documents/,Media/) which exists at image build time.Why this should hit the success criterion
The failure mode in round 1 was: even with build-time indexing of the
Documents/parent, DSM's incremental indexer takes ~20-40s to register newly-created subdirectories. Round 2 callssynoindex -A -d <new-path>synchronously after the test creates the subdirectory, registering it before the search call. Search should succeed on attempt 1.If something unexpected happens, the test's existing 6-attempt retry loop is unchanged and falls back to the round-1 behavior (no regression vs current).
Verification gate
I can't run synoindex locally — the live verification is "next vdsm CI run on this PR shows
test_search_keyword_finds_directorypassing on attempt 1, not attempt 4." If round 2 also doesn't hit attempt 1, we iterate.Test plan
python -c "import ast; ast.parse(open('tests/vdsm/ssh.py').read()); ast.parse(open('tests/vdsm/conftest.py').read()); ast.parse(open('tests/test_integration.py').read())"— all parseuv run pytest— 518 passed (unchanged from round 1), 96.15% coverage. The newrefresh_search_indexno-op fixture exists intest_integration.pybut@pytest.mark.integrationtests are deselected on default runs, so no behavioral change to the unit suite.uv run ruff check src/ tests/ scripts/— cleanuv run ruff format --check src/ tests/ scripts/— cleanuv run mypy src/ scripts/— cleantest_search_keyword_finds_directoryon attempt 1.Files
tests/vdsm/ssh.pyssh_exechelpertests/vdsm/setup_dsm.pyssh.py; build-time synoindex unchangedtests/vdsm/conftest.pyrefresh_search_indexfixture (vdsm override)tests/test_integration.pyrefresh_search_indexno-op fixture; test calls it aftercreate_folderCHANGELOG.md