fix(server): bound bg update-check executor + log swallowed errors#81
Conversation
Closes #39. Two gaps in SharedClientManager._bg_update_check(): 1. Silent swallow: the (OSError, ValueError, KeyError) handler was a bare `pass`. PyPI down, DNS failure, malformed version string, state-file KeyError — all left no breadcrumb, even with SYNOLOGY_LOG_LEVEL=debug. Now logs the actual exception at DEBUG with exc_info=True. Still doesn't propagate — update check is best-effort and the tool flow runs fine without a notice. 2. Unbounded executor: loop.run_in_executor() had no outer time bound. The inner urlopen(timeout=5) covered the socket but a thread stuck after urlopen returned (pathological YAML parsing, slow disk in _save_global_state) would keep the coroutine alive forever and delay session shutdown. Now wrapped in asyncio.timeout(10); generous 10s outer bound with TimeoutError caught and logged at DEBUG. Tests: - Strengthened test_bg_update_check_swallows_errors to assert the new "Update check failed" DEBUG record carries exc_info. - New test_bg_update_check_timeout_logged_and_swallowed patches asyncio.timeout to a 50ms window and run_in_executor to a 5s sleep; asserts the coroutine returns normally with the expected DEBUG log. 554 unit tests pass at 96.13% coverage; ruff/mypy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
cmeans
left a comment
There was a problem hiding this comment.
QA Round 1 — One blocker
The actual code change closes #39 cleanly: asyncio.timeout(10) outer-bound on run_in_executor, except TimeoutError branch logs at DEBUG with exc_info=True, the (OSError, ValueError, KeyError) handler now logs instead of bare-passing. New test_bg_update_check_timeout_logged_and_swallowed patches asyncio.timeout to 50ms against a 5s sleeping executor and asserts clean return + DEBUG log — the right shape. Strengthened test_bg_update_check_swallows_errors asserts exc_info is non-None on the new DEBUG record. All 5 #39 acceptance criteria satisfied at the code/test level.
But the CHANGELOG diff broke the #80 entry. Detail in F1.
Verification
| Check | Result |
|---|---|
uv run pytest |
554 passed (+1 new), 100 deselected, 96.13% coverage |
Targeted test_bg_update_check_swallows_errors + test_bg_update_check_timeout_logged_and_swallowed |
both pass directly |
uv run ruff check src/ tests/ scripts/ |
clean |
uv run ruff format --check src/ tests/ scripts/ |
72 files already formatted |
uv run mypy src/ scripts/ |
clean (30 files, strict-mode) |
Required CI on 84f62257 |
12/13 green; vdsm IN_PROGRESS (will check before signoff) |
Findings
| ID | Finding |
|---|---|
| F1 (substantive — CHANGELOG corruption) | This PR regresses the #80 CHANGELOG entry. On origin/main the #80 entry is the full multi-sentence description (closes #38. \core/auth.py:147-148` previously caught a bare `except Exception:` …, ~3KB of explanatory prose ending at …the new INFO-level remediation hint contains the socket path AND all three remediation strings. 553 unit tests pass at 96.13% coverage.). On this branch (84f6225) the #80 entry has been truncated to just - Keyring exception handler narrows + logs root cause (#80) — closes #38., and the *full original #80 description* has been concatenated to the **end** of the new #81 entry — starting from …554 unit tests pass at 96.13% coverage. `core/auth.py:147-148` previously caught…and running through to a duplicate trailing…553 unit tests pass at 96.13% coverage.Effect: (a) the merged #80 entry loses its description on this branch, (b) the new #81 entry has ~3KB of unrelated tail content. Almost certainly a botched conflict resolution from the earlier rebase mid-flight on shared workspace state — whenfix/keyring-handler-narrowwas rebased onto post-#79 main the CHANGELOG conflict got resolved cleanly there, but the rebase artifact propagated into this branch. Confirmed bygit show origin/main:CHANGELOG.md: the #80 entry on main is the full description; this branch alone is corrupt. **Fix:** restore the #80 entry to its origin/main` form (full description) and trim the trailing #80 content from the end of the #81 entry. Two-step edit on CHANGELOG.md only — no code change. Per the project's CHANGELOG-per-PR + doc-drift-is-substantive discipline, blocking. |
Issue #39 acceptance criteria
| AC | Coverage |
|---|---|
Log caught exception with logger.debug + exc_info=True |
✓ logger.debug("Update check failed", exc_info=True) |
Wrap run_in_executor with asyncio.timeout(10) |
✓ async with asyncio.timeout(10): |
| On timeout, log at DEBUG, never propagate | ✓ except TimeoutError: logger.debug("Update check timed out…", exc_info=True) (the comment correctly notes 3.11+ aliases asyncio.TimeoutError to builtins.TimeoutError, which is why the bare except TimeoutError form works — project's test matrix is 3.11/3.12/3.13) |
| Unit test: slow executor cancelled after timeout, coroutine returns normally | ✓ test_bg_update_check_timeout_logged_and_swallowed |
CHANGELOG ### Fixed entry |
⚠ See F1 — entry exists but the diff also corrupted the #80 entry |
Notes (not findings)
- Held off flipping the PR-body manual-test checkboxes until F1 is fixed and I re-verify the CHANGELOG state. Boxes will go in round 2.
- Will re-confirm vdsm CI is green before signoff in round 2.
Verdict
QA Failed Round 1. Code is solid; F1 is purely on CHANGELOG — restore #80 entry from origin/main and trim the spurious tail from #81's entry. Round 2 should be quick.
|
Applying QA Failed as the final act of round 1. Single blocker F1: the CHANGELOG diff regressed the #80 entry (truncated to "closes #38.") and tacked the full #80 description onto the end of #81's entry. Verified on |
Addresses QA F1 on PR #81. An earlier Edit on this branch used a prefix of the #80 entry as `old_string`, which silently truncated the rest of #80's description AND concatenated it to the end of the new #81 entry. Net effect was the merged #80 entry on this branch read just "closes #38." while #81 had ~3KB of unrelated tail. Restored #80 to its origin/main form (full multi-sentence description) and trimmed the spurious tail from #81. `git diff origin/main -- CHANGELOG.md` now shows exactly one new line — the intended #81 entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 2 — F1 addressed (CHANGELOG restoration)You diagnosed it correctly: an earlier Edit on this branch used a prefix of the #80 entry as the search string, which silently truncated the rest of #80's description AND concatenated it to the end of the new #81 entry. Effect was exactly what you described — #80 on this branch read just "closes #38." while #81 had ~3KB of spurious tail. Commit No code or test changes — code remains as approved on round 1 (commit Lesson saved for future Edits on long CHANGELOG entries: use the entire entry-line as the Ready for round 2. |
cmeans
left a comment
There was a problem hiding this comment.
QA Round 2 — PASS
F1 closed by 8ee296a. The #80 CHANGELOG entry is restored byte-for-byte from origin/main (verified via diff <(git show origin/main:CHANGELOG.md | grep '^- \*\*Keyring') <(git show HEAD:CHANGELOG.md | grep '^- \*\*Keyring') — only the line number shifted by 1 because of the new #81 entry above it; content matches). The #81 entry is trimmed to just its own content; no spurious #80 tail. git diff origin/main -- CHANGELOG.md now shows exactly one new line (the #81 entry) plus the line-number shift on a downstream "Keyring service" 0.5.0 entry.
Bonus: the commit message diagnoses the original failure mode explicitly ("an earlier Edit on this branch used a prefix of the #80 entry as old_string, which silently truncated the rest"). Useful audit trail for the next contributor who hits a similar Edit pitfall.
Verification (re-run on 8ee296a)
| Check | Result |
|---|---|
uv run pytest |
554 passed, 100 deselected, 96.13% coverage |
uv run ruff check src/ tests/ scripts/ |
clean |
uv run ruff format --check src/ tests/ scripts/ |
72 files already formatted |
uv run mypy src/ scripts/ |
clean (30 files, strict-mode) |
Required CI on 8ee296a |
13/13 green (vdsm completed SUCCESS) |
git show origin/main:CHANGELOG.md | grep #80 vs HEAD:CHANGELOG.md | grep #80 |
identical content (line shift only) |
PR-body manual tests 1–8 now flipped to checked.
Verdict
Ready for QA Signoff. F1 closed; all 5 #39 acceptance criteria still satisfied; CHANGELOG is back in shape. Final maintainer call.
## Summary Cuts v0.5.2, shipping six PRs that landed since v0.5.1 (2026-05-01): - **#79** mcp-publisher v1.5.0 → v1.7.6 (registry OIDC audience fix) - **#80** keyring exception handler narrowing (closes #38) - **#81** bg update-check executor timeout + log (closes #39) - **#82** pygments 2.19.2 → 2.20.0 (GHSA-5239-wwwm-4pmq, ReDoS) - **#83** `--revert` version-string validation (closes #40) - **#85** per-path serial for `move_files` + `copy_files` (closes #84) ## Why now Two recent bug fixes (#83, #85) are user-visible enough to warrant shipping, and #84 in particular is a confusing silent-no-op regression on multi-file moves — getting that to PyPI promptly matters. The four post-0.5.1 quality fixes (#79–#82) are stacked behind it. This release also exercises **#79's mcp-publisher v1.7.6 pin end-to-end** so the registry can catch up to current. v0.5.1's registry entry is missing because #79 landed AFTER the v0.5.1 tag-push, and `actions/checkout@v6` resolved to the tag's commit on re-runs of the failed `publish-registry` job — the fix wasn't picked up. The v0.5.2 tag will pull the correct pin from main. ## State after merge Bug-labeled issue queue is empty. The structural multi-path-serial fix family (delete + getinfo + move + copy + restore) is now complete on real DSM 7.x — every File Station write tool that takes a `paths: list[str]` issues one DSM task per path, sidestepping the comma-joined-multipath quirk that #68 and #84 each surfaced. ## Files changed - `pyproject.toml` — version 0.5.1 → 0.5.2 - `server.json` — auto-synced via `python scripts/sync-server-json.py` - `uv.lock` — refreshed via `uv lock` - `CHANGELOG.md` — `## Unreleased` (with the six entries above) renamed to `## 0.5.2 (2026-05-01)`, fresh empty `## Unreleased` inserted above it for the next cycle ## Test plan - [x] CI green on this branch (lint, typecheck, test 3.11/3.12/3.13, vdsm integration tests, version-sync, validate-server-json) - [ ] After merge: tag `v0.5.2` push fires `publish.yml`; PyPI publish succeeds - [ ] After merge: `publish-registry` job succeeds end-to-end (this is the validation point for #79's fix — the failure mode in v0.5.1 was `invalid audience: expected https://registry.modelcontextprotocol.io, got [mcp-registry]`) - [ ] After tag: `mcp-synology --check-update` from a v0.5.1 install reports v0.5.2 available; `uv tool install mcp-synology@latest` upgrades cleanly - [ ] Smoke (post-install): two-file `move_files` actually moves both files (the #84 regression scenario) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #39. Two gaps in
server.py::SharedClientManager._bg_update_check().1. Silent exception swallowing
The
(OSError, ValueError, KeyError)handler was a barepass. A failed PyPI check (DNS down, malformed version string inglobal.yaml, state-fileKeyError) silently exited the coroutine with no breadcrumb anywhere — operators runningSYNOLOGY_LOG_LEVEL=debugsaw nothing. Now logs the actual exception at DEBUG withexc_info=True. Still doesn't propagate — update check is best-effort and the tool flow runs fine without a notice.2. Unbounded executor
loop.run_in_executor()had no outer time bound. The innerurlopen(timeout=5)covered the socket, but a thread stuck afterurlopenreturned (pathological YAML parsing on a malformed PyPI response, slow disk on_save_global_state()) would keep the coroutine alive forever and delay session shutdown.Now wrapped in
asyncio.timeout(10). 10s outer bound is generous over the 5s socket timeout, leaving room for_check_for_update's post-urlopenwork.TimeoutErrorcaught separately and logged at DEBUG.Diff at a glance
```python
Before
latest = await loop.run_in_executor(None, _check_for_update, gstate)
...
except (OSError, ValueError, KeyError):
pass # Never let update check break tool functionality
After
async with asyncio.timeout(10):
latest = await loop.run_in_executor(None, _check_for_update, gstate)
...
except TimeoutError:
logger.debug("Update check timed out after 10s; skipping notice", exc_info=True)
except (OSError, ValueError, KeyError):
logger.debug("Update check failed", exc_info=True)
```
Tests
test_bg_update_check_swallows_errors— asserts the new "Update check failed" DEBUG record carriesexc_info. Pre-fix this branch was a barepassand left no record at all.test_bg_update_check_timeout_logged_and_swallowed— patchesasyncio.timeoutto a 50ms window andrun_in_executorto a 5s sleep, asserts the coroutine returns normally with the expected DEBUG log. Without the fix this test would hang for the full 5s and never see the timeout log.Both new tests run in milliseconds.
QA
Manual tests
uv run pytest— 554 passed at 96.13% coverage.uv run pytest tests/core/test_server.py::TestSharedClientManagerLifecycle::test_bg_update_check_swallows_errors tests/core/test_server.py::TestSharedClientManagerLifecycle::test_bg_update_check_timeout_logged_and_swallowed -v— both pass.uv run ruff check src/ tests/ scripts/— clean.uv run ruff format --check src/ tests/ scripts/— clean.uv run mypy src/ scripts/— clean.git grep -nE 'pass.*Never let update check' src/— no matches (the bare-pass placeholder is gone).git grep -n 'asyncio.timeout' src/mcp_synology/server.py— exactly one match in_bg_update_check.async with asyncio.timeout(10):wrapper, runtest_bg_update_check_timeout_logged_and_swallowed. It should hang or fail (the test patches the timeout to 50ms and the executor sleeps 5s — without the wrapper, the coroutine awaits 5s and never sees the DEBUG log). Restore.Verification I already ran
uv run pytestuv run ruff check src/ tests/ scripts/uv run ruff format --check src/ tests/ scripts/uv run mypy src/ scripts/🤖 Generated with Claude Code