Skip to content

fix(server): bound bg update-check executor + log swallowed errors#81

Merged
cmeans-claude-dev[bot] merged 4 commits into
mainfrom
fix/bg-update-check-timeout-and-log
May 1, 2026
Merged

fix(server): bound bg update-check executor + log swallowed errors#81
cmeans-claude-dev[bot] merged 4 commits into
mainfrom
fix/bg-update-check-timeout-and-log

Conversation

@cmeans-claude-dev
Copy link
Copy Markdown
Contributor

@cmeans-claude-dev cmeans-claude-dev Bot commented May 1, 2026

Summary

Closes #39. Two gaps in server.py::SharedClientManager._bg_update_check().

1. Silent exception swallowing

The (OSError, ValueError, KeyError) handler was a bare pass. A failed PyPI check (DNS down, malformed version string in global.yaml, state-file KeyError) silently exited the coroutine with no breadcrumb anywhere — operators running SYNOLOGY_LOG_LEVEL=debug saw nothing. Now logs the actual exception at DEBUG with exc_info=True. Still doesn't propagate — update check is best-effort and the tool flow runs fine without a notice.

2. Unbounded executor

loop.run_in_executor() had no outer time bound. The inner urlopen(timeout=5) covered the socket, but a thread stuck after urlopen returned (pathological YAML parsing on a malformed PyPI response, slow disk on _save_global_state()) would keep the coroutine alive forever and delay session shutdown.

Now wrapped in asyncio.timeout(10). 10s outer bound is generous over the 5s socket timeout, leaving room for _check_for_update's post-urlopen work. TimeoutError caught separately and logged at DEBUG.

Diff at a glance

```python

Before

latest = await loop.run_in_executor(None, _check_for_update, gstate)
...
except (OSError, ValueError, KeyError):
pass # Never let update check break tool functionality

After

async with asyncio.timeout(10):
latest = await loop.run_in_executor(None, _check_for_update, gstate)
...
except TimeoutError:
logger.debug("Update check timed out after 10s; skipping notice", exc_info=True)
except (OSError, ValueError, KeyError):
logger.debug("Update check failed", exc_info=True)
```

Tests

  • Strengthened test_bg_update_check_swallows_errors — asserts the new "Update check failed" DEBUG record carries exc_info. Pre-fix this branch was a bare pass and left no record at all.
  • New test_bg_update_check_timeout_logged_and_swallowed — patches asyncio.timeout to a 50ms window and run_in_executor to a 5s sleep, asserts the coroutine returns normally with the expected DEBUG log. Without the fix this test would hang for the full 5s and never see the timeout log.

Both new tests run in milliseconds.

QA

Manual tests

    • Run uv run pytest — 554 passed at 96.13% coverage.
    • Run uv run pytest tests/core/test_server.py::TestSharedClientManagerLifecycle::test_bg_update_check_swallows_errors tests/core/test_server.py::TestSharedClientManagerLifecycle::test_bg_update_check_timeout_logged_and_swallowed -v — both pass.
    • Run uv run ruff check src/ tests/ scripts/ — clean.
    • Run uv run ruff format --check src/ tests/ scripts/ — clean.
    • Run uv run mypy src/ scripts/ — clean.
    • Spot-check git grep -nE 'pass.*Never let update check' src/ — no matches (the bare-pass placeholder is gone).
    • Spot-check git grep -n 'asyncio.timeout' src/mcp_synology/server.py — exactly one match in _bg_update_check.
    • Verify regression-detect: temporarily revert the async with asyncio.timeout(10): wrapper, run test_bg_update_check_timeout_logged_and_swallowed. It should hang or fail (the test patches the timeout to 50ms and the executor sleeps 5s — without the wrapper, the coroutine awaits 5s and never sees the DEBUG log). Restore.

Verification I already ran

Check Result
uv run pytest 554 passed, 100 deselected, 96.13% coverage
uv run ruff check src/ tests/ scripts/ clean
uv run ruff format --check src/ tests/ scripts/ 72 files already formatted
uv run mypy src/ scripts/ clean (30 files, strict-mode)

🤖 Generated with Claude Code

cmeans-claude-dev Bot and others added 2 commits May 1, 2026 10:53
Closes #39. Two gaps in SharedClientManager._bg_update_check():

1. Silent swallow: the (OSError, ValueError, KeyError) handler was a
   bare `pass`. PyPI down, DNS failure, malformed version string,
   state-file KeyError — all left no breadcrumb, even with
   SYNOLOGY_LOG_LEVEL=debug. Now logs the actual exception at DEBUG
   with exc_info=True. Still doesn't propagate — update check is
   best-effort and the tool flow runs fine without a notice.

2. Unbounded executor: loop.run_in_executor() had no outer time bound.
   The inner urlopen(timeout=5) covered the socket but a thread stuck
   after urlopen returned (pathological YAML parsing, slow disk in
   _save_global_state) would keep the coroutine alive forever and
   delay session shutdown. Now wrapped in asyncio.timeout(10);
   generous 10s outer bound with TimeoutError caught and logged at
   DEBUG.

Tests:
- Strengthened test_bg_update_check_swallows_errors to assert the
  new "Update check failed" DEBUG record carries exc_info.
- New test_bg_update_check_timeout_logged_and_swallowed patches
  asyncio.timeout to a 50ms window and run_in_executor to a 5s
  sleep; asserts the coroutine returns normally with the expected
  DEBUG log.

554 unit tests pass at 96.13% coverage; ruff/mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label May 1, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added Ready for QA Dev work complete — QA can begin review and removed Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels May 1, 2026
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

cmeans
cmeans previously approved these changes May 1, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label May 1, 2026
@github-actions github-actions Bot removed the Ready for QA Dev work complete — QA can begin review label May 1, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Round 1 — One blocker

The actual code change closes #39 cleanly: asyncio.timeout(10) outer-bound on run_in_executor, except TimeoutError branch logs at DEBUG with exc_info=True, the (OSError, ValueError, KeyError) handler now logs instead of bare-passing. New test_bg_update_check_timeout_logged_and_swallowed patches asyncio.timeout to 50ms against a 5s sleeping executor and asserts clean return + DEBUG log — the right shape. Strengthened test_bg_update_check_swallows_errors asserts exc_info is non-None on the new DEBUG record. All 5 #39 acceptance criteria satisfied at the code/test level.

But the CHANGELOG diff broke the #80 entry. Detail in F1.

Verification

Check Result
uv run pytest 554 passed (+1 new), 100 deselected, 96.13% coverage
Targeted test_bg_update_check_swallows_errors + test_bg_update_check_timeout_logged_and_swallowed both pass directly
uv run ruff check src/ tests/ scripts/ clean
uv run ruff format --check src/ tests/ scripts/ 72 files already formatted
uv run mypy src/ scripts/ clean (30 files, strict-mode)
Required CI on 84f62257 12/13 green; vdsm IN_PROGRESS (will check before signoff)

Findings

ID Finding
F1 (substantive — CHANGELOG corruption) This PR regresses the #80 CHANGELOG entry. On origin/main the #80 entry is the full multi-sentence description (closes #38. \core/auth.py:147-148` previously caught a bare `except Exception:` …, ~3KB of explanatory prose ending at …the new INFO-level remediation hint contains the socket path AND all three remediation strings. 553 unit tests pass at 96.13% coverage.). On this branch (84f6225) the #80 entry has been truncated to just - Keyring exception handler narrows + logs root cause (#80) — closes #38., and the *full original #80 description* has been concatenated to the **end** of the new #81 entry — starting from …554 unit tests pass at 96.13% coverage. `core/auth.py:147-148` previously caught…and running through to a duplicate trailing…553 unit tests pass at 96.13% coverage.Effect: (a) the merged #80 entry loses its description on this branch, (b) the new #81 entry has ~3KB of unrelated tail content. Almost certainly a botched conflict resolution from the earlier rebase mid-flight on shared workspace state — whenfix/keyring-handler-narrowwas rebased onto post-#79 main the CHANGELOG conflict got resolved cleanly there, but the rebase artifact propagated into this branch. Confirmed bygit show origin/main:CHANGELOG.md: the #80 entry on main is the full description; this branch alone is corrupt. **Fix:** restore the #80 entry to its origin/main` form (full description) and trim the trailing #80 content from the end of the #81 entry. Two-step edit on CHANGELOG.md only — no code change. Per the project's CHANGELOG-per-PR + doc-drift-is-substantive discipline, blocking.

Issue #39 acceptance criteria

AC Coverage
Log caught exception with logger.debug + exc_info=True logger.debug("Update check failed", exc_info=True)
Wrap run_in_executor with asyncio.timeout(10) async with asyncio.timeout(10):
On timeout, log at DEBUG, never propagate except TimeoutError: logger.debug("Update check timed out…", exc_info=True) (the comment correctly notes 3.11+ aliases asyncio.TimeoutError to builtins.TimeoutError, which is why the bare except TimeoutError form works — project's test matrix is 3.11/3.12/3.13)
Unit test: slow executor cancelled after timeout, coroutine returns normally test_bg_update_check_timeout_logged_and_swallowed
CHANGELOG ### Fixed entry ⚠ See F1 — entry exists but the diff also corrupted the #80 entry

Notes (not findings)

  • Held off flipping the PR-body manual-test checkboxes until F1 is fixed and I re-verify the CHANGELOG state. Boxes will go in round 2.
  • Will re-confirm vdsm CI is green before signoff in round 2.

Verdict

QA Failed Round 1. Code is solid; F1 is purely on CHANGELOG — restore #80 entry from origin/main and trim the spurious tail from #81's entry. Round 2 should be quick.

@cmeans
Copy link
Copy Markdown
Owner

cmeans commented May 1, 2026

Applying QA Failed as the final act of round 1. Single blocker F1: the CHANGELOG diff regressed the #80 entry (truncated to "closes #38.") and tacked the full #80 description onto the end of #81's entry. Verified on origin/main — the full #80 description is intact there; corruption is local to this branch. Almost certainly an artifact from the earlier rebase mid-flight. Code itself is solid: all 5 #39 ACs satisfied, 554 passed, 96.13% coverage, ruff/mypy clean. Round 2 = restore #80 entry to origin/main form + trim spurious tail from #81 entry. CHANGELOG.md only, no code change.

@cmeans cmeans added QA Failed QA found issues — needs dev attention and removed QA Active QA is actively reviewing; Dev should not push changes labels May 1, 2026
Addresses QA F1 on PR #81. An earlier Edit on this branch used a
prefix of the #80 entry as `old_string`, which silently truncated
the rest of #80's description AND concatenated it to the end of the
new #81 entry. Net effect was the merged #80 entry on this branch
read just "closes #38." while #81 had ~3KB of unrelated tail.

Restored #80 to its origin/main form (full multi-sentence
description) and trimmed the spurious tail from #81. `git diff
origin/main -- CHANGELOG.md` now shows exactly one new line — the
intended #81 entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA and removed QA Failed QA found issues — needs dev attention labels May 1, 2026
@cmeans-claude-dev
Copy link
Copy Markdown
Contributor Author

Round 2 — F1 addressed (CHANGELOG restoration)

You diagnosed it correctly: an earlier Edit on this branch used a prefix of the #80 entry as the search string, which silently truncated the rest of #80's description AND concatenated it to the end of the new #81 entry. Effect was exactly what you described — #80 on this branch read just "closes #38." while #81 had ~3KB of spurious tail.

Commit 8ee296a restores #80 to its origin/main form and trims the tail from #81. git diff origin/main -- CHANGELOG.md now shows exactly one new line — the intended #81 entry — and three entries under ## Unreleased > ### Fixed (#79, #80, #81), all at full length, in merge order.

No code or test changes — code remains as approved on round 1 (commit 84f62257). CI on the new head should match.

Lesson saved for future Edits on long CHANGELOG entries: use the entire entry-line as the old_string (or use a sufficiently unique anchor), not a prefix. The sneaky bit here was that the prefix was unique enough to match exactly once, so Edit succeeded silently while truncating ~3KB of tail.

Ready for round 2.

@github-actions github-actions Bot added Ready for QA Dev work complete — QA can begin review and removed Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels May 1, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label May 1, 2026
@github-actions github-actions Bot removed the Ready for QA Dev work complete — QA can begin review label May 1, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Round 2 — PASS

F1 closed by 8ee296a. The #80 CHANGELOG entry is restored byte-for-byte from origin/main (verified via diff <(git show origin/main:CHANGELOG.md | grep '^- \*\*Keyring') <(git show HEAD:CHANGELOG.md | grep '^- \*\*Keyring') — only the line number shifted by 1 because of the new #81 entry above it; content matches). The #81 entry is trimmed to just its own content; no spurious #80 tail. git diff origin/main -- CHANGELOG.md now shows exactly one new line (the #81 entry) plus the line-number shift on a downstream "Keyring service" 0.5.0 entry.

Bonus: the commit message diagnoses the original failure mode explicitly ("an earlier Edit on this branch used a prefix of the #80 entry as old_string, which silently truncated the rest"). Useful audit trail for the next contributor who hits a similar Edit pitfall.

Verification (re-run on 8ee296a)

Check Result
uv run pytest 554 passed, 100 deselected, 96.13% coverage
uv run ruff check src/ tests/ scripts/ clean
uv run ruff format --check src/ tests/ scripts/ 72 files already formatted
uv run mypy src/ scripts/ clean (30 files, strict-mode)
Required CI on 8ee296a 13/13 green (vdsm completed SUCCESS)
git show origin/main:CHANGELOG.md | grep #80 vs HEAD:CHANGELOG.md | grep #80 identical content (line shift only)

PR-body manual tests 1–8 now flipped to checked.

Verdict

Ready for QA Signoff. F1 closed; all 5 #39 acceptance criteria still satisfied; CHANGELOG is back in shape. Final maintainer call.

@cmeans
Copy link
Copy Markdown
Owner

cmeans commented May 1, 2026

Applying Ready for QA Signoff as the final act of round 2. F1 closed by 8ee296a#80 entry restored byte-for-byte from origin/main, #81 entry trimmed to own content. 554/554 pass, 96.13% coverage, ruff/mypy clean, 13/13 required CI green. Final maintainer call.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge QA Approved Manual QA testing completed and passed and removed QA Active QA is actively reviewing; Dev should not push changes Ready for QA Signoff QA passed — ready for maintainer final review and merge labels May 1, 2026
@cmeans-claude-dev cmeans-claude-dev Bot merged commit 3ce7036 into main May 1, 2026
34 checks passed
@cmeans-claude-dev cmeans-claude-dev Bot deleted the fix/bg-update-check-timeout-and-log branch May 1, 2026 16:30
@cmeans-claude-dev cmeans-claude-dev Bot mentioned this pull request May 1, 2026
5 tasks
cmeans-claude-dev Bot added a commit that referenced this pull request May 2, 2026
## Summary

Cuts v0.5.2, shipping six PRs that landed since v0.5.1 (2026-05-01):

- **#79** mcp-publisher v1.5.0 → v1.7.6 (registry OIDC audience fix)
- **#80** keyring exception handler narrowing (closes #38)
- **#81** bg update-check executor timeout + log (closes #39)
- **#82** pygments 2.19.2 → 2.20.0 (GHSA-5239-wwwm-4pmq, ReDoS)
- **#83** `--revert` version-string validation (closes #40)
- **#85** per-path serial for `move_files` + `copy_files` (closes #84)

## Why now

Two recent bug fixes (#83, #85) are user-visible enough to warrant
shipping, and #84 in particular is a confusing silent-no-op regression
on multi-file moves — getting that to PyPI promptly matters. The four
post-0.5.1 quality fixes (#79#82) are stacked behind it.

This release also exercises **#79's mcp-publisher v1.7.6 pin
end-to-end** so the registry can catch up to current. v0.5.1's registry
entry is missing because #79 landed AFTER the v0.5.1 tag-push, and
`actions/checkout@v6` resolved to the tag's commit on re-runs of the
failed `publish-registry` job — the fix wasn't picked up. The v0.5.2 tag
will pull the correct pin from main.

## State after merge

Bug-labeled issue queue is empty. The structural multi-path-serial fix
family (delete + getinfo + move + copy + restore) is now complete on
real DSM 7.x — every File Station write tool that takes a `paths:
list[str]` issues one DSM task per path, sidestepping the
comma-joined-multipath quirk that #68 and #84 each surfaced.

## Files changed

- `pyproject.toml` — version 0.5.1 → 0.5.2
- `server.json` — auto-synced via `python scripts/sync-server-json.py`
- `uv.lock` — refreshed via `uv lock`
- `CHANGELOG.md` — `## Unreleased` (with the six entries above) renamed
to `## 0.5.2 (2026-05-01)`, fresh empty `## Unreleased` inserted above
it for the next cycle

## Test plan

- [x] CI green on this branch (lint, typecheck, test 3.11/3.12/3.13,
vdsm integration tests, version-sync, validate-server-json)
- [ ] After merge: tag `v0.5.2` push fires `publish.yml`; PyPI publish
succeeds
- [ ] After merge: `publish-registry` job succeeds end-to-end (this is
the validation point for #79's fix — the failure mode in v0.5.1 was
`invalid audience: expected https://registry.modelcontextprotocol.io,
got [mcp-registry]`)
- [ ] After tag: `mcp-synology --check-update` from a v0.5.1 install
reports v0.5.2 available; `uv tool install mcp-synology@latest` upgrades
cleanly
- [ ] Smoke (post-install): two-file `move_files` actually moves both
files (the #84 regression scenario)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

QA Approved Manual QA testing completed and passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Background update-check task: log swallowed exceptions and add executor timeout

2 participants