diff --git a/CHANGELOG.md b/CHANGELOG.md index b5960d9..0008a58 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,7 @@ - **publish.yml: bump pinned `mcp-publisher` v1.5.0 → v1.7.6 to match the new registry OIDC audience** (#79) — the v0.5.1 release ran with `mcp-publisher v1.5.0` (the pin in `.github/actions/install-mcp-publisher/action.yml`); PyPI publish succeeded but the `publish-registry` job failed at GitHub OIDC login with `invalid audience: expected https://registry.modelcontextprotocol.io, got [mcp-registry]` (HTTP 401). Root cause: the registry deployed [`modelcontextprotocol/registry#1229`](https://github.com/modelcontextprotocol/registry/pull/1229) ("auth: bind GitHub OIDC token exchange to a per-deployment audience") in `v1.7.6` on 2026-04-30 — one day before our 2026-05-01 release. v1.5.0's `login github-oidc` flow sends audience `mcp-registry`; v1.7.6's flow sends audience `https://registry.modelcontextprotocol.io`, which is what the new registry server validates against. Bumped the action's `default` from `v1.5.0` to `v1.7.6` (and added an explanatory comment so the next bump prompt has the rationale at hand). Re-running the failed `publish-registry` job on the existing v0.5.1 tag won't pick up this fix because `actions/checkout@v6` resolves to the tag's commit; the next release tag will exercise the fix end-to-end. v0.5.1 itself is on PyPI as expected and is the install path users actually hit; the missed registry entry is purely directory metadata. - **Keyring exception handler narrows + logs root cause** (#80) — closes #38. `core/auth.py:147-148` previously caught a bare `except Exception:` with a flat `logger.debug("Keyring not available.")`, hiding every keyring failure mode behind one generic line: locked macOS keychain (operator-actionable: unlock the keychain), `keyring.errors.NoKeyringError` on a headless host (signals a config issue), `keyring.errors.InitError`, OS-level errors on the D-Bus reach path, and genuine library bugs. Operators running `mcp-synology check -v` saw "Keyring not available." with no clue whether the keychain was locked, the backend was missing, or the library blew up. Narrowed to two typed handlers: `except KeyringError as e` (the typed-error case — covers `KeyringLocked`, `NoKeyringError`, `InitError`, `PasswordSetError`, `PasswordDeleteError`, and any other `keyring.errors.*` class) and `except OSError as e` (D-Bus reach errors, permission failures on the OS keychain DB). Both log at DEBUG with `exc_info=True` so the actual exception type, message, and traceback land in the verbose-mode output. Genuine bugs are no longer caught — they propagate up so they can be triaged. The pre-keyring D-Bus socket pre-check (`auth.py:130-131`) was bumped from DEBUG to INFO and rewritten with three concrete remediations (`mcp-synology setup` from a real desktop session, `dbus-run-session` wrapper, or `SYNOLOGY_USERNAME` / `SYNOLOGY_PASSWORD` / `SYNOLOGY_DEVICE_ID` env vars to bypass keyring entirely) — this is the most common path on Linux services launched without a desktop session, and the previous DEBUG-only log meant operators running `mcp-synology check` (without `-v`) saw a generic "no credentials" error with no breadcrumb to the root cause. Updated `tests/core/test_auth.py::_no_keyring()` fixture to raise `keyring.errors.NoKeyringError` instead of bare `Exception` so existing tests exercise the production-shaped error path. New `TestKeyringErrorHandling` (3 cases): `KeyringError` logged with `exc_info` and message text at DEBUG, `OSError` logged separately, and a defense-in-depth case proving a keyring blow-up doesn't block credential resolution from config/env. Strengthened `TestDbusSocketMissing::test_dbus_not_set_when_socket_missing_on_linux` to assert the new INFO-level remediation hint contains the socket path AND all three remediation strings. 553 unit tests pass at 96.13% coverage. +- **Background update-check task: log swallowed exceptions and bound the executor** (#81) — closes #39. Two related gaps in `server.py::SharedClientManager._bg_update_check()`. (1) The `(OSError, ValueError, KeyError)` handler used a bare `pass`, so a failed PyPI check (DNS down, malformed version string in `global.yaml`, state-file `KeyError`) silently exited the background coroutine with no breadcrumb anywhere — `SYNOLOGY_LOG_LEVEL=debug` showed nothing. Now logs at DEBUG with `exc_info=True` so the actual exception type, message, and traceback land in verbose output. Update check is best-effort, so we still don't propagate; the tool flow runs fine without an update notice. (2) The `loop.run_in_executor()` call was unbounded. The inner `urlopen(timeout=5)` covered the socket, but a thread stuck after `urlopen` returned (pathological YAML parsing on a malformed PyPI response, slow disk on `_save_global_state()`) would keep this coroutine alive forever and could delay session shutdown. Now wrapped in `asyncio.timeout(10)` (10s outer bound, generous margin over the 5s socket timeout); on timeout `_bg_update_check` catches `TimeoutError` separately, logs at DEBUG, and exits cleanly so the user-facing tool flow is never blocked. New `TestSharedClientManagerLifecycle::test_bg_update_check_timeout_logged_and_swallowed` patches `asyncio.timeout` to a 50ms window and `run_in_executor` to a 5s sleep, asserting the coroutine returns normally with the expected DEBUG log. Strengthened `test_bg_update_check_swallows_errors` to assert the new "Update check failed" DEBUG record carries `exc_info`. 554 unit tests pass at 96.13% coverage. ## 0.5.1 (2026-05-01) diff --git a/src/mcp_synology/server.py b/src/mcp_synology/server.py index 5653eed..761664d 100644 --- a/src/mcp_synology/server.py +++ b/src/mcp_synology/server.py @@ -179,7 +179,24 @@ async def _logout() -> None: asyncio.run(_logout()) async def _bg_update_check(self) -> None: - """Background update check — appends notice on first tool result.""" + """Background update check — appends notice on first tool result. + + Closes #39. Two pre-fix gaps: + + 1. The `(OSError, ValueError, KeyError)` handler used a bare `pass`, + so a failed PyPI check (DNS down, malformed version string, + state-file KeyError) left no breadcrumb anywhere — even with + `SYNOLOGY_LOG_LEVEL=debug`. Now logs at DEBUG with `exc_info=True` + so the actual exception type, message, and traceback land in + verbose output. + + 2. The `run_in_executor()` call was unbounded. The inner + `urlopen(timeout=5)` covered the socket, but a thread stuck after + `urlopen` returned (pathological YAML parsing, slow disk) would + keep this coroutine alive forever and delay session shutdown. + Now wrapped in `asyncio.timeout(10)`; on timeout we log at DEBUG + and exit cleanly so the tool flow is never blocked. + """ try: from mcp_synology.cli import ( _check_for_update, @@ -189,8 +206,13 @@ async def _bg_update_check(self) -> None: loop = asyncio.get_running_loop() gstate = _load_global_state() - # Run the blocking PyPI check in a thread - latest = await loop.run_in_executor(None, _check_for_update, gstate) + # Run the blocking PyPI check in a thread, bounded so a stuck + # executor (slow socket, slow YAML parse, hung disk) can't + # keep this coroutine alive forever. The inner urlopen has + # its own 5s timeout; the 10s outer bound is a safety margin + # for everything else _check_for_update does. + async with asyncio.timeout(10): + latest = await loop.run_in_executor(None, _check_for_update, gstate) _save_global_state(gstate) if latest: from mcp_synology import __version__ @@ -200,8 +222,17 @@ async def _bg_update_check(self) -> None: f"(current: {__version__}). " f"Run: mcp-synology --check-update" ) + except TimeoutError: + # asyncio.timeout raises builtins.TimeoutError on 3.11+. Treat + # as a non-event for the user-facing flow — they just don't + # get an "update available" notice this session. + logger.debug("Update check timed out after 10s; skipping notice", exc_info=True) except (OSError, ValueError, KeyError): - pass # Never let update check break tool functionality + # Closes #39: surfaces the actual exception under + # `SYNOLOGY_LOG_LEVEL=debug`. Update check is best-effort, so + # we still don't propagate — the tool flow runs fine without + # an update notice. + logger.debug("Update check failed", exc_info=True) def create_server(config: AppConfig) -> FastMCP: diff --git a/tests/core/test_server.py b/tests/core/test_server.py index 6c51fca..a62393a 100644 --- a/tests/core/test_server.py +++ b/tests/core/test_server.py @@ -2,7 +2,8 @@ from __future__ import annotations -from typing import TYPE_CHECKING +import asyncio +from typing import TYPE_CHECKING, Any from unittest.mock import AsyncMock, MagicMock, patch import pytest @@ -401,15 +402,100 @@ async def test_bg_update_check_no_update(self) -> None: await manager._bg_update_check() assert manager._update_notice is None - async def test_bg_update_check_swallows_errors(self) -> None: - """OSError/ValueError/KeyError during update check must not raise.""" + async def test_bg_update_check_swallows_errors(self, caplog: pytest.LogCaptureFixture) -> None: + """OSError/ValueError/KeyError during update check must not raise, + AND the actual exception must be logged at DEBUG with exc_info so + `SYNOLOGY_LOG_LEVEL=debug` surfaces the cause (closes #39, half 1). + """ + import logging + config = make_test_config(check_for_updates=True) manager = SharedClientManager(config) with ( patch("mcp_synology.cli._load_global_state", side_effect=OSError("disk full")), + caplog.at_level(logging.DEBUG, logger="mcp_synology.server"), ): await manager._bg_update_check() # must not raise assert manager._update_notice is None + # The actual exception is logged at DEBUG with exc_info — pre-#39 + # this branch was a bare `pass` and left no breadcrumb. + matching = [ + r + for r in caplog.records + if r.levelname == "DEBUG" and "Update check failed" in r.getMessage() + ] + assert matching, ( + f"expected DEBUG log on update-check failure, got: " + f"{[r.getMessage() for r in caplog.records]}" + ) + assert matching[0].exc_info is not None, ( + "expected exc_info traceback on the update-check failure log" + ) + + async def test_bg_update_check_timeout_logged_and_swallowed( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """A stuck executor task is bounded by asyncio.timeout(10); the + coroutine returns normally with a DEBUG log noting the timeout. + Closes #39 (half 2). Pre-fix, an executor that hung after the + urlopen 5s socket timeout (e.g. pathological YAML parsing, slow + disk) would keep this coroutine alive indefinitely. + """ + import logging + + config = make_test_config(check_for_updates=True) + manager = SharedClientManager(config) + + # Simulate a hung executor by patching asyncio.timeout to raise + # TimeoutError immediately (the inner await still runs but the + # context manager is what enforces the bound; raising on enter + # exercises the same except-arm a real timeout would). + async def _slow_check(*args: object, **kwargs: object) -> None: + # Use a sleep longer than the 10s timeout. We patch + # asyncio.timeout to use a tiny window so this test is fast. + await asyncio.sleep(5) + + # Patch the inner executor call to a slow coroutine, AND tighten + # the timeout window to make the test fast. Easier path: patch + # `loop.run_in_executor` to return a never-completing future and + # reduce the timeout to ~0.05s so the test completes quickly. + original_timeout = asyncio.timeout + + def _quick_timeout(_seconds: float) -> Any: + # Force a tight 50ms window regardless of caller's argument so + # the test isn't waiting 10s of real time. + return original_timeout(0.05) + + async def _stuck_executor(*args: object, **kwargs: object) -> str: + await asyncio.sleep(5) + return "9.9.9" + + async def _stub_run_in_executor(*args: object, **kwargs: object) -> str: + return await _stuck_executor() + + with ( + patch("mcp_synology.cli._load_global_state", return_value={}), + patch("mcp_synology.cli._save_global_state"), + patch("asyncio.timeout", side_effect=_quick_timeout), + patch.object( + asyncio.get_event_loop(), + "run_in_executor", + side_effect=_stub_run_in_executor, + ), + caplog.at_level(logging.DEBUG, logger="mcp_synology.server"), + ): + await manager._bg_update_check() # must not raise + + assert manager._update_notice is None # no notice on timeout + matching = [ + r + for r in caplog.records + if r.levelname == "DEBUG" and "Update check timed out" in r.getMessage() + ] + assert matching, ( + f"expected DEBUG log on update-check timeout, got: " + f"{[r.getMessage() for r in caplog.records]}" + ) class TestSharedClientManagerSubscribeOnReauth: