Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions docs/reference/configuration-precedence.md
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,52 @@ entry:
`tests/unit/settings/test_precedence_chain.py`.
4. Document the new entry in this page's source matrix.

## Bridge-config snapshot pattern (hot-reloadable AppState fields)

For controller / service knobs that should be hot-reloadable but cost
too much to resolve through `ConfigResolver.get_*()` on every request,
the canonical pattern is a frozen Pydantic snapshot on `AppState`
populated at startup and hot-swapped by a settings subscriber on
operator-driven changes. Reference implementation:
`api.max_lifecycle_events_per_query` consumed by
`ActivityController.list_activities`.

The pattern has four pieces:

1. **Frozen bridge model.** A class in
`synthorg/settings/bridge_configs.py` (e.g. `ApiBridgeConfig`) with
`model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid")`,
one field per setting it carries, defaults that match the
registered defaults. The model is the single source of truth for
the fallback value -- no controller carries a duplicate constant.
2. **Resolver builder.** `ConfigResolver.get_<ns>_bridge_config()`
resolves every field at once via `_resolve_bridge_fields()`.
3. **AppState slot + accessors.** `AppState.__init__` default-
constructs the bridge model so consumers always see a valid
snapshot, even before `_apply_bridge_config` has run.
`AppState.<name>_bridge_config` returns the current snapshot;
`AppState.swap_<name>_bridge_config(config)` does a wholesale
replace under a per-bridge `threading.Lock`;
`AppState.mutate_<name>_bridge_config({field: value, ...})`
applies a partial update under the same lock so two concurrent
subscribers cannot lose each other's writes.
4. **Settings subscriber.** A `SettingsSubscriber` implementation in
`synthorg/settings/subscribers/<name>_bridge_subscriber.py` whose
`_WATCHED` set lists every hot-reloadable field. On change, the
subscriber resolves the new value and calls `mutate_*` with the
single-field update; `model_copy(update=...)` re-validates against
the field's `Field(ge=..., le=...)` bounds, so an out-of-range
value raises `ValidationError` and the prior snapshot is retained.
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
Module-load-time guard: every key in `_WATCHED` is asserted to
exist on the bridge model so a typo or rename surfaces at import,
not on the next operator hot-reload.

Use this pattern when the setting is hot-reloadable
(`restart_required=False`) but per-request resolver lookup would
add overhead or coupling. For restart-required knobs (e.g.
`ws_auth_timeout_seconds`) the simpler `set_*()` pattern in
`_apply_bridge_config` is sufficient.

## Bootstrap-wiring trace (ghost-wired settings gate)

A registered setting whose consuming machinery exists but is never
Expand Down
9 changes: 4 additions & 5 deletions scripts/no_magic_numbers_baseline.txt
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,7 @@ src/synthorg/api/auth/token_size.py:30:29
src/synthorg/api/auth/token_size.py:38:29
src/synthorg/api/auth_response_discriminator.py:29:30
src/synthorg/api/boundary.py:46:24
src/synthorg/api/controllers/activities.py:45:24
src/synthorg/api/controllers/activities.py:366:29
src/synthorg/api/controllers/activities.py:318:29
src/synthorg/api/controllers/agent_identity_versions.py:111:29
src/synthorg/api/controllers/agents.py:66:24
src/synthorg/api/controllers/agents.py:190:29
Expand All @@ -45,7 +44,7 @@ src/synthorg/api/controllers/approvals.py:75:40
src/synthorg/api/controllers/approvals.py:519:29
src/synthorg/api/controllers/artifacts.py:181:29
src/synthorg/api/controllers/audit.py:39:19
src/synthorg/api/controllers/audit.py:101:29
src/synthorg/api/controllers/audit.py:108:29
src/synthorg/api/controllers/backup.py:187:29
src/synthorg/api/controllers/budget.py:302:29
src/synthorg/api/controllers/budget_config_versions.py:46:29
Expand All @@ -55,7 +54,7 @@ src/synthorg/api/controllers/connections.py:65:20
src/synthorg/api/controllers/connections.py:66:22
src/synthorg/api/controllers/connections.py:156:29
src/synthorg/api/controllers/coordination_metrics.py:34:21
src/synthorg/api/controllers/coordination_metrics.py:93:29
src/synthorg/api/controllers/coordination_metrics.py:100:29
src/synthorg/api/controllers/custom_rules.py:205:29
src/synthorg/api/controllers/custom_rules.py:435:29
src/synthorg/api/controllers/departments.py:229:49
Expand Down Expand Up @@ -136,7 +135,7 @@ src/synthorg/api/services/idempotency_service.py:47:41
src/synthorg/api/services/idempotency_service.py:48:49
src/synthorg/api/services/org_mutations.py:43:22
src/synthorg/api/services/ssrf_violation_service.py:141:21
src/synthorg/api/state_services.py:114:26
src/synthorg/api/state_services.py:115:26
src/synthorg/backup/service_archive.py:38:23
src/synthorg/backup/service_archive.py:42:21
src/synthorg/budget/_optimizer_helpers.py:225:41
Expand Down
50 changes: 1 addition & 49 deletions src/synthorg/api/controllers/activities.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,58 +36,10 @@
API_ACTIVITY_FEED_QUERIED,
API_REQUEST_ERROR,
)
from synthorg.settings.enums import SettingNamespace
from synthorg.tools.invocation_record import ToolInvocationRecord # noqa: TC001

logger = get_logger(__name__)

# Fallback cap applied when no settings resolver is wired in.
_MAX_LIFECYCLE_EVENTS = 10_000

# Module-level log-once guard: during a prolonged settings outage
# this endpoint is queried once per request and would otherwise
# flood the logs with identical fallback warnings (plus traceback).
# The flag is set when we first emit the fallback warning and is
# cleared on the next successful resolution, so a later outage is
# visible again.
_lifecycle_cap_fallback_logged: bool = False


async def _resolve_lifecycle_cap(app_state: AppState) -> int:
"""Resolve the active lifecycle-query cap, falling back to the constant.

A settings outage or malformed value must not fail the endpoint;
the fallback constant keeps the DB-side ``LIMIT`` bounded even
when the resolver is unavailable. Warnings are log-once per run
of failures (cleared on recovery) to avoid flooding logs during
a prolonged outage; traceback logging is suppressed for the same
reason.
"""
global _lifecycle_cap_fallback_logged # noqa: PLW0603
if not app_state.has_config_resolver:
return _MAX_LIFECYCLE_EVENTS
try:
value = await app_state.config_resolver.get_int(
SettingNamespace.API.value, "max_lifecycle_events_per_query"
)
except asyncio.CancelledError:
raise
except MemoryError, RecursionError:
raise
except Exception as exc:
if not _lifecycle_cap_fallback_logged:
logger.warning(
API_REQUEST_ERROR,
note="failed to resolve max_lifecycle_events_per_query; using fallback",
error_type=type(exc).__name__,
error=safe_error_description(exc),
cap=_MAX_LIFECYCLE_EVENTS,
)
_lifecycle_cap_fallback_logged = True
return _MAX_LIFECYCLE_EVENTS
_lifecycle_cap_fallback_logged = False
return value


# Degraded source names -- used in responses and tests.
_SRC_PERFORMANCE_TRACKER = "performance_tracker"
Expand Down Expand Up @@ -407,7 +359,7 @@ async def list_activities( # noqa: PLR0913
app_state: AppState = state.app_state
now = datetime.now(UTC)
since = now - timedelta(hours=last_n_hours)
lifecycle_cap = await _resolve_lifecycle_cap(app_state)
lifecycle_cap = app_state.api_bridge_config.max_lifecycle_events_per_query

lifecycle_events = await app_state.persistence.lifecycle_events.list_events(
agent_id=agent_id,
Expand Down
11 changes: 9 additions & 2 deletions src/synthorg/api/controllers/audit.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,15 @@
_MAX_AUDIT_QUERY = 10_000
"""Fallback cap applied when no settings resolver is wired in."""

# Module-level log-once guard for the settings-resolution fallback;
# see ``activities._resolve_lifecycle_cap`` for the rationale.
# Module-level log-once guard for the settings-resolution fallback:
# during a prolonged settings outage this endpoint is queried once per
# request and would otherwise flood the logs with identical warnings;
# the flag is set on first failure and cleared on the next successful
# resolution so a later outage is visible again. The activities
# controller has migrated this knob to ``app_state.api_bridge_config``
# (startup snapshot + ``ApiBridgeSettingsSubscriber`` hot-reload); this
# controller follows the same pattern when its cap is similarly
# bridged.
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
_audit_cap_fallback_logged: bool = False


Expand Down
11 changes: 9 additions & 2 deletions src/synthorg/api/controllers/coordination_metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,15 @@
_MAX_METRICS_QUERY = 10_000
"""Fallback cap applied when no settings resolver is wired in."""

# Module-level log-once guard for the settings-resolution fallback;
# see ``activities._resolve_lifecycle_cap`` for the rationale.
# Module-level log-once guard for the settings-resolution fallback:
# during a prolonged settings outage this endpoint is queried once per
# request and would otherwise flood the logs with identical warnings;
# the flag is set on first failure and cleared on the next successful
# resolution so a later outage is visible again. The activities
# controller has migrated this knob to ``app_state.api_bridge_config``
# (startup snapshot + ``ApiBridgeSettingsSubscriber`` hot-reload); this
# controller follows the same pattern when its cap is similarly
# bridged.
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
_metrics_cap_fallback_logged: bool = False


Expand Down
42 changes: 42 additions & 0 deletions src/synthorg/api/lifecycle_helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
API_APP_STARTUP,
API_AUDIT_RETENTION,
API_AUTH_LOCKOUT_CLEANUP,
API_BRIDGE_CONFIG_RESOLVE_FAILED,
API_SESSION_CLEANUP,
API_WS_TICKET_CLEANUP,
)
Expand All @@ -30,6 +31,7 @@
registered_default_int,
)
from synthorg.settings.subscribers import (
ApiBridgeSettingsSubscriber,
BackupSettingsSubscriber,
MemorySettingsSubscriber,
ObservabilitySettingsSubscriber,
Expand Down Expand Up @@ -589,11 +591,16 @@ def _build_settings_dispatcher(
app_state=app_state,
settings_service=settings_service,
)
api_bridge_sub = ApiBridgeSettingsSubscriber(
app_state=app_state,
settings_service=settings_service,
)
subs: list[SettingsSubscriber] = [
provider_sub,
memory_sub,
observability_sub,
per_op_rl_sub,
api_bridge_sub,
]
if backup_service is not None:
subs.append(
Expand Down Expand Up @@ -754,6 +761,40 @@ async def _apply_notification_dispatcher_config(
)


async def _apply_api_bridge_config_snapshot(app_state: AppState) -> None:
"""Snapshot ``ApiBridgeConfig`` onto ``AppState`` at startup.

Resolves the full bridge once via
:meth:`ConfigResolver.get_api_bridge_config` and atomically swaps
it onto ``app_state``. On any non-fatal resolve failure the
default ``ApiBridgeConfig()`` snapshot installed by
``AppState.__init__`` is retained and a single structured warning
is emitted -- the centralised replacement for the per-request
log-once fallback the activities controller used to carry inline.

No-op when no resolver is wired (dev/test rigs that bypass
``create_app``); the default snapshot remains in place.
"""
if not app_state.has_config_resolver:
return
try:
snapshot = await app_state.config_resolver.get_api_bridge_config()
except asyncio.CancelledError:
raise
except MemoryError, RecursionError:
raise
Comment on lines +784 to +785

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The except MemoryError, RecursionError: syntax (without parentheses) is only valid in Python 3.13+ (per PEP 758). If the project supports older Python versions, this will raise a SyntaxError. Additionally, the PR description states that a polish pass parenthesised these in newly-touched files, but this new function still uses the unparenthesised form.

    except (MemoryError, RecursionError):
        raise

except Exception as exc:
logger.warning(
API_BRIDGE_CONFIG_RESOLVE_FAILED,
bridge="api",
error_type=type(exc).__name__,
error=safe_error_description(exc),
fallback="module_defaults",
)
return
app_state.swap_api_bridge_config(snapshot)


async def _apply_bridge_config( # noqa: C901, PLR0912, PLR0915
app_state: AppState,
effective_config: RootConfig | None,
Expand All @@ -768,6 +809,7 @@ async def _apply_bridge_config( # noqa: C901, PLR0912, PLR0915
return

await _validate_approval_urgency_invariant(app_state)
await _apply_api_bridge_config_snapshot(app_state)

try:
app_state.ticket_store.set_max_pending_per_user(
Expand Down
18 changes: 18 additions & 0 deletions src/synthorg/api/state.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,7 @@
from synthorg.security.audit import AuditLog # noqa: TC001
from synthorg.security.timeout.scheduler import ApprovalTimeoutScheduler # noqa: TC001
from synthorg.security.trust.service import TrustService # noqa: TC001
from synthorg.settings.bridge_configs import ApiBridgeConfig
from synthorg.settings.resolver import ConfigResolver
from synthorg.settings.service import SettingsService # noqa: TC001
from synthorg.telemetry.collector import TelemetryCollector # noqa: TC001
Expand Down Expand Up @@ -155,6 +156,8 @@ class AppState(AppStateServicesMixin):
"_agent_registry",
"_agent_version_service",
"_analytics_service",
"_api_bridge_config",
"_api_bridge_config_lock",
"_approval_gate",
"_approval_timeout_scheduler",
"_artifact_facade_service",
Expand Down Expand Up @@ -415,6 +418,21 @@ def __init__( # noqa: PLR0913, PLR0915
# built into the notification-dispatcher sinks rather than
# rebuilding and closing them on every startup.
self._bridge_config_applied: bool = False
# Frozen ``ApiBridgeConfig`` snapshot consumed by API
# controllers (e.g. activities lifecycle cap). Default-
# constructed so consumers always see a valid instance, even
# before ``_apply_bridge_config`` has run or when the resolver
# is unavailable; ``_apply_bridge_config`` swaps in the
# operator-tuned snapshot, and
# ``ApiBridgeSettingsSubscriber`` hot-swaps it on operator
# edits (no restart required). The dedicated lock guards the
# read-modify-write path on ``mutate_api_bridge_config`` so two
# concurrent subscribers each computing
# ``model_copy(update=...)`` from the same prior snapshot
# cannot lose each other's updates by both calling
# ``swap_api_bridge_config`` based on a stale read.
self._api_bridge_config: ApiBridgeConfig = ApiBridgeConfig()
self._api_bridge_config_lock: threading.Lock = threading.Lock()
self._provider_management: ProviderManagementService | None = None
self._org_mutation_service: OrgMutationService | None = None
# Shutdown flag observable by long-lived subsystems.
Expand Down
Loading
Loading