docs: audit cleanup D -- public-facing & docs sync (#1709) #1715
Bundle D from the 2026-05-01 codebase audit. Public-facing pages and design docs drifted from the actual codebase:

- README.md and docs/roadmap/index.md: replace "25,000+ unit tests" with "27,000+ tests". Live count is 27,566 collected (`uv run python -m pytest tests/ --collect-only -q | tail -1` reports "27566 tests collected").
- docs/architecture/tech-stack.md: "57 Latin-script locales across 12 world regions" becomes 11 world regions (data/locales.yaml has 11 region groupings).
- docs/reference/comparison.md: drop the .gitignore line and commit the generator output. The script and YAML are correct; the local file was generated 2026-04-02 against a YAML state where SynthOrg features were classified as planned across the board, then YAML was updated 2026-04-26 in #1595 without rerunning the generator. CI regenerates on every build, but tracking the artefact surfaces drift in PR diffs.
- docs/architecture/decisions.md: rewrite D16 to describe the actual layered SandboxBackend strategy (subprocess for low-risk file_system and git, Docker for high-risk code_execution / terminal / database / web). Code (src/synthorg/tools/sandbox/factory.py) and design (docs/design/tools.md) already align; only the decision row drifted.
- docs/design/tools.md: drop the "no unsafe subprocess fallback for code execution" phrasing; low-risk categories continue via subprocess even when Docker is unavailable.
- docs/design/communication.md: drop *(proposed)* on the REJECTED A2A task-state row. TaskStatus.REJECTED is implemented at src/synthorg/core/enums.py:256 with the CREATED..REJECTED transition documented in the enum docstring.
- docs/design/self-improvement.md, docs/reference/pluggable-subsystems.md and docs/reference/claude-reference.md: drop references to deleted modules. Clock and SystemClock now live at src/synthorg/core/clock.py (not src/synthorg/meta/rollout/clock.py). Auth lockout / refresh / session storage now lives under persistence/{sqlite,postgres}/ (not src/synthorg/api/auth/{lockout_store,refresh_store,session_store}.py).
- docs/DESIGN_SPEC.md: add the Internationalization design page row; internationalization.md exists but was unlisted.
- CLAUDE.md: drop the dangling docs/design/operations.md reference from the vendor-name carve-out list. src/synthorg/providers/presets.py is already item (3) of the same list and plays that role; renumber.

Migration-framing rot (agent 155): strip 11 issue-number / migration narratives back to present-tense descriptions across src/ and tests/, plus 2 adjacent drive-bys in files already touched (coordination_protocol.py:48-49, efficiency.py:5). tests/unit/architecture/test_layering.py keeps its #1610 reference; the test exists to enforce the deletion of synthorg.api.errors, so the historical anchor is load-bearing.

Closes #1709
Dependency Review
✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.
Scanned Files: None
No actionable comments were generated in the recent review. 🎉
Walkthrough
This PR applies multiple documentation updates from a 2026-05-01 audit: increments reported test counts (25,000+ → 27,000+), adds an Internationalization design row, revises D16 to specify a layered SandboxBackend (subprocess for low-risk, Docker for high-risk), finalizes the …
Suggested labels: autorelease: tagged
🚥 Pre-merge checks | ✅ 5 passed
Merging this PR will not alter performance
Code Review
This pull request performs a broad cleanup of the codebase by removing internal issue references from docstrings and comments across various modules. Key documentation updates include the introduction of a comprehensive framework comparison page, an internationalization design specification, and a revised architectural decision (D16) that adopts a layered sandboxing approach—using subprocesses for low-risk tasks and Docker for high-risk execution. Additionally, project metadata and test counts have been updated to reflect current development status. Feedback is provided to improve the specificity of docstrings in the coordination protocol by explicitly naming the ledger middleware as the source of certain data structures.
task_ledger: TaskLedger populated by coordination middleware.
progress_ledger: ProgressLedger populated by coordination middleware.
The generic term "coordination middleware" is used to replace a specific issue reference (#1257). If these ledgers are populated by a specific middleware implementation (e.g., a LedgerMiddleware), it would be more helpful to name that specific component or describe the phase in which they are populated, as "coordination middleware" refers to the entire category defined in this protocol file.
Suggested change:
- task_ledger: TaskLedger populated by coordination middleware.
- progress_ledger: ProgressLedger populated by coordination middleware.
+ task_ledger: TaskLedger populated by the ledger middleware.
+ progress_ledger: ProgressLedger populated by the ledger middleware.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/reference/claude-reference.md (1)
38-69: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Replace the `text`-fenced package tree with a supported docs format.
This updated section is still maintained inside a ` ```text ` block; migrate this structure to Mermaid (simple hierarchy) or a Markdown table/list format to comply with docs standards.
As per coding guidelines, `docs/**/*.md` should use Mermaid for simple hierarchies and should not use ` ```text ` blocks for structural diagrams.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reference/claude-reference.md` around lines 38 - 69, Replace the existing ```text fenced package tree (the block that begins with "src/synthorg/") with a Mermaid hierarchy block (or a Markdown nested list if Mermaid is unsuitable); remove the ```text block entirely and author a compact mermaid diagram that captures the top-level directories (src/synthorg/, web/src/, cli/, site/, data/) and the key subpackages (e.g., api/, core/, engine/, persistence/, memory/, providers/, tools/) so the structure renders as a simple hierarchy per docs standards and avoids using ```text for structural diagrams.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/DESIGN_SPEC.md`:
- Line 46: The DESIGN_SPEC row claiming "British English UI default" conflicts
with the runtime constant APP_LOCALE_FALLBACK in web/src/utils/locale.ts
(currently set to 'en'); decide whether the source-of-truth should be the spec
or the code and make a single consistent change: either update the docs row text
to "Neutral English (en) UI default" to match APP_LOCALE_FALLBACK, or change
APP_LOCALE_FALLBACK to 'en-GB' and adjust any Intl/formatting tests to use
en-GB; ensure you update the unique identifiers in the same commit (the spec
table entry and the APP_LOCALE_FALLBACK constant) so they stay aligned.
---
Outside diff comments:
In `@docs/reference/claude-reference.md`:
- Around line 38-69: Replace the existing ```text fenced package tree (the block
that begins with "src/synthorg/") with a Mermaid hierarchy block (or a Markdown
nested list if Mermaid is unsuitable); remove the ```text block entirely and
author a compact mermaid diagram that captures the top-level directories
(src/synthorg/, web/src/, cli/, site/, data/) and the key subpackages (e.g.,
api/, core/, engine/, persistence/, memory/, providers/, tools/) so the
structure renders as a simple hierarchy per docs standards and avoids using ```text
for structural diagrams.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 947e8dd5-a591-420b-af50-1e2b5616358d
📒 Files selected for processing (24)
- .gitignore
- CLAUDE.md
- README.md
- docs/DESIGN_SPEC.md
- docs/architecture/decisions.md
- docs/architecture/tech-stack.md
- docs/design/communication.md
- docs/design/self-improvement.md
- docs/design/tools.md
- docs/reference/claude-reference.md
- docs/reference/comparison.md
- docs/reference/pluggable-subsystems.md
- docs/roadmap/index.md
- src/synthorg/core/evidence.py
- src/synthorg/engine/middleware/coordination_protocol.py
- src/synthorg/engine/trajectory/efficiency_ratios.py
- src/synthorg/hr/evaluation/extractors/efficiency.py
- src/synthorg/hr/evaluation/metric_extractor_protocol.py
- src/synthorg/providers/management/preset_override_service.py
- tests/unit/api/test_exception_handlers.py
- tests/unit/budget/test_cost_record.py
- tests/unit/engine/artifacts/test_service.py
- tests/unit/observability/test_events.py
- tests/unit/settings/test_definitions_config_bridge.py
💤 Files with no reviewable changes (2)
- .gitignore
- docs/design/self-improvement.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
When a spec topic is referenced (e.g. 'the Agents page' or 'the Engine page's Crash Recovery section'), read the relevant `docs/design/` page before implementation; deviations require explicit user approval and spec updates
Files:
- docs/DESIGN_SPEC.md
- docs/architecture/tech-stack.md
- README.md
- docs/roadmap/index.md
- CLAUDE.md
- docs/reference/claude-reference.md
- docs/design/tools.md
- docs/design/communication.md
- docs/reference/comparison.md
- docs/reference/pluggable-subsystems.md
- docs/architecture/decisions.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
- Use D2 (` ```d2 `) for architecture diagrams, nested container layouts, complex entity relationships; use Mermaid (` ```mermaid `) for flowcharts, sequence diagrams, simple hierarchies, pipelines; use Markdown tables for grid/matrix data; never use ` ```text ` blocks with ASCII/Unicode box-drawing
- D2 diagrams use theme 200 (Dark Mauve), dark-only render, configured globally in `mkdocs.yml`; rendered at build time via `mkdocs-d2-plugin` (dagre layout); requires D2 CLI v0.7.1 on PATH locally and in CI
Files:
- docs/DESIGN_SPEC.md
- docs/architecture/tech-stack.md
- docs/roadmap/index.md
- docs/reference/claude-reference.md
- docs/design/tools.md
- docs/design/communication.md
- docs/reference/comparison.md
- docs/reference/pluggable-subsystems.md
- docs/architecture/decisions.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
- Every business-logic module must import logger via `from synthorg.observability import get_logger` and assign `logger = get_logger(__name__)`; never use bare `import logging` or `logging.getLogger()` or `print()`
- All Pydantic models must use `ConfigDict(frozen=True, allow_inf_nan=False)` unless documented otherwise; mutations go through `model_copy(update=...)`, never direct attribute assignment
- No `from __future__ import annotations`: Python 3.14 has native PEP 649 lazy annotations
- Use PEP 758 except syntax: `except A, B:` without parens when not binding; `except (A, B) as exc:` with parens when binding (ruff enforces on 3.14)
- All public functions and classes require type hints; mypy strict mode is enforced
- Docstrings must use Google style and are required on public classes and functions (ruff D rules enforced)
- Event names must always import constants from `synthorg.observability.events.<domain>`; never use string literals
- Never use `logger.exception(EVENT, error=str(exc))`, `logger.warning(EVENT, error=str(exc))`, or `logger.error(EVENT, error=str(exc))`; use `error_type=type(exc).__name__` and `error=safe_error_description(exc)` from `synthorg.observability.safe_error_description`
- Wrap attacker-controllable strings at LLM call sites via `wrap_untrusted()` from `synthorg.engine.prompt_safety` and append `untrusted_content_directive(tags)` to the enclosing system prompt (SEC-1)
- Never call `lxml.html.fromstring` on attacker input; use `HTMLParseGuard` from `synthorg.tools.html_parse_guard` (SEC-1)
- Classes that read time or sleep must accept `clock: Clock | None = None` defaulting to `SystemClock()` from `synthorg.core.clock`; tests inject `FakeClock`
- Async `start()`/`stop()` services must own a dedicated `self._lifecycle_lock`; timed-out stops mark the service unrestartable
- Prefer `asyncio.TaskGroup` for fan-out/fan-in concurrency; wrap independent task bodies in `async def` helpers that catch `Exception` (re-raise only `MemoryError`/`RecursionError`)...
Files:
- tests/unit/engine/artifacts/test_service.py
- src/synthorg/hr/evaluation/extractors/efficiency.py
- tests/unit/budget/test_cost_record.py
- src/synthorg/hr/evaluation/metric_extractor_protocol.py
- src/synthorg/providers/management/preset_override_service.py
- src/synthorg/engine/trajectory/efficiency_ratios.py
- tests/unit/settings/test_definitions_config_bridge.py
- src/synthorg/core/evidence.py
- tests/unit/api/test_exception_handlers.py
- tests/unit/observability/test_events.py
- src/synthorg/engine/middleware/coordination_protocol.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
- Every `Mock()`/`AsyncMock()`/`MagicMock()` must declare `spec=ConcreteClass` (Protocol or class); bare calls are blocked by `scripts/check_mock_spec.py` pre-commit gate; baseline updates require `uv run python scripts/check_mock_spec.py --update`
- Time-driven tests: import `FakeClock` from `tests._shared.fake_clock` (NOT rollout-subsystem paths) and inject into the class under test; `FakeClock.sleep` advances virtual time + yields via `asyncio.sleep(0)`; use `await clock.advance_async(seconds)` for cooperative task driving
- FakeClock-first: patch `time.monotonic()`/`asyncio.sleep()` globals only for legacy code paths without a `Clock` seam; when the class under test accepts `clock=`, always inject `FakeClock`
- Use `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow` to categorize tests; run with `-n 8` parallelism (pytest-xdist); global 30-second timeout per test
- NEVER delete, skip, or xfail tests to fix slowness; identify slow tests with `--durations=50 --durations-min=0.5`, compare against `tests/baselines/unit_timing.json`, and fix source-code regressions (suite time >baseline * 1.3 is a source code bug, not a test bug)
- Minimum 80% code coverage enforced in CI; benchmarks excluded via `--ignore=tests/benchmarks/`
- Property-based tests use Hypothesis with `derandomize=True` (10 deterministic examples in CI); when Hypothesis finds a failure, it is a real bug; fix the underlying code and add an `@example(...)` decorator for permanent coverage
- Tests must use `test-provider`, `test-small-001`, etc.; never use real vendor names (Anthropic, OpenAI, Claude, GPT)
Files:
- tests/unit/engine/artifacts/test_service.py
- tests/unit/budget/test_cost_record.py
- tests/unit/settings/test_definitions_config_bridge.py
- tests/unit/api/test_exception_handlers.py
- tests/unit/observability/test_events.py
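To illustrate why the test guidelines above insist on `spec=ConcreteClass`, here is a minimal sketch using only `unittest.mock` from the standard library. `ArtifactStore` is a hypothetical collaborator invented for the example; it does not correspond to a class named in this PR.

```python
from unittest.mock import Mock


class ArtifactStore:
    """Hypothetical collaborator used only for this illustration."""

    def save(self, name: str, payload: bytes) -> None:
        raise NotImplementedError


def make_store_mock() -> Mock:
    # spec= restricts the mock to attributes that exist on ArtifactStore.
    return Mock(spec=ArtifactStore)


def typo_is_rejected(store: Mock) -> bool:
    """Return True when a misspelled method raises AttributeError."""
    try:
        store.svae("report", b"data")  # deliberate typo for 'save'
    except AttributeError:
        return True
    return False
```

A spec'd mock fails fast on the misspelled call, whereas a bare `Mock()` would accept it silently and let the test pass for the wrong reason.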
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
- tests/unit/engine/artifacts/test_service.py
- tests/unit/budget/test_cost_record.py
- tests/unit/settings/test_definitions_config_bridge.py
- tests/unit/api/test_exception_handlers.py
- tests/unit/observability/test_events.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
- Comments must explain WHY only (hidden constraints, subtle invariants, workarounds for upstream bugs with stable URLs); forbidden: reviewer citations, issue back-references, internal taxonomy shorthand, migration framing, round/iteration narrative, self-evident restatements
- Line length: 88 characters (ruff enforced); functions: <50 lines; files: <800 lines
- Never import `aiosqlite`, `sqlite3`, `psycopg`, `psycopg_pool`, or emit raw SQL DDL/DML keywords in string literals outside `src/synthorg/persistence/`
- Repository CRUD vocabulary: `save(entity) -> None` (insert-or-update, idempotent), `get(id) -> Entity | None` (None on miss), `delete(id) -> bool` (True on removal, False if absent), `list_items(...) -> tuple[Entity, ...]` (paginated/filtered), `query(...) -> tuple[Entity, ...]` (queries always return tuples, never lists)
- Domain error class naming: use `<Domain><Condition>Error`, inherit from `DomainError` or a domain-scoped intermediate; bare `Exception`/`RuntimeError` at domain boundaries is forbidden
- Handle errors explicitly, never swallow; domain error families register a base-class entry in `EXCEPTION_HANDLERS` (`src/synthorg/api/exception_handlers.py`) so subtypes get correct status codes
- Validate at system boundaries (user input, external APIs, config files)
- For every mutable setting, use DB > env (`SYNTHORG_<NAMESPACE>_<KEY>`) > YAML > code default, resolved through `SettingsService`/`ConfigResolver`; two exceptions: init-time only (env-only, no registry) and read-only post-init (marked with `read_only_post_init=True`, raises `SettingReadOnlyError` on set)
- State transitions on status enums log at INFO after persistence write succeeds, using domain-scoped `*_STATUS_TRANSITIONED` constant carrying `from_status`/`to_status`/ domain id
- Every settings read emits one INFO `settings.value.resolved` event on first cold read per process, carrying `source` + `yaml_path`; subsequent reads stay at DEBUG
- NEVER use real vendor names (Anthropic, OpenAI, Claud...
Files:
- src/synthorg/hr/evaluation/extractors/efficiency.py
- src/synthorg/hr/evaluation/metric_extractor_protocol.py
- src/synthorg/providers/management/preset_override_service.py
- src/synthorg/engine/trajectory/efficiency_ratios.py
- src/synthorg/core/evidence.py
- src/synthorg/engine/middleware/coordination_protocol.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
- src/synthorg/hr/evaluation/extractors/efficiency.py
- src/synthorg/hr/evaluation/metric_extractor_protocol.py
- src/synthorg/providers/management/preset_override_service.py
- src/synthorg/engine/trajectory/efficiency_ratios.py
- src/synthorg/core/evidence.py
- src/synthorg/engine/middleware/coordination_protocol.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: ALWAYS read the relevant `docs/design/` page before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts; at every phase, actively look for design improvements and surface suggestions (not silent changes) for user approval
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec; do NOT silently diverge
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically; do NOT leave work uncommitted on main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: NEVER create a PR directly via `gh pr create` (blocked by hookify); ALWAYS use `/pre-pr-review` to create PRs; for trivial/docs changes use `/pre-pr-review quick` to skip agents; after PR exists, use `/aurelio-review-pr` for external feedback
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: When review agents find valid issues (including pre-existing issues, suggestions, adjacent findings), fix them all; never skip or defer; no 'out of scope' exemptions
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: Commits use `<type>: <description>` format; types: feat, fix, refactor, docs, test, chore, perf, ci; enforced by commitizen (commit-msg hook); signed commits required on main; all commits must be GPG/SSH signed (exception: GitHub App-signed commits from `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: Branches use `<type>/<slug>` format from main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: Merge strategy: squash merge; PR body becomes the squash commit message on main; preserve existing `Closes `#NNN`` trailers; never remove unless explicitly asked
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: NEVER use `cd` in Bash commands; working directory is already set to project root; use absolute paths or run commands directly; exception: `bash -c "cd <dir> && <cmd>"` runs in child process safely
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-02T09:36:49.620Z
Learning: NEVER use Bash to write/modify files; use Write or Edit tools; forbidden: `cat >`, `cat << EOF`, `echo >`, `echo >>`, `sed -i`, `python -c "open(...).write(...)"`, `tee` (read-only inspection like piping to stdout is fine)
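The commit and branch conventions recorded in the learnings above (`<type>: <description>` and `<type>/<slug>`) can be checked mechanically. The sketch below is not commitizen's implementation; it is a plain regex check, and two things in it are assumptions: that branch types use the same list as commit types, and that a slug consists of lowercase letters, digits, and hyphens.

```python
import re

# Commit types as listed in the learning above.
COMMIT_TYPES = ("feat", "fix", "refactor", "docs", "test", "chore", "perf", "ci")
_TYPES = "|".join(COMMIT_TYPES)

# "<type>: <description>" with a non-empty description.
_COMMIT_RE = re.compile(rf"^(?:{_TYPES}): \S.*$")

# "<type>/<slug>"; the slug character set is an assumption.
_BRANCH_RE = re.compile(rf"^(?:{_TYPES})/[a-z0-9][a-z0-9-]*$")


def is_valid_commit_subject(subject: str) -> bool:
    """Check a commit subject against the '<type>: <description>' shape."""
    return _COMMIT_RE.match(subject) is not None


def is_valid_branch(name: str) -> bool:
    """Check a branch name against the '<type>/<slug>' shape."""
    return _BRANCH_RE.match(name) is not None
```

For instance, this PR's own title, `docs: audit cleanup D -- public-facing & docs sync (#1709)`, satisfies the commit check.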
🔇 Additional comments (20)
docs/design/communication.md (1)
199-199: Docs-state mapping update is correct and consistent.
The `REJECTED` ↔ `rejected` bidirectional mapping now matches the implemented status model and resolves the prior “proposed” inconsistency.

src/synthorg/providers/management/preset_override_service.py (1)
18-18: Docstring wording update is aligned with current design narrative.
This change correctly removes audit/migration framing and keeps the note forward-looking without issue-tracker coupling.
Based on learnings: "If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec; do NOT silently diverge"

src/synthorg/core/evidence.py (1)
51-52: Docstring update is consistent with the current schema and audit-facing wording.
Looks good: this is a clean present-tense docs sync with no behavior risk.

src/synthorg/engine/middleware/coordination_protocol.py (1)
48-49: Docs wording update is correct and consistent.
These description updates improve clarity and remove issue-specific phrasing without affecting behavior.
Also applies to: 80-85

CLAUDE.md (1)
213-213: Vendor-name exception cleanup is consistent and scoped correctly.
This update cleanly removes the stale operations-page exception while preserving the intended runtime/data carve-outs (`presets.py` and logo assets).

README.md (1)
22-22: Test-metric update looks correct and consistent.
This wording aligns with the audit sync objective and keeps the claim safely conservative.

docs/roadmap/index.md (1)
5-5: Roadmap status metric is correctly synchronized.
The updated count is consistent with the audit cleanup goal and the corresponding README change.

src/synthorg/engine/trajectory/efficiency_ratios.py (1)
5-5: Docstring wording is accurate and consistent with the model fields.

src/synthorg/hr/evaluation/metric_extractor_protocol.py (1)
164-166: Async rationale clarification is clear and semantically correct for the protocol contract.

src/synthorg/hr/evaluation/extractors/efficiency.py (1)
5-5: Kill-switch docstring updates are consistent with the extractor’s current behavior and terminology.
Also applies to: 37-37

tests/unit/api/test_exception_handlers.py (1)
824-824: Docstring clarification looks good.
This wording is clearer and matches the warning-path-focused assertions in the class.

tests/unit/engine/artifacts/test_service.py (1)
3-5: Module docstring sync is accurate.
The updated wording correctly reflects the event namespace and mutation-emission behavior validated by this file.

tests/unit/budget/test_cost_record.py (1)
317-317: Docstring update is clear and appropriate.
The revised phrasing better matches the class’ validation-focused test coverage.

tests/unit/observability/test_events.py (1)
315-331: Comment refresh is consistent and helpful.
These clarifications improve maintainability without altering test behavior or coverage intent.

docs/reference/comparison.md (1)
8-268: Generated comparison page looks consistent and release-ready.
The generated marker, section structure, legend clarity, and table-based presentation are all in good shape for docs sync.

tests/unit/settings/test_definitions_config_bridge.py (1)
3-6: LGTM – clearer present-tense description.
The updated docstring accurately describes the `_EXPECTED` tuple structure and removes migration framing, aligning with the PR's audit cleanup objective.

docs/architecture/tech-stack.md (1)
90-90: Name-generation stack note is consistent with implementation.
Line 90 accurately reflects the current locale cardinality/region count and the seeded-vs-cached Faker behavior in `src/synthorg/templates/locales.py` and `src/synthorg/templates/presets.py`.

docs/reference/pluggable-subsystems.md (1)
54-54: Documentation reference is accurate.
The `SystemClock` from `synthorg.core.clock` is correctly identified as the default clock implementation. Verification confirms the class exists, is properly importable from the module, and is consistently used throughout the codebase as the default for rollout strategy initialization.

docs/architecture/decisions.md (1)
64-64: LGTM! D16 accurately describes the layered sandboxing strategy.
The updated decision correctly documents the `SandboxBackend` protocol with subprocess for low-risk categories (file_system, git) and Docker for high-risk (code_execution, terminal, database, web). All isolation controls listed (env filtering, restricted PATH, workspace-scoped cwd, timeout + process-group kill, library-injection-var blocking) are verified in the code snippets from `SubprocessSandboxConfig`, `_build_filtered_env`, and `_filter_path`. The rationale clearly explains why Docker-only was rejected (breaks local-first quickstart for trivial file operations) and why the layered approach balances security with usability.

docs/design/tools.md (1)
123-125: LGTM! Docker failure behavior accurately documented.
The updated text correctly describes the layered sandboxing behavior: when Docker is unavailable, only tool categories whose `configured backend` is Docker will fail with an error, while categories configured for subprocess (file_system, git) continue to work. This aligns perfectly with decision D16 in `docs/architecture/decisions.md` and the `SandboxingConfig` design shown in context snippets. The wording is precise: it correctly emphasizes "whose configured backend is Docker" rather than implying an intrinsic category property.
| [Client Simulation](design/client-simulation.md) | Client Types, Intake, Review Pipeline, Simulation | Synthetic client framework for workload generation and evaluation |
| [Strategy & Trendslop Mitigation](design/strategy.md) | Lenses, Principles, Confidence, Impact | Anti-trendslop mitigation for strategic agents |
| [Self-Improvement](design/self-improvement.md) | Meta-Loop, Signals, Rules, Proposals, Rollout | Self-improving company: signal aggregation, rule engine, improvement proposals, staged rollout |
| [Internationalization](design/internationalization.md) | Locale Resolution, UI Text, Translation Scope | British English UI default, locale-aware Intl-based display formatting, no planned translation framework |
UI-locale default text conflicts with current implementation.
Line 46 says “British English UI default”, but web/src/utils/locale.ts:20-30 sets APP_LOCALE_FALLBACK = 'en' (neutral English). Please align the row with current behavior (or update runtime code to en-GB if that is the intended spec).
Proposed docs-only fix
-| [Internationalization](design/internationalization.md) | Locale Resolution, UI Text, Translation Scope | British English UI default, locale-aware Intl-based display formatting, no planned translation framework |
+| [Internationalization](design/internationalization.md) | Locale Resolution, UI Text, Translation Scope | Neutral English (`en`) fallback, locale-aware Intl-based display formatting, no planned translation framework |
Based on learnings: If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec; do NOT silently diverge.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1715      +/-   ##
==========================================
- Coverage   84.61%   84.60%   -0.01%
==========================================
  Files        1781     1781
  Lines      101888   101888
  Branches     8968     8968
==========================================
- Hits        86211    86206       -5
- Misses      13483    13486       +3
- Partials     2194     2196       +2
Coordination protocol docstring uses specific middleware class names:
- src/synthorg/engine/middleware/coordination_protocol.py:48-49,80-84:
"TaskLedger populated by coordination middleware" becomes
"TaskLedger populated by TaskLedgerMiddleware"; same shape for
ProgressLedger -> ProgressLedgerMiddleware. The specific classes are
registered at engine/middleware/_defaults.py:80,82 and live in
engine/middleware/coordination_constraints.py:57,133. Naming the
concrete middleware is more informative than the generic category.
Skipped findings (logged for audit):
- coderabbitai/inline at docs/DESIGN_SPEC.md:46 ("British English UI
default" vs APP_LOCALE_FALLBACK = 'en'). The DESIGN_SPEC row already
separates UI text spelling (British English; colour, behaviour, ...)
from runtime Intl formatting (locale-resolved with 'en' fallback).
docs/design/internationalization.md:7 explicitly distinguishes the
two; the runtime constant is for number/date formatting, not UI text.
- coderabbitai/outside-diff at docs/reference/claude-reference.md:38-69
(replace text-fenced package tree with Mermaid). The CLAUDE.md rule
is "Never use text blocks with ASCII/Unicode box-drawing characters
for diagrams"; both conditions are conjunctive. The block contains
zero box-drawing characters; it is a structured description list
with multi-line per-directory descriptions, which Mermaid would not
render legibly.
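
The conjunctive reading of the CLAUDE.md diagram rule can be sketched as a small check. This is an illustrative sketch only, not a script from the repository: the helper names are hypothetical, it only detects the Unicode Box Drawing block (U+2500 to U+257F) and ignores ASCII approximations such as `+--`, and whether a block is a diagram is assumed to be a human judgement passed in as a flag.

```python
# Hypothetical helpers; names are illustrative and not taken from the codebase.
BOX_DRAWING_START = 0x2500  # first code point of the Unicode Box Drawing block
BOX_DRAWING_END = 0x257F    # last code point of the Unicode Box Drawing block


def box_drawing_chars(text: str) -> set[str]:
    """Return the distinct Unicode box-drawing characters found in text."""
    return {ch for ch in text if BOX_DRAWING_START <= ord(ch) <= BOX_DRAWING_END}


def violates_diagram_rule(block: str, is_diagram: bool) -> bool:
    """Both conditions must hold: the block is a diagram AND uses box-drawing chars."""
    return is_diagram and bool(box_drawing_chars(block))


if __name__ == "__main__":
    package_tree = "src/\n  core/   enums, clock\n  tools/  sandbox backends\n"
    boxed = "┌──┐\n│ab│\n└──┘"
    print(violates_diagram_rule(package_tree, is_diagram=True))
    print(violates_diagram_rule(boxed, is_diagram=True))
```

Under this reading, a plain-text description list fails the second condition regardless of how it is classified, which matches the reason given for skipping the finding.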
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._

### What you'll notice

- Frontend and UX polishing improves user interface responsiveness and visual consistency.
- API hygiene and validation enhancements provide smoother and more reliable interactions.

### What's new

- Introduced typed-boundary helpers enabling better type safety and parse_typed workflows.
- Added codebase-audit skill prompt tuning for improved project auditing.

### Under the hood

- Eliminated flaky tests caused by module-level state for more stable test outcomes.
- Unified image tag management under CLI and Renovate for consistent dependency updates.
- Added cross-PR file-overlap analysis to the review dependency pull request skill.
- Updated multiple dependencies including Python, Web, CLI, and container libraries.
- Improved CI tooling and lock file maintenance for better build reliability.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*

---

## [0.7.8](v0.7.7...v0.7.8) (2026-05-03)

### Features

* **api:** typed-boundary helper + codebase-audit skill prompt tuning ([#1712](#1712)) ([40ee65b](40ee65b))
* **boundary:** RFC [#1711](#1711) Phases 2 + 3 — typed boundaries via parse_typed ([#1720](#1720)) ([7b9f409](7b9f409))

### Bug Fixes

* **api:** audit cleanup B -- API hygiene & validation ([#1719](#1719)) ([3d790d9](3d790d9))
* audit cleanup C - persistence, concurrency & data integrity ([#1708](#1708)) ([#1717](#1717)) ([bcce097](bcce097))
* **test:** exterminate xdist-flaky tests with module-level state ([#1713](#1713)) ([#1721](#1721)) ([8d258dd](8d258dd))
* **web:** audit cleanup E -- frontend & UX polish ([#1710](#1710)) ([#1718](#1718)) ([3a3591a](3a3591a))

### Refactoring

* **cli:** single source of truth for DHI image tags + Renovate manager ([#1723](#1723)) ([57980a2](57980a2))

### Documentation

* audit cleanup D -- public-facing & docs sync ([#1709](#1709)) ([#1715](#1715)) ([ade03b7](ade03b7))

### Tests

* **engine:** make TestDrainTimeout deterministic + preserve subclass type in [@Ontology](https://github.com/ontology)_entity ([#1729](#1729)) ([b00fb05](b00fb05))

### CI/CD

* Update CI tool dependencies ([#1703](#1703)) ([355a9ff](355a9ff))

### Maintenance

* add cross-PR file-overlap analysis to review-dep-pr skill ([#1722](#1722)) ([3861d8a](3861d8a))
* **ci:** unify apko-version under workflow env so Renovate manages it everywhere ([#1724](#1724)) ([9c0a7fd](9c0a7fd))
* consolidate DHI image-pin custom regex managers ([#1726](#1726)) ([b8b0cba](b8b0cba))
* **deps:** update dependency chainguard-dev/melange to v0.50.4 ([#1701](#1701)) ([8cbf83a](8cbf83a))
* Lock file maintenance ([#1705](#1705)) ([414cfea](414cfea))
* Lock file maintenance ([#1727](#1727)) ([5cb1212](5cb1212))
* Update CLI dependencies ([#1702](#1702)) ([9fb57b9](9fb57b9))
* Update Container dependencies ([#1698](#1698)) ([6d24fd6](6d24fd6))
* Update dependency @eslint-react/eslint-plugin to v5 ([#1704](#1704)) ([1cb1294](1cb1294))
* Update Python dependencies ([#1699](#1699)) ([8e7af3a](8e7af3a))
* Update Python dependencies to v4.15.0 ([#1725](#1725)) ([69164c8](69164c8))
* Update Web dependencies ([#1700](#1700)) ([715300d](715300d))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bundle D from the 2026-05-01 codebase audit. Public-facing pages and design docs drifted from the actual codebase; this PR resyncs them.
Summary
- `README.md`, `docs/roadmap/index.md`: "25,000+ unit tests" -> "27,000+ tests". Live count is 27,566 collected (`uv run python -m pytest tests/ --collect-only -q | tail -1`).
- `docs/architecture/tech-stack.md`: "57 Latin-script locales across 12 world regions" -> 11 (`data/locales.yaml` has 11 region groupings).
- `docs/reference/comparison.md`, `.gitignore`: drop the gitignore entry and commit the regenerated artefact. The script and YAML are correct; the local file was generated 2026-04-02 against a YAML state where SynthOrg features were classified as `planned` across the board, then YAML was updated 2026-04-26 in #1595 ("audit: fix public-facing documentation drift (test count, event modules, comparison page)") without rerunning the generator. CI regenerates on every build, but tracking the artefact surfaces drift in PR diffs.
- `docs/architecture/decisions.md` (D16): rewrite to describe the actual layered `SandboxBackend` strategy: subprocess for low-risk (`file_system`, `git`); Docker for high-risk (`code_execution`, `terminal`, `database`, `web`). Code (`src/synthorg/tools/sandbox/factory.py`) and design (`docs/design/tools.md`) already align; only the decision row drifted.
- `docs/design/tools.md`: drop "no unsafe subprocess fallback for code execution"; low-risk categories continue via subprocess even when Docker is unavailable.
- `docs/design/communication.md`: drop `*(proposed)*` on the REJECTED A2A task-state row. `TaskStatus.REJECTED` is implemented at `src/synthorg/core/enums.py:256` with the `CREATED -> REJECTED` transition documented in the enum docstring.
- `docs/design/self-improvement.md`, `docs/reference/pluggable-subsystems.md`, `docs/reference/claude-reference.md`: drop references to `src/synthorg/meta/rollout/clock.py` (Clock + SystemClock now live at `src/synthorg/core/clock.py`) and to `src/synthorg/api/auth/{lockout_store,refresh_store,session_store}.py` (storage now lives under `persistence/{sqlite,postgres}/`).
- `docs/DESIGN_SPEC.md`: add the Internationalization design page row; `internationalization.md` exists but was unlisted.
- `CLAUDE.md`: drop the dangling `docs/design/operations.md` reference; `src/synthorg/providers/presets.py` already plays that role and is item (3) of the same list.

Migration-framing rot (agent 155)
Strip 11 issue-number / migration narratives back to present-tense descriptions:
- `src/synthorg/core/evidence.py:51`
- `src/synthorg/engine/middleware/coordination_protocol.py:80,84` (+ adjacent lines 48-49 drive-by)
- `src/synthorg/engine/trajectory/efficiency_ratios.py:6`
- `src/synthorg/hr/evaluation/extractors/efficiency.py:38` (+ adjacent line 5 drive-by)
- `src/synthorg/hr/evaluation/metric_extractor_protocol.py:165-166`
- `src/synthorg/providers/management/preset_override_service.py:17-18`
- `tests/unit/api/test_exception_handlers.py:824`
- `tests/unit/budget/test_cost_record.py:317`
- `tests/unit/engine/artifacts/test_service.py:4-5`
- `tests/unit/observability/test_events.py:315-321,331-333`
- `tests/unit/settings/test_definitions_config_bridge.py:5`

`tests/unit/architecture/test_layering.py` keeps its `#1610` reference; the test exists to enforce the deletion of `synthorg.api.errors`, so the historical anchor is load-bearing.

Decisions resolved during planning
- `src/synthorg/providers/presets.py` already covers vendor-name allowance.
- Drop the `unit` qualifier; use `27,000+ tests` (the live total of 27,566 covers unit + integration + e2e + property tests).

Test plan
- `uv run python scripts/generate_comparison.py` reruns produce a no-op diff against the committed file (apart from `Last updated:` if the YAML hasn't been touched since).
- `uv run python -m pytest tests/ --collect-only -q | tail -1` reports `27566 tests collected`.
- `/pre-pr-review quick` mode against the changed `.md` files (D2 in `communication.md`, Mermaid in `README.md`, `communication.md`, `self-improvement.md`); no syntax issues, no diagram bodies modified.

Review coverage
Pre-reviewed via `/pre-pr-review quick` (docs-only PR; only diagram-syntax-validator was required). Pre-push hooks gated mypy + pytest on touched Python files.

Closes #1709
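
For readers unfamiliar with the layered `SandboxBackend` strategy described in the decision row above, the category-to-backend mapping might look roughly like the sketch below. It is not the contents of `src/synthorg/tools/sandbox/factory.py`: `BackendKind`, `SandboxUnavailableError` and `select_backend` are hypothetical names, and raising an error for high-risk categories when Docker is unavailable is an assumption, since this PR only states that low-risk categories continue via subprocess.

```python
from enum import Enum


class BackendKind(str, Enum):
    """Hypothetical backend labels, for illustration only."""

    SUBPROCESS = "subprocess"
    DOCKER = "docker"


class SandboxUnavailableError(RuntimeError):
    """Hypothetical error for a category whose required backend is missing."""


# Category names are taken from the PR description.
LOW_RISK = frozenset({"file_system", "git"})
HIGH_RISK = frozenset({"code_execution", "terminal", "database", "web"})


def select_backend(category: str, docker_available: bool) -> BackendKind:
    """Pick a sandbox backend for a tool category."""
    if category in LOW_RISK:
        # Low-risk categories run via subprocess whether or not Docker exists.
        return BackendKind.SUBPROCESS
    if category in HIGH_RISK:
        if docker_available:
            return BackendKind.DOCKER
        # Assumption: the real behaviour without Docker is not stated in the PR.
        raise SandboxUnavailableError(f"{category} requires Docker")
    raise ValueError(f"unknown tool category: {category!r}")


if __name__ == "__main__":
    print(select_backend("git", docker_available=False).value)
    print(select_backend("terminal", docker_available=True).value)
```

Keeping the two category sets as explicit constants means a docs row such as D16 can be checked against a single definition instead of being inferred from branching logic.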