feat: research mode by Aureliolo · Pull Request #2041 · Aureliolo/synthorg

Aureliolo · 2026-05-22T14:58:04Z

Summary

Implements research mode (#1989): a real research subsystem for synthetic organisations. A research brief drives query planning -> multi-source retrieval (internal knowledge substrate + vendor-agnostic web / academic / code search) -> source-credibility triage -> deduplication -> citation-backed synthesis. Every run is recorded and deterministically replayable, and every claim in the deliverable resolves to a retrievable source.

New package src/synthorg/research/ mirrors the knowledge-substrate conventions. Built on the CLOSED dependencies: the knowledge+provenance substrate (#2036) and governed external access (#1991).

What's included

Pipeline (all pluggable: protocol + default strategy + factory + config discriminator):
- QueryPlanner (LlmQueryPlanner) decomposes the brief into source-targeted sub-queries.
- RetrievalSource x4 fanned out via asyncio.TaskGroup: knowledge (wraps KnowledgeService) + web / academic / code (vendor-agnostic provider protocols, no bundled impl, injected at runtime). Per-source failure is isolated.
- CredibilityTriage (HybridCredibilityTriage default: deterministic heuristic prefilter then LLM triage on survivors; pure heuristic / pure LLM also shipped).
- Deduplicator (LexicalDeduplicator default: content-hash + canonical-URL + token-shingle Jaccard; EmbeddingDeduplicator alternative).
- Synthesizer (LlmSynthesizer + CitationBinder): every claim cites sources by ref-id; an unsourced claim raises ResearchSynthesisError.
Recording & replay: the run record is the single source of truth for retrieval; LLM calls replay via the cassette provider and ReplayRetrievalSource serves recorded items, so a run is byte-identical on replay.
Persistence: research_runs table (JSON blob + denormalised filter columns), ResearchRunRepository over the generic categories, SQLite + Postgres impls, yoyo revisions per backend, dual-backend conformance test.
Surfaces: ResearchTool agent tool + MCP domain research:run / research:get / research:list; boot wiring via _wire_research_engine behind research.enabled + a configured provider/model; ghost-wiring manifest entries (ENFORCED).
Eval lane: kind="research" brief + ResearchBriefSpec, deterministic grade_research_run grader (claim coverage, citation resolution, source credibility), and a spine test that records -> replays -> grades. Plus docs/design/research-mode.md.
SEC-1: all untrusted retrieved content (snippet, title, uri) and the brief question are wrapped via wrap_untrusted() where they enter the planning / triage / synthesis prompts.

Test plan

tests/unit/research/** (models, planning, retrieval, triage, synthesis, service, tool), tests/unit/meta/mcp/test_research_handlers.py, tests/conformance/persistence/test_research_run_repository.py (dual-backend), tests/evals_spine/test_research_eval.py (record + byte-identical replay + grade).
Full pre-push gate suite green: ruff, mypy, unit suite, schema-drift (both gates), ghost-wiring, chokepoint, forbidden-literals, no-magic-numbers, dual-backend parity, MCP handler parity.

Review coverage

Pre-reviewed by 7 core agents (code-reviewer, python-reviewer, conventions-enforcer, security-reviewer, persistence-reviewer, test-quality-reviewer, issue-resolution-verifier). 5 findings addressed (SEC-1 prompt wrapping + test-assertion strengthening + a docstring); 7 rejected with reasons recorded in _audit/pre-pr-review/triage.md. issue-resolution-verifier returned PASS on all acceptance criteria.

Closes #1989

github-actions · 2026-05-22T14:58:20Z

Dependency Review

The following issues were found:

✅ 0 vulnerable package(s)
✅ 0 package(s) with incompatible licenses
✅ 0 package(s) with invalid SPDX license definitions
⚠️ 1 package(s) with unknown licenses.

See the Details below.

License Issues

uv.lock

Package	Version	License	Issue Type
starlette	1.0.1	Null	Unknown License

Allowed Licenses:
MIT, MIT-0, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, MPL-2.0, PSF-2.0, Unlicense, 0BSD, CC0-1.0, CC-BY-3.0, CC-BY-4.0, Python-2.0, Python-2.0.1, LicenseRef-scancode-free-unknown, LicenseRef-scancode-protobuf, LicenseRef-scancode-google-patent-license-golang, ZPL-2.1, LGPL-2.0-only, LGPL-2.0-or-later, LGPL-2.1-only, LGPL-2.1-or-later, LGPL-3.0-only, LGPL-3.0-or-later, BlueOak-1.0.0, OFL-1.1

Excluded from license check:
pkg:pypi/mem0ai@2.0.1, pkg:pypi/numpy@2.4.4, pkg:pypi/qdrant-client@1.17.1, pkg:pypi/posthog@7.9.12, pkg:pypi/aiohttp@3.13.5, pkg:pypi/cyclonedx-python-lib@11.7.0, pkg:pypi/fsspec@2026.3.0, pkg:pypi/griffelib@2.0.2, pkg:pypi/grpcio@1.80.0, pkg:pypi/charset-normalizer@3.4.6, pkg:pypi/wrapt@2.1.2, pkg:pypi/pytest-codspeed@4.5.0, pkg:pypi/hypothesis@6.152.4, pkg:pypi/litellm@1.83.14, pkg:pypi/openai@2.33.0, pkg:pypi/pyngrok@8.1.2, pkg:pypi/tokenizers@0.23.1, pkg:pypi/typer@0.25.0, pkg:npm/@img/sharp-wasm32@0.33.5, pkg:npm/@img/sharp-win32-ia32@0.33.5, pkg:npm/@img/sharp-win32-x64@0.33.5, pkg:npm/json-schema-typed@8.0.2, pkg:npm/victory-vendor@37.3.6, pkg:pypi/scikit-learn@1.8.0, pkg:pypi/torch@2.11.0, pkg:pypi/cuda-bindings@13.2.0, pkg:pypi/cuda-pathfinder@1.5.0, pkg:pypi/cuda-toolkit@13.0.2, pkg:pypi/nvidia-cublas@13.1.0.3, pkg:pypi/nvidia-cuda-cupti@13.0.85, pkg:pypi/nvidia-cuda-nvrtc@13.0.88, pkg:pypi/nvidia-cuda-runtime@13.0.96, pkg:pypi/nvidia-cudnn-cu13@9.19.0.56, pkg:pypi/nvidia-cufft@12.0.0.61, pkg:pypi/nvidia-cufile@1.15.1.6, pkg:pypi/nvidia-curand@10.4.0.35, pkg:pypi/nvidia-cusolver@12.0.4.66, pkg:pypi/nvidia-cusparse@12.6.3.3, pkg:pypi/nvidia-cusparselt-cu13@0.8.0, pkg:pypi/nvidia-nccl-cu13@2.28.9, pkg:pypi/nvidia-nvjitlink@13.0.88, pkg:pypi/nvidia-nvshmem-cu13@3.4.5, pkg:pypi/nvidia-nvtx@13.0.85, pkg:pypi/pillow@12.2.0

OpenSSF Scorecard

Package	Version	Score	Details
pip/starlette	1.0.1	Unknown	Unknown

Scanned Files

uv.lock

coderabbitai · 2026-05-22T14:58:22Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 49d18a50-7340-45e4-b61e-56b7222f60b8

📥 Commits

Reviewing files that changed from the base of the PR and between f491a9c and 3ffd6c8.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (27)

evals/scoring/research.py
src/synthorg/api/app.py
src/synthorg/core/error_taxonomy.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/_llm.py
src/synthorg/research/errors.py
src/synthorg/research/models.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/service.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/research/triage/llm.py
src/synthorg/settings/definitions/research.py
tests/conformance/persistence/test_research_run_repository.py
tests/evals_spine/test_research_eval.py
tests/integration/mcp/test_tool_surface.py
tests/unit/meta/mcp/test_research_handlers.py
tests/unit/persistence/test_protocol.py
tests/unit/research/test_research_models.py
tests/unit/research/test_research_service.py
web/src/api/types/error-codes.gen.ts
web/src/api/types/openapi.gen.ts
web/src/pages/settings/utils.ts
web/src/utils/constants.ts

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)

GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: CodSpeed Python benchmarks
GitHub Check: CodSpeed Web benchmarks
GitHub Check: Lighthouse Dashboard
GitHub Check: Lighthouse Site
GitHub Check: Dashboard Test
GitHub Check: Test Integration
GitHub Check: Test E2E
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Unit
GitHub Check: Build Preview
GitHub Check: Analyze (python)
GitHub Check: Analyze (javascript-typescript)

🧰 Additional context used

📓 Path-based instructions (11)

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/api/types/openapi.gen.ts
web/src/pages/settings/utils.ts
web/src/utils/constants.ts
web/src/api/types/error-codes.gen.ts

web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/api/types/openapi.gen.ts
web/src/pages/settings/utils.ts
web/src/utils/constants.ts
web/src/api/types/error-codes.gen.ts

**/*.{py,ts,tsx,jsx,md}

📄 CodeRabbit inference engine (CLAUDE.md)

No region/currency/locale privileged; use metric units; British English per docs/reference/regional-defaults.md

Files:

web/src/api/types/openapi.gen.ts
web/src/pages/settings/utils.ts
web/src/utils/constants.ts
tests/unit/persistence/test_protocol.py
tests/unit/meta/mcp/test_research_handlers.py
web/src/api/types/error-codes.gen.ts
tests/integration/mcp/test_tool_surface.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/research/errors.py
src/synthorg/settings/definitions/research.py
tests/evals_spine/test_research_eval.py
src/synthorg/core/error_taxonomy.py
tests/conformance/persistence/test_research_run_repository.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/sources/code.py
evals/scoring/research.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/meta/mcp/handlers/research.py
tests/unit/research/test_research_service.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/research/triage/llm.py
tests/unit/research/test_research_models.py
src/synthorg/research/models.py
src/synthorg/api/app.py

web/src/utils/constants.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

WS wire protocol (MANDATORY): the client-server contract lives in web/src/utils/constants.ts (WS_PROTOCOL_VERSION, WS_MAX_MESSAGE_SIZE, WS_HEARTBEAT_INTERVAL_MS, WS_PONG_TIMEOUT_MS, LOG_SANITIZE_MAX_LENGTH) and MUST stay in lockstep with src/synthorg/api/ws_models.py / src/synthorg/api/controllers/ws.py. Bump the protocol version on both sides together for breaking payload changes.

Files:

web/src/utils/constants.ts

web/src/{components,utils}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

NEVER write getXIcon(value): LucideIcon factories called inside JSX bodies. Export a <XIcon value={...} /> wrapper that does the lookup via createElement inside the wrapper body. Wrapper components live in their own file, not alongside utility exports.

Files:

web/src/utils/constants.ts

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Timeout/slow failures = source-code regression; never edit tests/baselines/unit_timing.json or any scripts/*_baseline.{txt,json} / scripts/_*_baseline.py; both families PreToolUse-blocked; per-invocation bypass requires explicit approval (ALLOW_BASELINE_GROWTH=1 git commit)
Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
xdist -n 8 --dist=loadfile auto-applied via pyproject addopts; Windows unit tests use WindowsSelectorEventLoopPolicy; subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests; fix fundamentally

Files:

tests/unit/persistence/test_protocol.py
tests/unit/meta/mcp/test_research_handlers.py
tests/integration/mcp/test_tool_surface.py
tests/evals_spine/test_research_eval.py
tests/conformance/persistence/test_research_run_repository.py
tests/unit/research/test_research_service.py
tests/unit/research/test_research_models.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/persistence/test_protocol.py
tests/unit/meta/mcp/test_research_handlers.py
tests/integration/mcp/test_tool_surface.py
tests/evals_spine/test_research_eval.py
tests/conformance/persistence/test_research_run_repository.py
tests/unit/research/test_research_service.py
tests/unit/research/test_research_models.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols inherit from generic categories in persistence/_generics.py; bespoke methods permitted only under ADR-0001 D7
Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env; YAML is ingestion format only, not precedence tier; no os.environ.get outside startup
No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final); enforced by scripts/check_no_magic_numbers.py
Comments document WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py
No from __future__ import annotations (Python 3.14 has PEP 649); use PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors follow <Domain><Condition>Error pattern from DomainError; never inherit Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide; gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt; per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries; use @computed_field for derived; use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/`RecursionError...

Files:

src/synthorg/meta/mcp/domains/research.py
src/synthorg/research/errors.py
src/synthorg/settings/definitions/research.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/research/triage/llm.py
src/synthorg/research/models.py
src/synthorg/api/app.py

src/synthorg/meta/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

MCP: Define ToolHandler + args_model; call require_admin_guardrails() on admin tools; route through service layers per mcp-handler-contract.md

Files:

src/synthorg/meta/mcp/domains/research.py
src/synthorg/meta/mcp/handlers/research.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/meta/mcp/domains/research.py
src/synthorg/research/errors.py
src/synthorg/settings/definitions/research.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/research/triage/llm.py
src/synthorg/research/models.py
src/synthorg/api/app.py

tests/conformance/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres); enforced by check_dual_backend_test_parity.py

Files:

tests/conformance/persistence/test_research_run_repository.py

🧠 Learnings (5)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

tests/unit/persistence/test_protocol.py
tests/unit/meta/mcp/test_research_handlers.py
tests/integration/mcp/test_tool_surface.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/research/errors.py
src/synthorg/settings/definitions/research.py
tests/evals_spine/test_research_eval.py
src/synthorg/core/error_taxonomy.py
tests/conformance/persistence/test_research_run_repository.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/sources/code.py
evals/scoring/research.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/meta/mcp/handlers/research.py
tests/unit/research/test_research_service.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/research/triage/llm.py
tests/unit/research/test_research_models.py
src/synthorg/research/models.py
src/synthorg/api/app.py

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

tests/unit/persistence/test_protocol.py
tests/unit/meta/mcp/test_research_handlers.py
tests/integration/mcp/test_tool_surface.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/research/errors.py
src/synthorg/settings/definitions/research.py
tests/evals_spine/test_research_eval.py
src/synthorg/core/error_taxonomy.py
tests/conformance/persistence/test_research_run_repository.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/sources/code.py
evals/scoring/research.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/meta/mcp/handlers/research.py
tests/unit/research/test_research_service.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/research/triage/llm.py
tests/unit/research/test_research_models.py
src/synthorg/research/models.py
src/synthorg/api/app.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/meta/mcp/domains/research.py
src/synthorg/research/errors.py
src/synthorg/settings/definitions/research.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/tool.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/research/triage/llm.py
src/synthorg/research/models.py
src/synthorg/api/app.py

📚 Learning: 2026-05-17T11:45:11.839Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.

Applied to files:

src/synthorg/settings/definitions/research.py

📚 Learning: 2026-05-17T11:45:11.839Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.

Applied to files:

src/synthorg/settings/definitions/research.py

🔇 Additional comments (27)

web/src/api/types/openapi.gen.ts (1)

8048-8048: LGTM!

web/src/pages/settings/utils.ts (1)

30-30: LGTM!

web/src/utils/constants.ts (1)

147-147: LGTM!

Also applies to: 174-174

tests/unit/persistence/test_protocol.py (1)

52-52: LGTM!

Also applies to: 74-74, 1170-1173, 1501-1505

tests/unit/meta/mcp/test_research_handlers.py (1)

89-91: LGTM!

web/src/api/types/error-codes.gen.ts (1)

81-81: LGTM!

tests/integration/mcp/test_tool_surface.py (1)

398-398: LGTM!

Also applies to: 400-400, 402-402

src/synthorg/meta/mcp/domains/research.py (1)

11-11: LGTM!

Also applies to: 23-25

src/synthorg/research/errors.py (1)

85-97: LGTM!

src/synthorg/settings/definitions/research.py (1)

27-27: LGTM!

tests/evals_spine/test_research_eval.py (1)

34-37: LGTM!

Also applies to: 215-215

src/synthorg/core/error_taxonomy.py (1)

137-137: LGTM!

tests/conformance/persistence/test_research_run_repository.py (1)

171-175: LGTM!

src/synthorg/research/_llm.py (1)

8-8: LGTM!

Also applies to: 22-23, 27-32, 37-37, 40-40, 42-52

src/synthorg/research/retrieval/sources/code.py (1)

57-59: LGTM!

evals/scoring/research.py (1)

27-27: LGTM!

Also applies to: 47-47

src/synthorg/research/retrieval/sources/academic.py (1)

57-59: LGTM!

src/synthorg/research/synthesis/llm_synthesizer.py (1)

129-130: LGTM!

Also applies to: 133-135, 144-144

src/synthorg/research/tool.py (1)

86-86: LGTM!

src/synthorg/meta/mcp/handlers/research.py (1)

76-84: LGTM!

Also applies to: 100-100

tests/unit/research/test_research_service.py (1)

21-21: LGTM!

Also applies to: 26-29, 183-187, 201-204, 235-255

src/synthorg/research/retrieval/dedup.py (1)

39-50: LGTM!

src/synthorg/research/service.py (1)

34-38: LGTM!

Also applies to: 106-107, 124-135, 167-167, 180-180, 199-199, 235-245, 279-285

src/synthorg/research/triage/llm.py (1)

61-63: LGTM!

tests/unit/research/test_research_models.py (1)

313-324: LGTM!

src/synthorg/research/models.py (1)

501-504: LGTM!

src/synthorg/api/app.py (1)

1472-1474: LGTM!

Also applies to: 1485-1487, 1499-1499, 1509-1519

Walkthrough

Adds a deterministic “Research Mode” pipeline: models, planner, multi-source retrieval (knowledge/web/academic/code), credibility triage (heuristic/LLM/hybrid), deduplication, and LLM synthesis with citation binding. Wires app startup, settings, security/risk, observability, and MCP/agent tools. Implements Postgres/SQLite repositories and migrations for research_runs. Extends enums and error codes. Provides replay mechanics and deterministic grading. Includes comprehensive unit/integration/eval tests validating persistence, replay, MCP handlers, tools, and byte-stable outputs.
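
The token-shingle Jaccard comparison named in the PR summary for the lexical deduplicator can be illustrated with a short sketch. The function names, the shingle size of 3, and the 0.8 threshold are assumptions chosen for illustration; the PR's actual values and normalisation are not shown on this page.

```python
import re


def shingles(text: str, size: int = 3) -> frozenset[tuple[str, ...]]:
    """Return the set of overlapping token n-grams (shingles) in the text."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return frozenset()
    if len(tokens) < size:
        # Short texts collapse to a single shingle of all their tokens.
        return frozenset({tuple(tokens)})
    return frozenset(
        tuple(tokens[i : i + size]) for i in range(len(tokens) - size + 1)
    )


def jaccard(left: frozenset, right: frozenset) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not left and not right:
        return 1.0
    return len(left & right) / len(left | right)


def is_near_duplicate(left: str, right: str, threshold: float = 0.8) -> bool:
    """Treat two snippets as duplicates when shingle overlap meets the threshold."""
    return jaccard(shingles(left), shingles(right)) >= threshold
```

Under these assumptions, "the quick brown fox jumps" and "the quick brown fox leaps" share two of four distinct shingles, giving a similarity of 0.5, which falls below the illustrative threshold.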

gemini-code-assist · 2026-05-22T14:58:34Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a comprehensive research subsystem for synthetic organizations. It transforms research from a simple sandbox action into a robust, recorded, and replayable pipeline. By integrating internal knowledge with external search providers, the system provides citation-backed reports where every claim is verified against retrievable sources. The implementation includes full persistence, evaluation capabilities, and security measures for handling untrusted external content.

Highlights

Research Mode Implementation: Introduced a new research subsystem (src/synthorg/research/) that enables agents to perform multi-source research (internal knowledge, web, academic, and code search) with citation-backed synthesis.
Deterministic Replayability: Every research run is recorded and deterministically replayable, ensuring that identical requests produce byte-identical reports by replaying LLM calls and serving retrieval results from the run record.
Pipeline Architecture: Implemented a pluggable pipeline consisting of Query Planning, Multi-source Retrieval, Credibility Triage, Deduplication, and Synthesis, all configurable via ResearchConfig.
Persistence and Evaluation: Added a research_runs table with dual-backend (SQLite/Postgres) support and a dedicated evaluation lane that grades runs on claim coverage, citation resolution, and source credibility.
Security and Surfaces: Implemented SEC-1 prompt wrapping for untrusted retrieved content and exposed the research capability via an agent tool and MCP domain (research:run, research:get, research:list).
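
The rule that every claim must resolve to a retrievable source can be sketched as a small validation pass. This is an assumption-laden illustration: the `[ref-id]` marker format, `bind_citations`, and `UnsourcedClaimError` (a stand-in for the PR's ResearchSynthesisError) are hypothetical and do not reflect the actual CitationBinder interface.

```python
import re


class UnsourcedClaimError(Exception):
    """Hypothetical stand-in for the PR's ResearchSynthesisError."""


# Assumed marker format: a ref-id in square brackets, e.g. "[s1]".
_REF_MARKER = re.compile(r"\[([\w-]+)\]")


def bind_citations(
    claims: list[str], sources: dict[str, str]
) -> dict[str, list[str]]:
    """Map each claim to its cited ref-ids, rejecting unsourced claims."""
    bound: dict[str, list[str]] = {}
    for claim in claims:
        ref_ids = _REF_MARKER.findall(claim)
        if not ref_ids:
            raise UnsourcedClaimError(f"claim has no citation: {claim!r}")
        missing = [ref_id for ref_id in ref_ids if ref_id not in sources]
        if missing:
            raise UnsourcedClaimError(f"unknown ref-ids {missing} in {claim!r}")
        bound[claim] = ref_ids
    return bound
```

Failing on the first unsourced claim, rather than dropping it silently, keeps the deliverable and its source list consistent with each other.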

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

codspeed-hq · 2026-05-22T15:00:16Z

Merging this PR will not alter performance

✅ 54 untouched benchmarks

Comparing feat/1989-research-mode (3ffd6c8) with main (7527078)

gemini-code-assist

Code Review

This pull request implements "Research Mode," a new subsystem enabling synthetic organizations to execute complex research tasks through a pipeline of query planning, multi-source retrieval (internal knowledge, web, academic, and code), credibility triage, and citation-backed synthesis. The implementation includes the core ResearchService, Pydantic data models, persistence repositories for Postgres and SQLite, and integration with the agent tool layer and MCP. A review finding notes that while max_cost and max_wall_clock_seconds limits are defined in the research brief, they are not currently enforced by the orchestrator during execution, presenting a risk of runaway costs or latency.

gemini-code-assist · 2026-05-22T15:04:38Z

+    async def _execute(
+        self,
+        run: ResearchRun,
+        brief: ResearchBrief,
+        started_at: datetime,
+    ) -> ResearchRun:


The research pipeline does not enforce the max_cost or max_wall_clock_seconds limits defined in the ResearchBrief. While these values are recorded in the final ResearchRun, the orchestrator should actively monitor and enforce them during execution to prevent runaway costs or latency.

Consider wrapping the _execute body in an asyncio.timeout block and checking the accumulated total_cost after each LLM-backed stage (planning, triage, synthesis).

coderabbitai

Actionable comments posted: 18

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/scoring/research.py`:
- Around line 27-47: The tokenizer is ASCII-only (_TOKEN_RE) and misses
non‑ASCII words; update tokenization to be Unicode-aware by changing _TOKEN_RE
to use a Unicode word pattern (e.g. re.compile(r"\w+", re.UNICODE)) and update
_tokens(text: str) to filter out non-alphanumeric-only matches (use
token.isalnum() or any(char.isalnum() for char in token)) before returning
frozenset, so underscore-only tokens are excluded while Unicode letters and
numbers are included; references in coverage logic (e.g.
COVERAGE_TOKEN_OVERLAP) remain unchanged.
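
A rough sketch of the tokenizer change described above. The function name `tokens` is a stand-in for the module's `_tokens`, and the real helper may normalise case or do more than this. Note that `\w` is already Unicode-aware for `str` patterns in Python 3, so the `re.UNICODE` flag is optional.

```python
import re

# \w matches Unicode letters, digits and the underscore for str patterns.
_TOKEN_RE = re.compile(r"\w+")


def tokens(text: str) -> frozenset[str]:
    """Return the distinct word tokens in text, dropping underscore-only runs."""
    return frozenset(
        match
        for match in _TOKEN_RE.findall(text)
        if any(char.isalnum() for char in match)
    )
```

This filter removes tokens made up solely of underscores; an underscore inside a token such as `x_1` is kept.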

In `@src/synthorg/api/app.py`:
- Around line 1472-1486: The guard is using the captured
settings_service/provider_registry args (from create_app) which are often None;
change the wiring to read the live AppState services instead. Replace uses of
the local settings_service and provider_registry in the research wiring block
with the app/state's runtime services (e.g., AppState.settings_service or
app.state.settings_service and the runtime provider_registry) before returning,
and then proceed to import and call
build_research_service/build_research_tool_factory as shown so research is wired
during the default boot path.

In `@src/synthorg/meta/mcp/domains/research.py`:
- Around line 22-30: Replace the hardcoded _STATUSES list with values derived
from the existing ResearchRunStatus enum so the MCP domain cannot drift;
specifically import or reference ResearchRunStatus and build _STATUSES =
[status.value for status in ResearchRunStatus] (or equivalent) and ensure any
validation in ResearchListArgs uses ResearchRunStatus directly (or the same
derived list) instead of repeating the string literals; update references to
_STATUSES and ResearchListArgs validation to use the single source of truth
ResearchRunStatus.

In `@src/synthorg/meta/mcp/handlers/research.py`:
- Line 90: Replace the inline SystemClock() usage with a clock seam: add an
optional parameter like clock: Clock | None = None to the handler/function that
sets created_at, and use (clock or SystemClock()).now() (or clock.now() when
clock is provided) to populate created_at instead of SystemClock().now(); update
the call sites in the same code path to pass a test-injectable FakeClock in
tests so created_at becomes deterministic.

In `@src/synthorg/research/_llm.py`:
- Around line 21-42: The current greedy regex _JSON_OBJECT_RE and
extract_json_object are capturing from the first "{" to the last "}" and can
return an invalid span when extra braces appear outside the JSON; replace the
regex-based approach in extract_json_object with a deterministic brace-matching
scan: locate the first "{" in content, iterate forward maintaining a nesting
counter (increment on "{", decrement on "}"), return the substring from the
first "{" to the position where the counter returns to zero, and raise
ValueError if the end of content is reached before the counter closes; refer to
extract_json_object and _JSON_OBJECT_RE while making this change and
remove/disable the greedy regex usage.
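
A sketch of the brace-matching scan, not the project's implementation. It goes one step beyond the description above by also skipping braces inside JSON string literals, since a plain counter would miscount a value such as `"}"`; that extension is an assumption about what the function needs.

```python
def extract_json_object(content: str) -> str:
    """Return the first balanced {...} span in content.

    Raises ValueError when no object starts or the object never closes.
    """
    start = content.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    in_string = False
    escaped = False
    for index in range(start, len(content)):
        char = content[index]
        if in_string:
            # Inside a string literal only escapes and the closing quote matter.
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
            continue
        if char == '"':
            in_string = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return content[start : index + 1]
    raise ValueError("unbalanced JSON object")
```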

In `@src/synthorg/research/models.py`:
- Around line 501-503: The validator only enforces that a FAILED run has an
error but not that it has a completion timestamp; update the check on
self.status == ResearchRunStatus.FAILED (the block that currently raises
ValueError when self.error is None) to also require self.completed_at to be set
and raise a clear ValueError if either error or completed_at is missing so
FAILED terminal runs must include both self.error and self.completed_at.

In `@src/synthorg/research/retrieval/dedup.py`:
- Around line 38-43: The _canonical_url function currently lowercases the entire
URI via normalize_ascii_lowercase(uri), which can collapse case-sensitive paths;
instead parse the original uri with urlsplit(uri), lowercase only the scheme and
netloc (use normalize_ascii_lowercase on parts.netloc and parts.scheme if
needed) while preserving parts.path exactly, then build the canonical host+path
string as f"{lower_netloc}{parts.path}".rstrip("/") (dropping query/fragment as
before) and return that or the original input when empty; update references in
_canonical_url to use urlsplit on the raw uri and only lowercase the netloc (and
optionally scheme) rather than the full URI.
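
The canonicalisation described above might be sketched as follows. `str.lower()` stands in for the project's `normalize_ascii_lowercase`, and `canonical_url` for the private `_canonical_url`; both substitutions are assumptions.

```python
from urllib.parse import urlsplit


def canonical_url(uri: str) -> str:
    """Lowercase only the host; keep the path case; drop query and fragment."""
    parts = urlsplit(uri)
    canonical = f"{parts.netloc.lower()}{parts.path}".rstrip("/")
    # Fall back to the raw input when nothing usable was parsed.
    return canonical or uri
```

With this shape, `https://Example.com/Repo` and `https://example.com/repo` no longer collapse into one key, which matters on hosts that serve case-sensitive paths.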

In `@src/synthorg/research/retrieval/sources/_shared.py`:
- Around line 36-45: positional_relevance currently treats position as
zero-based so the documented "first result scores 1.0" is wrong for one-based
ranks; update the function positional_relevance to treat position as a one-based
rank by computing relevance as max(0.0, (total - position + 1) / total) (and
keep the existing guard for total <= 0), and ensure the docstring matches the
one-based semantics and that out-of-range positions are clamped to the [0.0,1.0]
range.
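
A sketch of the one-based formula with clamping. Returning `0.0` when `total <= 0` is an assumption about what the existing guard does.

```python
def positional_relevance(position: int, total: int) -> float:
    """Score a one-based rank: the first of `total` results scores 1.0."""
    if total <= 0:
        return 0.0
    raw = (total - position + 1) / total
    # Clamp so out-of-range positions stay within [0.0, 1.0].
    return min(1.0, max(0.0, raw))
```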

In `@src/synthorg/research/retrieval/sources/academic.py`:
- Around line 57-75: Guard against empty/malformed URIs before constructing
NotBlankStr to avoid failing the whole retrieval: compute the raw uri from
result.url or result.identifier, check if it's truthy after strip (e.g., if not
uri_raw or not uri_raw.strip(): continue), and only then set uri =
uri_raw.strip() and proceed to build AcademicSourceLocator, RetrievedItem and
call items.append; reference the variables/functions NotBlankStr, RetrievedItem,
AcademicSourceLocator, uri and result.url/result.identifier to locate the
change.

In `@src/synthorg/research/retrieval/sources/code.py`:
- Around line 57-76: The code currently constructs uri then passes it to
NotBlankStr(uri) when building a RetrievedItem, which raises on blank URIs and
aborts the whole source call; change the logic in the loop that builds
ResearchCitation/RetrievedItem (the block that sets uri, citation, and calls
items.append) to validate uri first and skip the current result if uri is
empty/whitespace (e.g., check uri.strip() or equivalent before creating
NotBlankStr), so other valid rows for the same sub_query.index are preserved;
keep using ResearchCitation, CodeSourceLocator and RetrievedItem as before for
valid rows.

In `@src/synthorg/research/service.py`:
- Around line 218-224: The TaskGroup may wrap system exceptions from
_safe_retrieve into a BaseExceptionGroup so callers (like run()) can't match
MemoryError/RecursionError; modify _retrieve() to catch BaseExceptionGroup after
the TaskGroup completes, inspect its .exceptions (and nested groups) for any
MemoryError or RecursionError and re-raise that original exception (or re-raise
the BaseExceptionGroup if none match). Reference _retrieve(), _safe_retrieve(),
and run() when making the change so the handler in run() can observe raw system
errors instead of a wrapped BaseExceptionGroup.

In `@src/synthorg/research/synthesis/llm_synthesizer.py`:
- Around line 131-143: The returned prompt includes model-produced text
plan.research_angle raw; update the synthesis assembly to fence that value with
wrap_untrusted(TAG_TASK_DATA, plan.research_angle) before injecting it into the
prompt so SEC-1 boundaries are preserved—modify the return expression that
currently builds f"{question}\nResearch angle:
{plan.research_angle}\n\nSources:\n" to use the wrapped research angle and keep
existing question and blocks handling (references: variables question,
plan.research_angle, blocks and function/module llm_synthesizer.py; use
wrap_untrusted from engine.prompt_safety).

In `@src/synthorg/research/tool.py`:
- Line 86: The run_id construction currently uses f"{args.min_credibility:.4f}"
which lossy-rounds args.min_credibility and can collapse distinct requests;
change the run_id assembly (where f"{args.min_credibility:.4f}" is used) to
serialize min_credibility deterministically without rounding — e.g., use
repr(args.min_credibility) or format(args.min_credibility, ".17g") or convert to
a Decimal and use its exact string — so distinct float values remain unique and
do not overwrite persisted runs.
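
The collision can be shown in a few lines. `credibility_key` is a hypothetical helper, and `repr` is just one of the options listed above.

```python
def credibility_key(min_credibility: float) -> str:
    """Serialise a float without rounding; repr() round-trips exactly."""
    return repr(min_credibility)


# Fixed four-decimal formatting maps distinct inputs to the same text.
lossy_a = f"{0.12341:.4f}"
lossy_b = f"{0.12344:.4f}"
```

Here `lossy_a` and `lossy_b` are both `"0.1234"`, while `credibility_key` keeps the two values distinct.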

In `@src/synthorg/research/triage/llm.py`:
- Around line 59-64: Validate the batch_size parameter in the class __init__
(where batch_size default is RESEARCH_TRIAGE_BATCH_SIZE) before assigning to
self._batch_size: ensure it is an int and greater than 0, and raise a ValueError
with a clear message if not (to prevent range(..., 0) and negative behavior).
Update the constructor logic that currently sets self._batch_size = batch_size
to perform this check (referencing the batch_size parameter, self._batch_size,
and RESEARCH_TRIAGE_BATCH_SIZE).
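
A sketch of the constructor check. The class name `BatchingTriage`, the `batches` helper and the constant's value are assumptions made so the example runs on its own.

```python
RESEARCH_TRIAGE_BATCH_SIZE = 8  # hypothetical value for illustration


class BatchingTriage:
    """Stand-in for the LLM triage class; only the batching logic is shown."""

    def __init__(self, batch_size: int = RESEARCH_TRIAGE_BATCH_SIZE) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if (
            isinstance(batch_size, bool)
            or not isinstance(batch_size, int)
            or batch_size <= 0
        ):
            msg = f"batch_size must be a positive int, got {batch_size!r}"
            raise ValueError(msg)
        self._batch_size = batch_size

    def batches(self, items: list[str]) -> list[list[str]]:
        """Split items into consecutive chunks of at most batch_size."""
        step = self._batch_size
        return [items[i : i + step] for i in range(0, len(items), step)]
```

Without the check, a zero step makes `range` raise at call time, and a negative step yields no batches at all, silently dropping every item.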

In `@src/synthorg/settings/definitions/research.py`:
- Around line 15-27: The SettingDefinition for the research master switch
(namespace SettingNamespace.RESEARCH, key "enabled") must be marked
restart-required so changes are not treated as hot-configurable; update the
SettingDefinition instance for that key to include the restart-required metadata
(e.g., set the restart_required flag/property to true or the appropriate enum
value used by SettingDefinition) so runtime edits are recognized as requiring a
restart to take effect.

In `@tests/evals_spine/test_research_eval.py`:
- Around line 198-214: The test's assertion that replay_web.queries is empty is
ineffective because replay_web (FakeWebSearchProvider) is never injected into
the replay run; the replay uses replay_sources produced by
build_replay_sources(recorded.retrieved_items) instead. Fix by wiring the
FakeWebSearchProvider into the replay path: update the replay_sources or the
call to _build_service/_build_service(...).run so that the web provider used
during the replay is replay_web (e.g., include replay_web under the appropriate
ResearchSourceType key in replay_sources or pass replay_web into the service
factory), so any accidental live web queries will be recorded in replay_web.
Ensure you reference and modify replay_web, replay_sources,
build_replay_sources, and the _build_service(...) invocation.

In `@tests/unit/persistence/test_protocol.py`:
- Around line 1167-1173: The research_runs property currently returns object()
which only checks attribute presence; replace it with a protocol-shaped fake
that implements the research repository interface (i.e., mimic the expected
methods/attributes of the research repo) so the isinstance/runtime_checkable
contract verifies shape, and update the conformance suite by adding a
backend-routed assertion in TestProtocolCompliance to ensure the
PersistenceBackend routes to the new research repository path correctly; locate
the research_runs property and the TestProtocolCompliance test to implement the
fake and add the assertion respectively.

In `@tests/unit/research/test_research_service.py`:
- Around line 173-199: The test creates a fake web provider (replay_web) but
never injects it into the service, so the final assertion about replay isolation
is vacuous; update the test to pass replay_web into the service construction so
ResearchService uses the fake web provider during replay. Concretely, change the
call to _build_service (or adjust its call-site) to accept and forward
replay_web (or a named parameter like web_provider) so that the instantiated
ResearchService (or the helper that builds it) uses replay_web instead of the
real web provider; keep the final assertion assert replay_web.queries == []
unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 51d2e90a-cc11-44df-9229-adabd1ceb492

📥 Commits

Reviewing files that changed from the base of the PR and between 7527078 and f491a9c.

📒 Files selected for processing (92)

docs/design/research-mode.md
evals/models/brief.py
evals/scoring/research.py
scripts/_ghost_wiring_manifest.txt
scripts/check_no_ghost_wiring.py
scripts/check_provider_complete_chokepoint.py
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/core/enums.py
src/synthorg/core/error_taxonomy.py
src/synthorg/engine/prompt_safety.py
src/synthorg/meta/mcp/domains/__init__.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/observability/events/persistence.py
src/synthorg/observability/events/research.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/persistence/postgres/backend.py
src/synthorg/persistence/postgres/research_run_repo.py
src/synthorg/persistence/postgres/revisions/20260522000002_research_runs.sql
src/synthorg/persistence/postgres/schema.sql
src/synthorg/persistence/protocol.py
src/synthorg/persistence/research_protocol.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/persistence/sqlite/revisions/20260522000002_research_runs.sql
src/synthorg/persistence/sqlite/schema.sql
src/synthorg/research/__init__.py
src/synthorg/research/_args.py
src/synthorg/research/_llm.py
src/synthorg/research/config.py
src/synthorg/research/constants.py
src/synthorg/research/errors.py
src/synthorg/research/factory.py
src/synthorg/research/models.py
src/synthorg/research/planning/__init__.py
src/synthorg/research/planning/llm_planner.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/retrieval/protocol.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/retrieval/replay.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/retrieval/sources/knowledge.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/service.py
src/synthorg/research/synthesis/__init__.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/research/tool.py
src/synthorg/research/tool_factory.py
src/synthorg/research/triage/__init__.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/triage/hybrid.py
src/synthorg/research/triage/llm.py
src/synthorg/research/triage/protocol.py
src/synthorg/security/action_types.py
src/synthorg/security/risk_scorer.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/settings/enums.py
tests/conformance/persistence/test_research_run_repository.py
tests/evals/prompt/test_agent_system_prompt.py
tests/evals_spine/test_research_eval.py
tests/unit/api/fakes_backend.py
tests/unit/core/test_enums.py
tests/unit/meta/mcp/test_all_handlers_wired.py
tests/unit/meta/mcp/test_research_handlers.py
tests/unit/observability/test_events.py
tests/unit/persistence/test_protocol.py
tests/unit/research/_fakes.py
tests/unit/research/test_planning.py
tests/unit/research/test_research_models.py
tests/unit/research/test_research_retrieval.py
tests/unit/research/test_research_service.py
tests/unit/research/test_synthesis.py
tests/unit/research/test_tool.py
tests/unit/research/test_triage.py
tests/unit/security/test_action_types.py
web/src/api/types/enum-values.gen.ts
web/src/api/types/error-codes.gen.ts
web/src/api/types/openapi.gen.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Build Backend
GitHub Check: Lighthouse Site
GitHub Check: CodSpeed Web benchmarks
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Test Conformance (SQLite)
GitHub Check: Dashboard Test
GitHub Check: Test Unit
GitHub Check: Test E2E
GitHub Check: Test Integration
GitHub Check: Build Preview
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (12)

**/*.{py,ts,tsx,jsx,md}

📄 CodeRabbit inference engine (CLAUDE.md)

No region/currency/locale privileged; use metric units; British English per docs/reference/regional-defaults.md

Files:

src/synthorg/research/synthesis/__init__.py
tests/evals/prompt/test_agent_system_prompt.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/research/triage/__init__.py
tests/unit/observability/test_events.py
src/synthorg/settings/definitions/research.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/security/risk_scorer.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/engine/prompt_safety.py
web/src/api/types/enum-values.gen.ts
src/synthorg/observability/events/research.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/__init__.py
web/src/api/types/openapi.gen.ts
src/synthorg/observability/events/persistence.py
tests/unit/security/test_action_types.py
src/synthorg/persistence/protocol.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/settings/enums.py
tests/unit/meta/mcp/test_all_handlers_wired.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/meta/mcp/domains/__init__.py
scripts/check_provider_complete_chokepoint.py
src/synthorg/research/planning/__init__.py
tests/unit/research/test_planning.py
src/synthorg/research/constants.py
src/synthorg/persistence/research_protocol.py
src/synthorg/research/retrieval/protocol.py
tests/unit/api/fakes_backend.py
src/synthorg/security/action_types.py
tests/unit/research/test_triage.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/replay.py
tests/unit/core/test_enums.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/persistence/postgres/backend.py
web/src/api/types/error-codes.gen.ts
tests/unit/research/test_synthesis.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/research/triage/protocol.py
docs/design/research-mode.md
tests/unit/research/test_tool.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/research/_args.py
src/synthorg/research/config.py
src/synthorg/research/errors.py
src/synthorg/research/planning/llm_planner.py
tests/unit/research/test_research_service.py
src/synthorg/meta/mcp/handlers/research.py
tests/evals_spine/test_research_eval.py
src/synthorg/research/tool_factory.py
src/synthorg/core/enums.py
src/synthorg/research/retrieval/sources/knowledge.py
evals/scoring/research.py
evals/models/brief.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/triage/hybrid.py
src/synthorg/persistence/postgres/research_run_repo.py
tests/unit/persistence/test_protocol.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/research/factory.py
tests/unit/research/test_research_models.py
src/synthorg/research/models.py
src/synthorg/research/tool.py
scripts/check_no_ghost_wiring.py
src/synthorg/research/synthesis/llm_synthesizer.py
tests/unit/meta/mcp/test_research_handlers.py
src/synthorg/research/triage/llm.py
src/synthorg/api/state.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/research/_llm.py
tests/unit/research/_fakes.py
tests/conformance/persistence/test_research_run_repository.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
tests/unit/research/test_research_retrieval.py
src/synthorg/api/app.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols inherit from generic categories in persistence/_generics.py; bespoke methods permitted only under ADR-0001 D7
Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env; YAML is ingestion format only, not precedence tier; no os.environ.get outside startup
No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final); enforced by scripts/check_no_magic_numbers.py
Comments document WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py
No from __future__ import annotations (Python 3.14 has PEP 649); use PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors follow <Domain><Condition>Error pattern from DomainError; never inherit Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide; gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt; per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries; use @computed_field for derived; use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/`RecursionError...

Files:

src/synthorg/research/synthesis/__init__.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/research/triage/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/security/risk_scorer.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/engine/prompt_safety.py
src/synthorg/observability/events/research.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/__init__.py
src/synthorg/observability/events/persistence.py
src/synthorg/persistence/protocol.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/settings/enums.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/meta/mcp/domains/__init__.py
src/synthorg/research/planning/__init__.py
src/synthorg/research/constants.py
src/synthorg/persistence/research_protocol.py
src/synthorg/research/retrieval/protocol.py
src/synthorg/security/action_types.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/replay.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/persistence/postgres/backend.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/research/triage/protocol.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/research/_args.py
src/synthorg/research/config.py
src/synthorg/research/errors.py
src/synthorg/research/planning/llm_planner.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/tool_factory.py
src/synthorg/core/enums.py
src/synthorg/research/retrieval/sources/knowledge.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/triage/hybrid.py
src/synthorg/persistence/postgres/research_run_repo.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/research/factory.py
src/synthorg/research/models.py
src/synthorg/research/tool.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/triage/llm.py
src/synthorg/api/state.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/api/app.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/research/synthesis/__init__.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/research/triage/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/security/risk_scorer.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/engine/prompt_safety.py
src/synthorg/observability/events/research.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/__init__.py
src/synthorg/observability/events/persistence.py
src/synthorg/persistence/protocol.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/settings/enums.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/meta/mcp/domains/__init__.py
src/synthorg/research/planning/__init__.py
src/synthorg/research/constants.py
src/synthorg/persistence/research_protocol.py
src/synthorg/research/retrieval/protocol.py
src/synthorg/security/action_types.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/replay.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/persistence/postgres/backend.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/research/triage/protocol.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/research/_args.py
src/synthorg/research/config.py
src/synthorg/research/errors.py
src/synthorg/research/planning/llm_planner.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/tool_factory.py
src/synthorg/core/enums.py
src/synthorg/research/retrieval/sources/knowledge.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/triage/hybrid.py
src/synthorg/persistence/postgres/research_run_repo.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/research/factory.py
src/synthorg/research/models.py
src/synthorg/research/tool.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/triage/llm.py
src/synthorg/api/state.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/api/app.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Timeout/slow failures = source-code regression; never edit tests/baselines/unit_timing.json or any scripts/*_baseline.{txt,json} / scripts/_*_baseline.py; both families PreToolUse-blocked; per-invocation bypass requires explicit approval (ALLOW_BASELINE_GROWTH=1 git commit)
Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
xdist -n 8 --dist=loadfile auto-applied via pyproject addopts; Windows unit tests use WindowsSelectorEventLoopPolicy; subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests; fix fundamentally

Files:

tests/evals/prompt/test_agent_system_prompt.py
tests/unit/observability/test_events.py
tests/unit/security/test_action_types.py
tests/unit/meta/mcp/test_all_handlers_wired.py
tests/unit/research/test_planning.py
tests/unit/api/fakes_backend.py
tests/unit/research/test_triage.py
tests/unit/core/test_enums.py
tests/unit/research/test_synthesis.py
tests/unit/research/test_tool.py
tests/unit/research/test_research_service.py
tests/evals_spine/test_research_eval.py
tests/unit/persistence/test_protocol.py
tests/unit/research/test_research_models.py
tests/unit/meta/mcp/test_research_handlers.py
tests/unit/research/_fakes.py
tests/conformance/persistence/test_research_run_repository.py
tests/unit/research/test_research_retrieval.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/evals/prompt/test_agent_system_prompt.py
tests/unit/observability/test_events.py
tests/unit/security/test_action_types.py
tests/unit/meta/mcp/test_all_handlers_wired.py
tests/unit/research/test_planning.py
tests/unit/api/fakes_backend.py
tests/unit/research/test_triage.py
tests/unit/core/test_enums.py
tests/unit/research/test_synthesis.py
tests/unit/research/test_tool.py
tests/unit/research/test_research_service.py
tests/evals_spine/test_research_eval.py
tests/unit/persistence/test_protocol.py
tests/unit/research/test_research_models.py
tests/unit/meta/mcp/test_research_handlers.py
tests/unit/research/_fakes.py
tests/conformance/persistence/test_research_run_repository.py
tests/unit/research/test_research_retrieval.py

src/synthorg/meta/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

MCP: Define ToolHandler + args_model; call require_admin_guardrails() on admin tools; route through service layers per mcp-handler-contract.md

Files:

src/synthorg/meta/mcp/domains/research.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/meta/mcp/domains/__init__.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/meta/mcp/handlers/research.py

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts

web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts

src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed
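The "reject naive" rule can be illustrated with a small stand-in. The real helpers live in persistence._shared and are not shown on this page, so the functions below (note the `_sketch` suffix) are hypothetical and only approximate the behaviour the rule describes:

```python
from datetime import datetime, timezone


def parse_iso_utc_sketch(value: str) -> datetime:
    """Parse an ISO-8601 string and normalise it to UTC.

    Hypothetical stand-in for the repo helper: naive inputs (no UTC
    offset) are rejected instead of being silently treated as UTC.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        msg = f"naive datetime rejected: {value!r}"
        raise ValueError(msg)
    return parsed.astimezone(timezone.utc)


def format_iso_utc_sketch(value: datetime) -> str:
    """Serialise an aware datetime as an ISO-8601 string in UTC."""
    if value.tzinfo is None or value.utcoffset() is None:
        msg = "naive datetime rejected"
        raise ValueError(msg)
    return value.astimezone(timezone.utc).isoformat()


# An offset-bearing timestamp is converted to UTC on the way in.
stored = parse_iso_utc_sketch("2026-05-22T14:58:04+02:00")
round_tripped = format_iso_utc_sketch(stored)
```

Rejecting naive values at the persistence boundary means a row can never hold a timestamp whose zone has to be guessed later.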

Files:

src/synthorg/persistence/protocol.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/persistence/research_protocol.py
src/synthorg/persistence/postgres/backend.py
src/synthorg/persistence/postgres/research_run_repo.py
src/synthorg/persistence/sqlite/research_run_repo.py

scripts/check_*.{py,sh}

📄 CodeRabbit inference engine (CLAUDE.md)

Every convention PR ships its enforcement gate per docs/reference/convention-gates.md

Files:

scripts/check_provider_complete_chokepoint.py
scripts/check_no_ghost_wiring.py

**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.md: Numerics in README + public docs sourced from data/runtime_stats.yaml via `<!--RS:...-->` markers per data/README.md
Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200 (Dark Mauve); D2 CLI pinned to v0.7.1 in CI

Files:

docs/design/research-mode.md

tests/conformance/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres); enforced by check_dual_backend_test_parity.py

Files:

tests/conformance/persistence/test_research_run_repository.py

🧠 Learnings (9)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
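As a quick way to see what this learning refers to, the sketch below asks the running interpreter whether it accepts the unparenthesized form. The snippet is compiled from a string so that the file itself still parses on older interpreters; the function name is illustrative, not something from the repository:

```python
import sys

# PEP 758 form: several exception types, no parentheses, no `as` clause.
PEP_758_SNIPPET = """
try:
    pass
except MemoryError, RecursionError:
    raise
"""


def accepts_unparenthesized_except() -> bool:
    """Return True if this interpreter parses the PEP 758 except form."""
    try:
        compile(PEP_758_SNIPPET, "<pep758>", "exec")
    except SyntaxError:
        return False
    return True


supported = accepts_unparenthesized_except()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {supported}")
```

On Python 3.14 and later this reports True; earlier 3.x releases raise SyntaxError for the same snippet, which is why tooling pinned to an older grammar tends to misreport it.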

Applied to files:

src/synthorg/research/synthesis/__init__.py
tests/evals/prompt/test_agent_system_prompt.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/research/triage/__init__.py
tests/unit/observability/test_events.py
src/synthorg/settings/definitions/research.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/security/risk_scorer.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/engine/prompt_safety.py
src/synthorg/observability/events/research.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/__init__.py
src/synthorg/observability/events/persistence.py
tests/unit/security/test_action_types.py
src/synthorg/persistence/protocol.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/settings/enums.py
tests/unit/meta/mcp/test_all_handlers_wired.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/meta/mcp/domains/__init__.py
scripts/check_provider_complete_chokepoint.py
src/synthorg/research/planning/__init__.py
tests/unit/research/test_planning.py
src/synthorg/research/constants.py
src/synthorg/persistence/research_protocol.py
src/synthorg/research/retrieval/protocol.py
tests/unit/api/fakes_backend.py
src/synthorg/security/action_types.py
tests/unit/research/test_triage.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/replay.py
tests/unit/core/test_enums.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/persistence/postgres/backend.py
tests/unit/research/test_synthesis.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/research/triage/protocol.py
tests/unit/research/test_tool.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/research/_args.py
src/synthorg/research/config.py
src/synthorg/research/errors.py
src/synthorg/research/planning/llm_planner.py
tests/unit/research/test_research_service.py
src/synthorg/meta/mcp/handlers/research.py
tests/evals_spine/test_research_eval.py
src/synthorg/research/tool_factory.py
src/synthorg/core/enums.py
src/synthorg/research/retrieval/sources/knowledge.py
evals/scoring/research.py
evals/models/brief.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/triage/hybrid.py
src/synthorg/persistence/postgres/research_run_repo.py
tests/unit/persistence/test_protocol.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/research/factory.py
tests/unit/research/test_research_models.py
src/synthorg/research/models.py
src/synthorg/research/tool.py
scripts/check_no_ghost_wiring.py
src/synthorg/research/synthesis/llm_synthesizer.py
tests/unit/meta/mcp/test_research_handlers.py
src/synthorg/research/triage/llm.py
src/synthorg/api/state.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/research/_llm.py
tests/unit/research/_fakes.py
tests/conformance/persistence/test_research_run_repository.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
tests/unit/research/test_research_retrieval.py
src/synthorg/api/app.py

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

src/synthorg/research/synthesis/__init__.py
tests/evals/prompt/test_agent_system_prompt.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/research/triage/__init__.py
tests/unit/observability/test_events.py
src/synthorg/settings/definitions/research.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/security/risk_scorer.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/engine/prompt_safety.py
src/synthorg/observability/events/research.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/__init__.py
src/synthorg/observability/events/persistence.py
tests/unit/security/test_action_types.py
src/synthorg/persistence/protocol.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/settings/enums.py
tests/unit/meta/mcp/test_all_handlers_wired.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/meta/mcp/domains/__init__.py
scripts/check_provider_complete_chokepoint.py
src/synthorg/research/planning/__init__.py
tests/unit/research/test_planning.py
src/synthorg/research/constants.py
src/synthorg/persistence/research_protocol.py
src/synthorg/research/retrieval/protocol.py
tests/unit/api/fakes_backend.py
src/synthorg/security/action_types.py
tests/unit/research/test_triage.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/replay.py
tests/unit/core/test_enums.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/persistence/postgres/backend.py
tests/unit/research/test_synthesis.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/research/triage/protocol.py
tests/unit/research/test_tool.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/research/_args.py
src/synthorg/research/config.py
src/synthorg/research/errors.py
src/synthorg/research/planning/llm_planner.py
tests/unit/research/test_research_service.py
src/synthorg/meta/mcp/handlers/research.py
tests/evals_spine/test_research_eval.py
src/synthorg/research/tool_factory.py
src/synthorg/core/enums.py
src/synthorg/research/retrieval/sources/knowledge.py
evals/scoring/research.py
evals/models/brief.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/triage/hybrid.py
src/synthorg/persistence/postgres/research_run_repo.py
tests/unit/persistence/test_protocol.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/research/factory.py
tests/unit/research/test_research_models.py
src/synthorg/research/models.py
src/synthorg/research/tool.py
scripts/check_no_ghost_wiring.py
src/synthorg/research/synthesis/llm_synthesizer.py
tests/unit/meta/mcp/test_research_handlers.py
src/synthorg/research/triage/llm.py
src/synthorg/api/state.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/research/_llm.py
tests/unit/research/_fakes.py
tests/conformance/persistence/test_research_run_repository.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
tests/unit/research/test_research_retrieval.py
src/synthorg/api/app.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/research/synthesis/__init__.py
src/synthorg/settings/definitions/__init__.py
src/synthorg/research/triage/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/meta/mcp/domains/research.py
src/synthorg/security/risk_scorer.py
src/synthorg/research/retrieval/__init__.py
src/synthorg/engine/prompt_safety.py
src/synthorg/observability/events/research.py
src/synthorg/research/retrieval/sources/__init__.py
src/synthorg/research/__init__.py
src/synthorg/observability/events/persistence.py
src/synthorg/persistence/protocol.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/research/retrieval/sources/academic.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/settings/enums.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/meta/mcp/domains/__init__.py
src/synthorg/research/planning/__init__.py
src/synthorg/research/constants.py
src/synthorg/persistence/research_protocol.py
src/synthorg/research/retrieval/protocol.py
src/synthorg/security/action_types.py
src/synthorg/observability/prometheus_labels.py
src/synthorg/research/planning/protocol.py
src/synthorg/research/retrieval/replay.py
src/synthorg/research/retrieval/sources/_shared.py
src/synthorg/persistence/postgres/backend.py
src/synthorg/core/error_taxonomy.py
src/synthorg/research/retrieval/sources/web.py
src/synthorg/research/triage/heuristic.py
src/synthorg/research/synthesis/protocol.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/research/triage/protocol.py
src/synthorg/research/retrieval/sources/code.py
src/synthorg/research/synthesis/citation_binder.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/research/_args.py
src/synthorg/research/config.py
src/synthorg/research/errors.py
src/synthorg/research/planning/llm_planner.py
src/synthorg/meta/mcp/handlers/research.py
src/synthorg/research/tool_factory.py
src/synthorg/core/enums.py
src/synthorg/research/retrieval/sources/knowledge.py
src/synthorg/research/retrieval/providers.py
src/synthorg/research/triage/hybrid.py
src/synthorg/persistence/postgres/research_run_repo.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/research/factory.py
src/synthorg/research/models.py
src/synthorg/research/tool.py
src/synthorg/research/synthesis/llm_synthesizer.py
src/synthorg/research/triage/llm.py
src/synthorg/api/state.py
src/synthorg/persistence/sqlite/research_run_repo.py
src/synthorg/research/_llm.py
src/synthorg/research/retrieval/dedup.py
src/synthorg/research/service.py
src/synthorg/api/app.py

📚 Learning: 2026-05-17T11:45:11.839Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.

Applied to files:

src/synthorg/settings/definitions/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/settings/enums.py

📚 Learning: 2026-05-17T11:45:11.839Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.

Applied to files:

src/synthorg/settings/definitions/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/settings/enums.py

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

docs/design/research-mode.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

docs/design/research-mode.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

docs/design/research-mode.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

docs/design/research-mode.md

Reviewer fixes (CodeRabbit + Gemini):
- research/service.py: enforce max_cost/max_wall_clock_seconds budgets (asyncio.timeout plus per-stage cost checks) via new ResearchBudgetExceededError; surface MemoryError/RecursionError out of TaskGroup BaseExceptionGroup
- _llm.py: balanced JSON-object extraction via JSONDecoder.raw_decode scan
- models.py: FAILED runs now require completed_at as well as error
- retrieval/dedup.py: canonicalise only host case, preserve path case
- retrieval/sources/{academic,code}.py: skip blank-URI rows, not whole call
- synthesis/llm_synthesizer.py: wrap model-produced research_angle untrusted
- tool.py: repr(min_credibility) in run_id key to avoid lossy collapse
- triage/llm.py: validate batch_size at least one
- settings/definitions/research.py: mark master 'enabled' restart_required
- mcp/domains/research.py: derive _STATUSES from ResearchRunStatus enum
- mcp/handlers/research.py: created_at via app_state Clock seam
- api/app.py: wire research from live AppState settings, not captured arg
- evals/scoring/research.py: Unicode-aware tokeniser
- tests: real ReplayRetrievalSource assertions; protocol-shaped research_runs fake

CI fixes:
- web settings Records and NAMESPACE_ORDER gain 'research' (TS build/lighthouse)
- tool-count pin 216 becomes 219 (research adds run/get/list)
- starlette 1.0.0 to 1.0.1 (PYSEC-2026-161)
- regenerated error-codes/openapi types for RESEARCH_BUDGET_EXCEEDED

Skipped: CR positional_relevance one-based fix (callers pass zero-based enumerate; current code already scores first result 1.0).
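The _llm.py item mentions balanced JSON-object extraction via a JSONDecoder.raw_decode scan. The actual helper is not shown on this page; the following is a minimal sketch of that general technique, with a hypothetical function name, assuming the goal is to pull the first well-formed JSON object out of free-form model output:

```python
from __future__ import annotations

import json
from typing import Any


def extract_first_json_object(text: str) -> dict[str, Any] | None:
    """Return the first well-formed JSON object embedded in ``text``.

    Every "{" is tried as a candidate start. ``raw_decode`` parses one
    complete value from that offset and ignores whatever follows it, so
    braces inside string values cannot unbalance the match.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        else:
            if isinstance(value, dict):
                return value
        start = text.find("{", start + 1)
    return None


reply = 'Here is the plan {not json} and then {"queries": ["a}b"], "n": 2} done.'
plan = extract_first_json_object(reply)
```

Compared with counting braces or using a greedy regular expression, letting the decoder decide where the object ends handles nested objects and braces inside strings without extra bookkeeping.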

Pre-push mypy caught dict-invariance: build_replay_sources returns ReplayRetrievalSource values which don't fit _build_service's dict[ResearchSourceType, RetrievalSource] parameter. Add the wider annotation explicitly, matching what test_research_eval.py already does.
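The invariance problem described here can be reproduced in miniature. The classes and functions below are simplified stand-ins that borrow the names from the commit message (with string keys instead of ResearchSourceType); they are not the repository's code:

```python
from __future__ import annotations


class RetrievalSource:
    """Stand-in for the retrieval protocol."""


class ReplayRetrievalSource(RetrievalSource):
    """Stand-in for the replay implementation."""


def build_replay_sources() -> dict[str, ReplayRetrievalSource]:
    return {"web": ReplayRetrievalSource(), "code": ReplayRetrievalSource()}


def count_sources(sources: dict[str, RetrievalSource]) -> int:
    return len(sources)


# dict is invariant in its value type, so a type checker rejects
#     count_sources(build_replay_sources())
# even though every value is a RetrievalSource at runtime.
# Declaring the wider type where the dict is created resolves it:
widened: dict[str, RetrievalSource] = {
    name: source for name, source in build_replay_sources().items()
}
total = count_sources(widened)
```

An alternative, when the callee only reads the mapping, is to accept `collections.abc.Mapping`, which is covariant in its value type; whether that fits `_build_service` depends on code not shown here.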

codecov · 2026-05-22T17:40:46Z

Codecov Report

❌ Patch coverage is 89.58009% with 134 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.03%. Comparing base (7527078) to head (3ffd6c8).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
...synthorg/persistence/postgres/research_run_repo.py	70.27%	22 Missing ⚠️
...c/synthorg/persistence/sqlite/research_run_repo.py	69.86%	22 Missing ⚠️
src/synthorg/api/app.py	42.85%	17 Missing and 3 partials ⚠️
src/synthorg/research/factory.py	58.69%	11 Missing and 8 partials ⚠️
src/synthorg/research/service.py	90.12%	8 Missing ⚠️
src/synthorg/research/triage/heuristic.py	82.97%	6 Missing and 2 partials ⚠️
src/synthorg/meta/mcp/handlers/research.py	93.54%	5 Missing and 1 partial ⚠️
src/synthorg/research/models.py	96.87%	4 Missing and 2 partials ⚠️
src/synthorg/research/retrieval/sources/_shared.py	66.66%	2 Missing and 2 partials ⚠️
src/synthorg/research/tool_factory.py	66.66%	4 Missing ⚠️
... and 8 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2041      +/-   ##
==========================================
+ Coverage   84.98%   85.03%   +0.04%     
==========================================
  Files        2157     2193      +36     
  Lines      126065   127351    +1286     
  Branches    10530    10579      +49     
==========================================
+ Hits       107142   108298    +1156     
- Misses      16281    16391     +110     
- Partials     2642     2662      +20
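The percentages in the report can be re-derived from the raw counts. The sketch below assumes that the displayed values are truncated rather than rounded to two decimals; that is an inference from these particular numbers, not documented Codecov behaviour, and the function name is illustrative:

```python
from fractions import Fraction


def truncated_pct(ratio: Fraction) -> str:
    """Format a non-negative ratio as a percentage cut off at two decimals."""
    hundredths = int(ratio * 10_000)  # int() truncates toward zero
    return f"{hundredths // 100}.{hundredths % 100:02d}"


base = Fraction(107_142, 126_065)  # hits / lines on main
head = Fraction(108_298, 127_351)  # hits / lines on the PR head

base_pct = truncated_pct(base)
head_pct = truncated_pct(head)
# The delta is taken on the exact ratios, then truncated.
delta_pct = truncated_pct(head - base)

# Patch coverage: 134 uncovered lines at 89.58009% implies the patch size.
patch_lines = round(134 / (1 - 0.8958009))
patch_pct = f"{float(Fraction(patch_lines - 134, patch_lines) * 100):.5f}"
```

Under that assumption the script reproduces 84.98, 85.03 and +0.04, and it also suggests why the delta is not simply the difference of the two displayed figures (which would be 0.05).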


CodeRabbit's CHANGES_REQUESTED was for commit f491a9c; all 18 actionable findings have been addressed in 158c839 (one factually-wrong positional_relevance skip logged with disproof). CodeRabbit's rolling summary on the new head 3ffd6c8 confirms 'No actionable comments were generated in the recent review.' Dismissing the stale review.

## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._

### What you'll notice

- New brownfield codebase intake mode supports merger and acquisition scenarios.
- Added deep CEO interview feature to improve project charter creation.
- Introduced mission control and flight recorder operator cockpit for better operational oversight.
- Research mode added for enhanced exploratory work.
- Runtime services now log safety-spine state at boot for clearer diagnostics.

### What's new

- Research mode feature enables deeper data exploration.
- CEO interview integration helps shape project charters.
- Mission control and flight recorder cockpit introduced for operational tracking.

### Under the hood

- Improved codebase modularity with module-size gates and lint tightening.
- Added __init__.py to 21 test directories for better test discovery.
- Promoted six transitive dependencies to direct dependencies for clarity.
- Split codespell ignore list into vocabulary and source renames.
- Decomposed oversized web utilities, hooks, and libraries for maintainability.
- Enhanced CI with Lychee link checker integration and retry logic for cosign signing.
- Sharded unit and integration tests and added Postgres service container in CI.
- Updated infrastructure and web dependencies; maintained lock files.

:robot: I have created a release *beep* *boop*

---

## [0.8.8](v0.8.7...v0.8.8) (2026-05-24)

### Features

* brownfield codebase intake (merger/acquisition entry mode) ([#2042](#2042)) ([e287621](e287621)), closes [#1975](#1975)
* deep CEO interview to project charter ([#2045](#2045)) ([904f2fb](904f2fb))
* mission control + flight recorder operator cockpit ([#2044](#2044)) ([1c2660b](1c2660b))
* research mode ([#2041](#2041)) ([f81a5ac](f81a5ac)), closes [#1989](#1989)
* surface safety-spine state in runtime-services boot log (closes [#2096](#2096)) ([#2097](#2097)) ([f187b31](f187b31))

### Refactoring

* add __init__.py to 21 leaf test directories (INP001) ([#2081](#2081)) ([2592118](2592118)), closes [#2064](#2064)
* codebase modularity (1/4) - module-size gates + lint tightening + tools ([#2078](#2078)) ([556fbd9](556fbd9)), closes [#2047](#2047) [#2040](#2040)
* promote 6 transitive deps to direct deps ([#2083](#2083)) ([adedc6a](adedc6a))
* split codespell ignore-words-list into vocab + source renames ([#2085](#2085)) ([917d98a](917d98a)), closes [#2074](#2074)
* **web:** PR A foundation, decompose oversized utils/hooks/lib ([#2092](#2092)) ([#2098](#2098)) ([aedbba5](aedbba5))

### CI/CD

* exclude slsa.dev from lychee (transient timeout on canonical badge) ([#2090](#2090)) ([346c51d](346c51d))
* fix paths-filter shallow-clone race and scorecard allowlist ([#2089](#2089)) ([7cd7ce8](7cd7ce8))
* refresh .test_durations.{unit,integration} ([#2087](#2087)) ([ddf2d86](ddf2d86))
* retry cosign sign on transient GHCR/Rekor failures ([#2100](#2100)) ([da9422a](da9422a))
* shard test-unit + test-integration, sysmon coverage, Postgres service container ([#2080](#2080)) ([0768787](0768787))
* wire Lychee link-checker (workflow + installer + pre-push hook) ([#2084](#2084)) ([1c0694a](1c0694a))

### Maintenance

* Lock file maintenance ([#2086](#2086)) ([a78810a](a78810a))
* Update Infrastructure dependencies ([#2055](#2055)) ([041ad8b](041ad8b))
* Update Web dependencies ([#2054](#2054)) ([4d57b9a](4d57b9a))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Aureliolo added 7 commits May 22, 2026 15:38

feat: research mode foundations, models, and pipeline strategies

fc95717

feat: research run service orchestrator and dual-backend persistence

8c2b61e

feat: wire research mode (tool, MCP domain, settings, app startup)

eb22d80

feat: research eval lane, deterministic grader, and design doc

7b445ef

fix: update spine inventories and route URL normalisation through core helper

7421a5c

fix: currency-neutral cost fields, annotate constants, allowlist research LLM helper, unique test basenames

3bbc6ea

fix: wrap untrusted source metadata in research prompts for SEC-1 hardening

f491a9c

Aureliolo temporarily deployed to lighthouse May 22, 2026 14:58 — with GitHub Actions Inactive

Aureliolo had a problem deploying to lighthouse May 22, 2026 14:58 — with GitHub Actions Failure

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 15:00 — with GitHub Actions Inactive

gemini-code-assist Bot reviewed May 22, 2026

coderabbitai Bot previously requested changes May 22, 2026

Aureliolo added 2 commits May 22, 2026 18:59

Aureliolo temporarily deployed to lighthouse May 22, 2026 17:26 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 17:27 — with GitHub Actions Inactive

Aureliolo merged commit f81a5ac into main May 22, 2026
83 checks passed

Aureliolo deleted the feat/1989-research-mode branch May 22, 2026 17:59

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 17:59 — with GitHub Actions Inactive

synthorg-repo-bot Bot mentioned this pull request May 22, 2026

chore(main): release 0.8.8 #2043

Merged

Conversation

Aureliolo commented May 22, 2026

Summary

What's included

Test plan

Review coverage

github-actions Bot commented May 22, 2026

Dependency Review

License Issues

uv.lock

OpenSSF Scorecard

Scanned Files

coderabbitai Bot commented May 22, 2026

Walkthrough

gemini-code-assist Bot commented May 22, 2026

Summary of Changes

Highlights

Footnotes

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

gemini-code-assist Bot left a comment

Code Review

coderabbitai Bot left a comment

codecov Bot commented May 22, 2026

Codecov Report

1 participant
