feat: research mode #2041
…arch LLM helper, unique test basenames
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds a deterministic “Research Mode” pipeline: models, planner, multi-source retrieval (knowledge/web/academic/code), credibility triage (heuristic/LLM/hybrid), deduplication, and LLM synthesis with citation binding. Wires app startup, settings, security/risk, observability, and MCP/agent tools. Implements Postgres/SQLite repositories and migrations for research_runs. Extends enums and error codes. Provides replay mechanics and deterministic grading. Includes comprehensive unit/integration/eval tests validating persistence, replay, MCP handlers, tools, and byte-stable outputs.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements a comprehensive research subsystem for synthetic organizations. It transforms research from a simple sandbox action into a robust, recorded, and replayable pipeline. By integrating internal knowledge with external search providers, the system provides citation-backed reports where every claim is verified against retrievable sources. The implementation includes full persistence, evaluation capabilities, and security measures for handling untrusted external content.
Code Review
This pull request implements "Research Mode," a new subsystem enabling synthetic organizations to execute complex research tasks through a pipeline of query planning, multi-source retrieval (internal knowledge, web, academic, and code), credibility triage, and citation-backed synthesis. The implementation includes the core ResearchService, Pydantic data models, persistence repositories for Postgres and SQLite, and integration with the agent tool layer and MCP. A review finding notes that while max_cost and max_wall_clock_seconds limits are defined in the research brief, they are not currently enforced by the orchestrator during execution, presenting a risk of runaway costs or latency.
async def _execute(
    self,
    run: ResearchRun,
    brief: ResearchBrief,
    started_at: datetime,
) -> ResearchRun:
The research pipeline does not enforce the max_cost or max_wall_clock_seconds limits defined in the ResearchBrief. While these values are recorded in the final ResearchRun, the orchestrator should actively monitor and enforce them during execution to prevent runaway costs or latency.
Consider wrapping the _execute body in an asyncio.timeout block and checking the accumulated total_cost after each LLM-backed stage (planning, triage, synthesis).
Actionable comments posted: 18
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@evals/scoring/research.py`:
- Around line 27-47: The tokenizer is ASCII-only (_TOKEN_RE) and misses
non‑ASCII words; update tokenization to be Unicode-aware by changing _TOKEN_RE
to use a Unicode word pattern (e.g. re.compile(r"\w+", re.UNICODE)) and update
_tokens(text: str) to filter out non-alphanumeric-only matches (use
token.isalnum() or any(char.isalnum() for char in token)) before returning
frozenset so underscores are excluded but Unicode letters/numbers are included;
references in any coverage logic (e.g. COVERAGE_TOKEN_OVERLAP) remain
unchanged.
In `@src/synthorg/api/app.py`:
- Around line 1472-1486: The guard is using the captured
settings_service/provider_registry args (from create_app) which are often None;
change the wiring to read the live AppState services instead. Replace uses of
the local settings_service and provider_registry in the research wiring block
with the app/state's runtime services (e.g., AppState.settings_service or
app.state.settings_service and the runtime provider_registry) before returning,
and then proceed to import and call
build_research_service/build_research_tool_factory as shown so research is wired
during the default boot path.
In `@src/synthorg/meta/mcp/domains/research.py`:
- Around line 22-30: Replace the hardcoded _STATUSES list with values derived
from the existing ResearchRunStatus enum so the MCP domain cannot drift;
specifically import or reference ResearchRunStatus and build _STATUSES =
[status.value for status in ResearchRunStatus] (or equivalent) and ensure any
validation in ResearchListArgs uses ResearchRunStatus directly (or the same
derived list) instead of repeating the string literals; update references to
_STATUSES and ResearchListArgs validation to use the single source of truth
ResearchRunStatus.
In `@src/synthorg/meta/mcp/handlers/research.py`:
- Line 90: Replace the inline SystemClock() usage with a clock seam: add an
optional parameter like clock: Clock | None = None to the handler/function that
sets created_at, and use (clock or SystemClock()).now() (or clock.now() when
clock is provided) to populate created_at instead of SystemClock().now(); update
the call sites in the same code path to pass a test-injectable FakeClock in
tests so created_at becomes deterministic.
In `@src/synthorg/research/_llm.py`:
- Around line 21-42: The current greedy regex _JSON_OBJECT_RE and
extract_json_object are capturing from the first "{" to the last "}" and can
return an invalid span when extra braces appear outside the JSON; replace the
regex-based approach in extract_json_object with a deterministic brace-matching
scan: locate the first "{" in content, iterate forward maintaining a nesting
counter (increment on "{", decrement on "}"), return the substring from the
first "{" to the position where the counter returns to zero, and raise
ValueError if the end of content is reached before the counter closes; refer to
extract_json_object and _JSON_OBJECT_RE while making this change and
remove/disable the greedy regex usage.
In `@src/synthorg/research/models.py`:
- Around line 501-503: The validator only enforces that a FAILED run has an
error but not that it has a completion timestamp; update the check on
self.status == ResearchRunStatus.FAILED (the block that currently raises
ValueError when self.error is None) to also require self.completed_at to be set
and raise a clear ValueError if either error or completed_at is missing so
FAILED terminal runs must include both self.error and self.completed_at.
In `@src/synthorg/research/retrieval/dedup.py`:
- Around line 38-43: The _canonical_url function currently lowercases the entire
URI via normalize_ascii_lowercase(uri), which can collapse case-sensitive paths;
instead parse the original uri with urlsplit(uri), lowercase only the scheme and
netloc (use normalize_ascii_lowercase on parts.netloc and parts.scheme if
needed) while preserving parts.path exactly, then build the canonical host+path
string as f"{lower_netloc}{parts.path}".rstrip("/") (dropping query/fragment as
before) and return that or the original input when empty; update references in
_canonical_url to use urlsplit on the raw uri and only lowercase the netloc (and
optionally scheme) rather than the full URI.
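A sketch of the host-only lower-casing, using `str.lower()` as a stand-in for the project's `normalize_ascii_lowercase` helper (whose exact behaviour is not shown here) and `canonical_url` as an illustrative name for `_canonical_url`.

```python
from urllib.parse import urlsplit


def canonical_url(uri: str) -> str:
    """Build a dedup key: lower-cased host plus the case-preserved path."""
    parts = urlsplit(uri)
    # Hostnames are case-insensitive; paths may not be, so leave them alone.
    host = parts.netloc.lower()
    # Query string and fragment are dropped by simply not including them.
    canonical = f"{host}{parts.path}".rstrip("/")
    return canonical or uri
```

With this shape, `https://Example.com/Docs` and `https://example.com/docs` stay distinct, while `HTTPS://EXAMPLE.COM/Docs/` and `https://example.com/Docs` collapse to the same key.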
In `@src/synthorg/research/retrieval/sources/_shared.py`:
- Around line 36-45: positional_relevance currently treats position as
zero-based so the documented "first result scores 1.0" is wrong for one-based
ranks; update the function positional_relevance to treat position as a one-based
rank by computing relevance as max(0.0, (total - position + 1) / total) (and
keep the existing guard for total <= 0), and ensure the docstring matches the
one-based semantics and that out-of-range positions are clamped to the [0.0,1.0]
range.
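The one-based formula can be sketched as follows. Returning `0.0` when `total <= 0` is an assumption, since the existing guard's return value is not quoted in the comment.

```python
def positional_relevance(position: int, total: int) -> float:
    """Score a one-based rank: the first of ``total`` results scores 1.0."""
    if total <= 0:
        # Assumption: an empty result set carries no relevance.
        return 0.0
    score = (total - position + 1) / total
    # Clamp so out-of-range positions cannot leave [0.0, 1.0].
    return min(1.0, max(0.0, score))
```

For four results this yields 1.0, 0.75, 0.5, and 0.25 for positions 1 to 4, so the last result keeps a small non-zero score.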
In `@src/synthorg/research/retrieval/sources/academic.py`:
- Around line 57-75: Guard against empty/malformed URIs before constructing
NotBlankStr to avoid failing the whole retrieval: compute the raw uri from
result.url or result.identifier, check if it's truthy after strip (e.g., if not
uri_raw or not uri_raw.strip(): continue), and only then set uri =
uri_raw.strip() and proceed to build AcademicSourceLocator, RetrievedItem and
call items.append; reference the variables/functions NotBlankStr, RetrievedItem,
AcademicSourceLocator, uri and result.url/result.identifier to locate the
change.
In `@src/synthorg/research/retrieval/sources/code.py`:
- Around line 57-76: The code currently constructs uri then passes it to
NotBlankStr(uri) when building a RetrievedItem, which raises on blank URIs and
aborts the whole source call; change the logic in the loop that builds
ResearchCitation/ RetrievedItem (the block that sets uri, citation, and calls
items.append) to validate uri first and skip the current result if uri is
empty/whitespace (e.g., check uri.strip() or equivalent before creating
NotBlankStr), so other valid rows for the same sub_query.index are preserved;
keep using ResearchCitation, CodeSourceLocator and RetrievedItem as before for
valid rows.
In `@src/synthorg/research/service.py`:
- Around line 218-224: The TaskGroup may wrap system exceptions from
_safe_retrieve into a BaseExceptionGroup so callers (like run()) can't match
MemoryError/RecursionError; modify _retrieve() to catch BaseExceptionGroup after
the TaskGroup completes, inspect its .exceptions (and nested groups) for any
MemoryError or RecursionError and re-raise that original exception (or re-raise
the BaseExceptionGroup if none match). Reference _retrieve(), _safe_retrieve(),
and run() when making the change so the handler in run() can observe raw system
errors instead of a wrapped BaseExceptionGroup.
In `@src/synthorg/research/synthesis/llm_synthesizer.py`:
- Around line 131-143: The returned prompt includes model-produced text
plan.research_angle raw; update the synthesis assembly to fence that value with
wrap_untrusted(TAG_TASK_DATA, plan.research_angle) before injecting it into the
prompt so SEC-1 boundaries are preserved—modify the return expression that
currently builds f"{question}\nResearch angle:
{plan.research_angle}\n\nSources:\n" to use the wrapped research angle and keep
existing question and blocks handling (references: variables question,
plan.research_angle, blocks and function/module llm_synthesizer.py; use
wrap_untrusted from engine.prompt_safety).
In `@src/synthorg/research/tool.py`:
- Line 86: The run_id construction currently uses f"{args.min_credibility:.4f}"
which lossy-rounds args.min_credibility and can collapse distinct requests;
change the run_id assembly (where f"{args.min_credibility:.4f}" is used) to
serialize min_credibility deterministically without rounding — e.g., use
repr(args.min_credibility) or format(args.min_credibility, ".17g") or convert to
a Decimal and use its exact string — so distinct float values remain unique and
do not overwrite persisted runs.
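The collision is easy to reproduce. In this sketch `credibility_key` is a hypothetical helper, not a function from `tool.py`; it relies on the fact that `repr()` of a Python float is the shortest string that round-trips to the same value.

```python
def credibility_key(min_credibility: float) -> str:
    """Serialise a float so distinct values yield distinct key fragments."""
    return repr(min_credibility)


# Two different thresholds that the ".4f" format collapses into one string.
lossy_a = f"{0.50001:.4f}"
lossy_b = f"{0.50004:.4f}"

# The repr-based fragments stay distinct and parse back to the same floats.
exact_a = credibility_key(0.50001)
exact_b = credibility_key(0.50004)
```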
In `@src/synthorg/research/triage/llm.py`:
- Around line 59-64: Validate the batch_size parameter in the class __init__
(where batch_size default is RESEARCH_TRIAGE_BATCH_SIZE) before assigning to
self._batch_size: ensure it is an int and greater than 0, and raise a ValueError
with a clear message if not (to prevent range(..., 0) and negative behavior).
Update the constructor logic that currently sets self._batch_size = batch_size
to perform this check (referencing the batch_size parameter, self._batch_size,
and RESEARCH_TRIAGE_BATCH_SIZE).
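A sketch of the constructor check. `BatchingTriage` and `DEFAULT_BATCH_SIZE` are placeholders for the real triage class and `RESEARCH_TRIAGE_BATCH_SIZE`, whose value is not shown in this review; the `batches` method is only there to show why a non-positive step must be rejected.

```python
from typing import Final

# Assumption: placeholder default standing in for RESEARCH_TRIAGE_BATCH_SIZE.
DEFAULT_BATCH_SIZE: Final[int] = 8


class BatchingTriage:
    """Hypothetical stand-in for the LLM triage class."""

    def __init__(self, batch_size: int = DEFAULT_BATCH_SIZE) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(batch_size, bool) or not isinstance(batch_size, int):
            kind = type(batch_size).__name__
            raise ValueError(f"batch_size must be an int, got {kind}")
        if batch_size <= 0:
            raise ValueError(f"batch_size must be greater than 0, got {batch_size}")
        self._batch_size = batch_size

    def batches(self, items: list[str]) -> list[list[str]]:
        # A zero step would make range() raise; a negative one yields nothing.
        step = self._batch_size
        return [items[start : start + step] for start in range(0, len(items), step)]
```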
In `@src/synthorg/settings/definitions/research.py`:
- Around line 15-27: The SettingDefinition for the research master switch
(namespace SettingNamespace.RESEARCH, key "enabled") must be marked
restart-required so changes are not treated as hot-configurable; update the
SettingDefinition instance for that key to include the restart-required metadata
(e.g., set the restart_required flag/property to true or the appropriate enum
value used by SettingDefinition) so runtime edits are recognized as requiring a
restart to take effect.
In `@tests/evals_spine/test_research_eval.py`:
- Around line 198-214: The test's assertion that replay_web.queries is empty is
ineffective because replay_web (FakeWebSearchProvider) is never injected into
the replay run; the replay uses replay_sources produced by
build_replay_sources(recorded.retrieved_items) instead. Fix by wiring the
FakeWebSearchProvider into the replay path: update the replay_sources or the
call to _build_service/_build_service(...).run so that the web provider used
during the replay is replay_web (e.g., include replay_web under the appropriate
ResearchSourceType key in replay_sources or pass replay_web into the service
factory), so any accidental live web queries will be recorded in replay_web.
Ensure you reference and modify replay_web, replay_sources,
build_replay_sources, and the _build_service(...) invocation.
In `@tests/unit/persistence/test_protocol.py`:
- Around line 1167-1173: The research_runs property currently returns object()
which only checks attribute presence; replace it with a protocol-shaped fake
that implements the research repository interface (i.e., mimic the expected
methods/attributes of the research repo) so the isinstance/runtime_checkable
contract verifies shape, and update the conformance suite by adding a
backend-routed assertion in TestProtocolCompliance to ensure the
PersistenceBackend routes to the new research repository path correctly; locate
the research_runs property and the TestProtocolCompliance test to implement the
fake and add the assertion respectively.
In `@tests/unit/research/test_research_service.py`:
- Around line 173-199: The test creates a fake web provider (replay_web) but
never injects it into the service, so the final assertion about replay isolation
is vacuous; update the test to pass replay_web into the service construction so
ResearchService uses the fake web provider during replay. Concretely, change the
call to _build_service (or adjust its call-site) to accept and forward
replay_web (or a named parameter like web_provider) so that the instantiated
ResearchService (or the helper that builds it) uses replay_web instead of the
real web provider; keep the final assertion assert replay_web.queries == []
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 51d2e90a-cc11-44df-9229-adabd1ceb492
📒 Files selected for processing (92)
- docs/design/research-mode.md
- evals/models/brief.py
- evals/scoring/research.py
- scripts/_ghost_wiring_manifest.txt
- scripts/check_no_ghost_wiring.py
- scripts/check_provider_complete_chokepoint.py
- src/synthorg/api/app.py
- src/synthorg/api/state.py
- src/synthorg/core/enums.py
- src/synthorg/core/error_taxonomy.py
- src/synthorg/engine/prompt_safety.py
- src/synthorg/meta/mcp/domains/__init__.py
- src/synthorg/meta/mcp/domains/_research_args.py
- src/synthorg/meta/mcp/domains/research.py
- src/synthorg/meta/mcp/handlers/__init__.py
- src/synthorg/meta/mcp/handlers/research.py
- src/synthorg/observability/events/persistence.py
- src/synthorg/observability/events/research.py
- src/synthorg/observability/prometheus_labels.py
- src/synthorg/persistence/postgres/backend.py
- src/synthorg/persistence/postgres/research_run_repo.py
- src/synthorg/persistence/postgres/revisions/20260522000002_research_runs.sql
- src/synthorg/persistence/postgres/schema.sql
- src/synthorg/persistence/protocol.py
- src/synthorg/persistence/research_protocol.py
- src/synthorg/persistence/sqlite/_backend_accessors.py
- src/synthorg/persistence/sqlite/backend.py
- src/synthorg/persistence/sqlite/research_run_repo.py
- src/synthorg/persistence/sqlite/revisions/20260522000002_research_runs.sql
- src/synthorg/persistence/sqlite/schema.sql
- src/synthorg/research/__init__.py
- src/synthorg/research/_args.py
- src/synthorg/research/_llm.py
- src/synthorg/research/config.py
- src/synthorg/research/constants.py
- src/synthorg/research/errors.py
- src/synthorg/research/factory.py
- src/synthorg/research/models.py
- src/synthorg/research/planning/__init__.py
- src/synthorg/research/planning/llm_planner.py
- src/synthorg/research/planning/protocol.py
- src/synthorg/research/retrieval/__init__.py
- src/synthorg/research/retrieval/dedup.py
- src/synthorg/research/retrieval/protocol.py
- src/synthorg/research/retrieval/providers.py
- src/synthorg/research/retrieval/replay.py
- src/synthorg/research/retrieval/sources/__init__.py
- src/synthorg/research/retrieval/sources/_shared.py
- src/synthorg/research/retrieval/sources/academic.py
- src/synthorg/research/retrieval/sources/code.py
- src/synthorg/research/retrieval/sources/knowledge.py
- src/synthorg/research/retrieval/sources/web.py
- src/synthorg/research/service.py
- src/synthorg/research/synthesis/__init__.py
- src/synthorg/research/synthesis/citation_binder.py
- src/synthorg/research/synthesis/llm_synthesizer.py
- src/synthorg/research/synthesis/protocol.py
- src/synthorg/research/tool.py
- src/synthorg/research/tool_factory.py
- src/synthorg/research/triage/__init__.py
- src/synthorg/research/triage/heuristic.py
- src/synthorg/research/triage/hybrid.py
- src/synthorg/research/triage/llm.py
- src/synthorg/research/triage/protocol.py
- src/synthorg/security/action_types.py
- src/synthorg/security/risk_scorer.py
- src/synthorg/security/rules/risk_classifier.py
- src/synthorg/security/timeout/risk_tier_classifier.py
- src/synthorg/settings/definitions/__init__.py
- src/synthorg/settings/definitions/research.py
- src/synthorg/settings/enums.py
- tests/conformance/persistence/test_research_run_repository.py
- tests/evals/prompt/test_agent_system_prompt.py
- tests/evals_spine/test_research_eval.py
- tests/unit/api/fakes_backend.py
- tests/unit/core/test_enums.py
- tests/unit/meta/mcp/test_all_handlers_wired.py
- tests/unit/meta/mcp/test_research_handlers.py
- tests/unit/observability/test_events.py
- tests/unit/persistence/test_protocol.py
- tests/unit/research/_fakes.py
- tests/unit/research/test_planning.py
- tests/unit/research/test_research_models.py
- tests/unit/research/test_research_retrieval.py
- tests/unit/research/test_research_service.py
- tests/unit/research/test_synthesis.py
- tests/unit/research/test_tool.py
- tests/unit/research/test_triage.py
- tests/unit/security/test_action_types.py
- web/src/api/types/enum-values.gen.ts
- web/src/api/types/error-codes.gen.ts
- web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Dashboard Test
- GitHub Check: Test Unit
- GitHub Check: Test E2E
- GitHub Check: Test Integration
- GitHub Check: Build Preview
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{py,ts,tsx,jsx,md}
📄 CodeRabbit inference engine (CLAUDE.md)
No region/currency/locale privileged; use metric units; British English per docs/reference/regional-defaults.md
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Only `src/synthorg/persistence/` may import sqlite/psycopg or emit raw SQL; new repository protocols inherit from generic categories in `persistence/_generics.py`; bespoke methods permitted only under ADR-0001 D7
Configuration Precedence: DB > env > code default via `SettingsService`/`ConfigResolver` (Cat-1) or env > code default (Cat-2, `read_only_post_init`); Cat-3 bootstrap secrets pure env; YAML is ingestion format only, not precedence tier; no `os.environ.get` outside startup
No hardcoded numeric values; numerics live in `settings/definitions/`; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (`NAME: int|float|Final`); enforced by `scripts/check_no_magic_numbers.py`
Comments document WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by `check_no_review_origin_in_code.py` + `check_no_migration_framing.py`
No `from __future__ import annotations` (Python 3.14 has PEP 649); use PEP 758 except: `except A, B:` no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors follow `<Domain><Condition>Error` pattern from `DomainError`; never inherit `Exception`/`RuntimeError` directly; enforced by `check_domain_error_hierarchy.py`
Pydantic v2 frozen + `extra="forbid"` on every frozen model project-wide; gate `check_frozen_model_extra_forbid.py`; `@computed_field` auto-exempt; per-line `# lint-allow: frozen-extra-forbid -- <reason>` for `extra="allow"`/`"ignore"` boundaries; use `@computed_field` for derived; use `NotBlankStr` for identifiers
Args models at every system boundary; `parse_typed()` for every external dict ingestion; enforced by `check_boundary_typed.py`
Immutability: use `model_copy(update=...)` or `copy.deepcopy()`; deepcopy at system boundaries
Async: use `asyncio.TaskGroup` for fan-out/fan-in; helpers catch `Exception` (re-raise `MemoryError`/`RecursionError`...
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Timeout/slow failures = source-code regression; never edit `tests/baselines/unit_timing.json` or any `scripts/*_baseline.{txt,json}` / `scripts/_*_baseline.py`; both families PreToolUse-blocked; per-invocation bypass requires explicit approval (`ALLOW_BASELINE_GROWTH=1 git commit`)
Test markers: `@pytest.mark.{unit,integration,e2e,slow}`; async auto; timeout 30s global; coverage 80% min
xdist `-n 8 --dist=loadfile` auto-applied via pyproject `addopts`; Windows unit tests use `WindowsSelectorEventLoopPolicy`; subprocess tests override back
Test doubles: `FakeClock` for Clock seam, `mock_of[T](**overrides)` for typed-boundary substitutions, `SimpleNamespace` for attribute-bags; bare `MagicMock` at typed boundary blocked by `scripts/check_mock_spec.py`
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add `@example(...)`); never skip/xfail flaky tests; fix fundamentally
Files:
tests/evals/prompt/test_agent_system_prompt.py
tests/unit/observability/test_events.py
tests/unit/security/test_action_types.py
tests/unit/meta/mcp/test_all_handlers_wired.py
tests/unit/research/test_planning.py
tests/unit/api/fakes_backend.py
tests/unit/research/test_triage.py
tests/unit/core/test_enums.py
tests/unit/research/test_synthesis.py
tests/unit/research/test_tool.py
tests/unit/research/test_research_service.py
tests/evals_spine/test_research_eval.py
tests/unit/persistence/test_protocol.py
tests/unit/research/test_research_models.py
tests/unit/meta/mcp/test_research_handlers.py
tests/unit/research/_fakes.py
tests/conformance/persistence/test_research_run_repository.py
tests/unit/research/test_research_retrieval.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the `HYPOTHESIS_PROFILE` env var controls example counts via registered profiles, which `@given()` honors automatically.
src/synthorg/meta/mcp/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
MCP: Define `ToolHandler` + `args_model`; call `require_admin_guardrails()` on admin tools; route through service layers per mcp-handler-contract.md
Files:
src/synthorg/meta/mcp/domains/research.py
src/synthorg/meta/mcp/handlers/__init__.py
src/synthorg/meta/mcp/domains/__init__.py
src/synthorg/meta/mcp/domains/_research_args.py
src/synthorg/meta/mcp/handlers/research.py
web/src/**/*.{js,jsx,ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
`web/src/**/*.{js,jsx,ts,tsx,mts}`: Always use `createLogger` from `@/lib/logger`; never bare `console.warn` / `console.error` / `console.debug` in application code. Variable name must always be `log`. Only `logger.ts` itself may use bare console methods. Use `log.debug()` (DEV-only, stripped in production), `log.warn()`, `log.error()`.
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through `sanitizeArg`
Attacker-controlled fields inside structured objects must be wrapped in `sanitizeForLog()` before embedding in log calls
Error-code constants (MANDATORY): import `ErrorCode` and `ErrorCategory` from `@/api/types/errors` (re-exported from the generated `web/src/api/types/error-codes.gen.ts`). Discriminate on `ErrorCode.<NAME>`, never on raw integer literals.
Use `@eslint-react/web-api-no-leaked-fetch` to detect `fetch()` in effects without `AbortController` cleanup
Files:
web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts
web/src/api/types/**/*.gen.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Generated DTO types (MANDATORY): NEVER hand-edit `web/src/api/types/*.gen.ts`. Regenerate with `uv run python scripts/generate_dto_types_ts.py`. Import DTOs via the barrel (`import type { AgentConfig } from '@/api/types'`).
Files:
web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts
web/src/**/*.{ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
`web/src/**/*.{ts,tsx,mts}`: Use `@typescript-eslint/no-floating-promises` to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use `@typescript-eslint/no-misused-promises` (with `checksVoidReturn: { attributes: false }`) to forbid passing async functions where the callsite ignores the returned promise. React 19 `async` event handlers stay allowed via the `attributes: false` exemption.
Files:
web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts
web/src/api/types/error-codes.gen.ts
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/synthorg/persistence/**/*.py`: Repository CRUD: `save(entity)`, `get(id)`, `delete(id) -> bool`, `list_items(...)`, `query(...)` returning tuples
Datetime in persistence: use `parse_iso_utc` / `format_iso_utc` from `persistence._shared` (reject naive); use `normalize_utc` for already-typed
Files:
src/synthorg/persistence/protocol.py
src/synthorg/persistence/sqlite/backend.py
src/synthorg/persistence/sqlite/_backend_accessors.py
src/synthorg/persistence/research_protocol.py
src/synthorg/persistence/postgres/backend.py
src/synthorg/persistence/postgres/research_run_repo.py
src/synthorg/persistence/sqlite/research_run_repo.py
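The "reject naive" datetime rule in this guideline can be illustrated with a small standalone sketch. This is not the repo's `parse_iso_utc` / `format_iso_utc` from `persistence._shared`; the `_sketch` function names, the `ValueError` raised, and the exact output format are assumptions made for illustration only.

```python
from datetime import datetime, timezone


def parse_iso_utc_sketch(value: str) -> datetime:
    """Parse an ISO-8601 string, rejecting naive values (hypothetical helper)."""
    parsed = datetime.fromisoformat(value)
    # A datetime without a usable UTC offset is "naive" and is refused outright.
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError(f"naive datetime rejected: {value!r}")
    return parsed.astimezone(timezone.utc)


def format_iso_utc_sketch(value: datetime) -> str:
    """Serialise an aware datetime as a UTC ISO-8601 string (hypothetical helper)."""
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("naive datetime rejected")
    return value.astimezone(timezone.utc).isoformat()
```

Rejecting naive values at the persistence boundary means the stored string always carries an explicit offset, so two backends cannot disagree about which local zone a timestamp was written in.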
scripts/check_*.{py,sh}
📄 CodeRabbit inference engine (CLAUDE.md)
Every convention PR ships its enforcement gate per docs/reference/convention-gates.md
Files:
scripts/check_provider_complete_chokepoint.py
scripts/check_no_ghost_wiring.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.md`: Numerics in README + public docs sourced from `data/runtime_stats.yaml` via `<!--RS:NAME-->` markers per data/README.md
Use `d2` for architecture / nested containers; `mermaid` for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200 (Dark Mauve); D2 CLI pinned to v0.7.1 in CI
Files:
docs/design/research-mode.md
tests/conformance/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Dual-backend conformance: `tests/conformance/persistence/` consumes `backend` fixture (SQLite + Postgres); enforced by `check_dual_backend_test_parity.py`
Files:
tests/conformance/persistence/test_research_run_repository.py
🧠 Learnings (9)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/research/synthesis/__init__.pytests/evals/prompt/test_agent_system_prompt.pysrc/synthorg/settings/definitions/__init__.pysrc/synthorg/research/triage/__init__.pytests/unit/observability/test_events.pysrc/synthorg/settings/definitions/research.pysrc/synthorg/meta/mcp/domains/research.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/research/retrieval/__init__.pysrc/synthorg/engine/prompt_safety.pysrc/synthorg/observability/events/research.pysrc/synthorg/research/retrieval/sources/__init__.pysrc/synthorg/research/__init__.pysrc/synthorg/observability/events/persistence.pytests/unit/security/test_action_types.pysrc/synthorg/persistence/protocol.pysrc/synthorg/meta/mcp/handlers/__init__.pysrc/synthorg/research/retrieval/sources/academic.pysrc/synthorg/persistence/sqlite/backend.pysrc/synthorg/settings/enums.pytests/unit/meta/mcp/test_all_handlers_wired.pysrc/synthorg/persistence/sqlite/_backend_accessors.pysrc/synthorg/meta/mcp/domains/__init__.pyscripts/check_provider_complete_chokepoint.pysrc/synthorg/research/planning/__init__.pytests/unit/research/test_planning.pysrc/synthorg/research/constants.pysrc/synthorg/persistence/research_protocol.pysrc/synthorg/research/retrieval/protocol.pytests/unit/api/fakes_backend.pysrc/synthorg/security/action_types.pytests/unit/research/test_triage.pysrc/synthorg/observability/prometheus_labels.pysrc/synthorg/research/planning/protocol.pysrc/synthorg/research/retrieval/replay.pytests/unit/core/test_enums.pysrc/synthorg/research/retrieval/sources/_shared.pysrc/synthorg/persistence/postgres/backend.pytests/unit/research/test_synthesis.pysrc/synthorg/core/error_taxonomy.pysrc/synthorg/research/retrieval/sources/web.pysrc/synthorg/research/triage/heuristic.pysrc/synthorg/research/synthesis/protocol.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/research/triage/protocol.pytests/unit/research/test_tool.pysrc/synthorg/research/retrieval/sources/code.pysrc/synthorg/research/synthesis/citation_binder.pysrc/
synthorg/meta/mcp/domains/_research_args.pysrc/synthorg/research/_args.pysrc/synthorg/research/config.pysrc/synthorg/research/errors.pysrc/synthorg/research/planning/llm_planner.pytests/unit/research/test_research_service.pysrc/synthorg/meta/mcp/handlers/research.pytests/evals_spine/test_research_eval.pysrc/synthorg/research/tool_factory.pysrc/synthorg/core/enums.pysrc/synthorg/research/retrieval/sources/knowledge.pyevals/scoring/research.pyevals/models/brief.pysrc/synthorg/research/retrieval/providers.pysrc/synthorg/research/triage/hybrid.pysrc/synthorg/persistence/postgres/research_run_repo.pytests/unit/persistence/test_protocol.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/research/factory.pytests/unit/research/test_research_models.pysrc/synthorg/research/models.pysrc/synthorg/research/tool.pyscripts/check_no_ghost_wiring.pysrc/synthorg/research/synthesis/llm_synthesizer.pytests/unit/meta/mcp/test_research_handlers.pysrc/synthorg/research/triage/llm.pysrc/synthorg/api/state.pysrc/synthorg/persistence/sqlite/research_run_repo.pysrc/synthorg/research/_llm.pytests/unit/research/_fakes.pytests/conformance/persistence/test_research_run_repository.pysrc/synthorg/research/retrieval/dedup.pysrc/synthorg/research/service.pytests/unit/research/test_research_retrieval.pysrc/synthorg/api/app.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).
Applied to files:
src/synthorg/research/synthesis/__init__.pytests/evals/prompt/test_agent_system_prompt.pysrc/synthorg/settings/definitions/__init__.pysrc/synthorg/research/triage/__init__.pytests/unit/observability/test_events.pysrc/synthorg/settings/definitions/research.pysrc/synthorg/meta/mcp/domains/research.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/research/retrieval/__init__.pysrc/synthorg/engine/prompt_safety.pysrc/synthorg/observability/events/research.pysrc/synthorg/research/retrieval/sources/__init__.pysrc/synthorg/research/__init__.pysrc/synthorg/observability/events/persistence.pytests/unit/security/test_action_types.pysrc/synthorg/persistence/protocol.pysrc/synthorg/meta/mcp/handlers/__init__.pysrc/synthorg/research/retrieval/sources/academic.pysrc/synthorg/persistence/sqlite/backend.pysrc/synthorg/settings/enums.pytests/unit/meta/mcp/test_all_handlers_wired.pysrc/synthorg/persistence/sqlite/_backend_accessors.pysrc/synthorg/meta/mcp/domains/__init__.pyscripts/check_provider_complete_chokepoint.pysrc/synthorg/research/planning/__init__.pytests/unit/research/test_planning.pysrc/synthorg/research/constants.pysrc/synthorg/persistence/research_protocol.pysrc/synthorg/research/retrieval/protocol.pytests/unit/api/fakes_backend.pysrc/synthorg/security/action_types.pytests/unit/research/test_triage.pysrc/synthorg/observability/prometheus_labels.pysrc/synthorg/research/planning/protocol.pysrc/synthorg/research/retrieval/replay.pytests/unit/core/test_enums.pysrc/synthorg/research/retrieval/sources/_shared.pysrc/synthorg/persistence/postgres/backend.pytests/unit/research/test_synthesis.pysrc/synthorg/core/error_taxonomy.pysrc/synthorg/research/retrieval/sources/web.pysrc/synthorg/research/triage/heuristic.pysrc/synthorg/research/synthesis/protocol.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/research/triage/protocol.pytests/unit/research/test_tool.pysrc/synthorg/research/retrieval/sources/code.pysrc/synthorg/research/synthesis/citation_binder.pysrc/
synthorg/meta/mcp/domains/_research_args.pysrc/synthorg/research/_args.pysrc/synthorg/research/config.pysrc/synthorg/research/errors.pysrc/synthorg/research/planning/llm_planner.pytests/unit/research/test_research_service.pysrc/synthorg/meta/mcp/handlers/research.pytests/evals_spine/test_research_eval.pysrc/synthorg/research/tool_factory.pysrc/synthorg/core/enums.pysrc/synthorg/research/retrieval/sources/knowledge.pyevals/scoring/research.pyevals/models/brief.pysrc/synthorg/research/retrieval/providers.pysrc/synthorg/research/triage/hybrid.pysrc/synthorg/persistence/postgres/research_run_repo.pytests/unit/persistence/test_protocol.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/research/factory.pytests/unit/research/test_research_models.pysrc/synthorg/research/models.pysrc/synthorg/research/tool.pyscripts/check_no_ghost_wiring.pysrc/synthorg/research/synthesis/llm_synthesizer.pytests/unit/meta/mcp/test_research_handlers.pysrc/synthorg/research/triage/llm.pysrc/synthorg/api/state.pysrc/synthorg/persistence/sqlite/research_run_repo.pysrc/synthorg/research/_llm.pytests/unit/research/_fakes.pytests/conformance/persistence/test_research_run_repository.pysrc/synthorg/research/retrieval/dedup.pysrc/synthorg/research/service.pytests/unit/research/test_research_retrieval.pysrc/synthorg/api/app.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.
Applied to files:
src/synthorg/research/synthesis/__init__.pysrc/synthorg/settings/definitions/__init__.pysrc/synthorg/research/triage/__init__.pysrc/synthorg/settings/definitions/research.pysrc/synthorg/meta/mcp/domains/research.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/research/retrieval/__init__.pysrc/synthorg/engine/prompt_safety.pysrc/synthorg/observability/events/research.pysrc/synthorg/research/retrieval/sources/__init__.pysrc/synthorg/research/__init__.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/persistence/protocol.pysrc/synthorg/meta/mcp/handlers/__init__.pysrc/synthorg/research/retrieval/sources/academic.pysrc/synthorg/persistence/sqlite/backend.pysrc/synthorg/settings/enums.pysrc/synthorg/persistence/sqlite/_backend_accessors.pysrc/synthorg/meta/mcp/domains/__init__.pysrc/synthorg/research/planning/__init__.pysrc/synthorg/research/constants.pysrc/synthorg/persistence/research_protocol.pysrc/synthorg/research/retrieval/protocol.pysrc/synthorg/security/action_types.pysrc/synthorg/observability/prometheus_labels.pysrc/synthorg/research/planning/protocol.pysrc/synthorg/research/retrieval/replay.pysrc/synthorg/research/retrieval/sources/_shared.pysrc/synthorg/persistence/postgres/backend.pysrc/synthorg/core/error_taxonomy.pysrc/synthorg/research/retrieval/sources/web.pysrc/synthorg/research/triage/heuristic.pysrc/synthorg/research/synthesis/protocol.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/research/triage/protocol.pysrc/synthorg/research/retrieval/sources/code.pysrc/synthorg/research/synthesis/citation_binder.pysrc/synthorg/meta/mcp/domains/_research_args.pysrc/synthorg/research/_args.pysrc/synthorg/research/config.pysrc/synthorg/research/errors.pysrc/synthorg/research/planning/llm_planner.pysrc/synthorg/meta/mcp/handlers/research.pysrc/synthorg/research/tool_factory.pysrc/synthorg/core/enums.pysrc/synthorg/research/retrieval/sources/knowledge.pysrc/synthorg/research/retrieval/providers.pysrc/synthorg/research/triage/hybri
d.pysrc/synthorg/persistence/postgres/research_run_repo.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/research/factory.pysrc/synthorg/research/models.pysrc/synthorg/research/tool.pysrc/synthorg/research/synthesis/llm_synthesizer.pysrc/synthorg/research/triage/llm.pysrc/synthorg/api/state.pysrc/synthorg/persistence/sqlite/research_run_repo.pysrc/synthorg/research/_llm.pysrc/synthorg/research/retrieval/dedup.pysrc/synthorg/research/service.pysrc/synthorg/api/app.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.
Applied to files:
src/synthorg/settings/definitions/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/settings/enums.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.
Applied to files:
src/synthorg/settings/definitions/__init__.py
src/synthorg/settings/definitions/research.py
src/synthorg/settings/enums.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
docs/design/research-mode.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
docs/design/research-mode.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
docs/design/research-mode.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
docs/design/research-mode.md
Reviewer fixes (CodeRabbit + Gemini):
- research/service.py: enforce max_cost/max_wall_clock_seconds budgets
(asyncio.timeout plus per-stage cost checks) via new
ResearchBudgetExceededError; surface MemoryError/RecursionError out of
TaskGroup BaseExceptionGroup
- _llm.py: balanced JSON-object extraction via JSONDecoder.raw_decode scan
- models.py: FAILED runs now require completed_at as well as error
- retrieval/dedup.py: canonicalise only host case, preserve path case
- retrieval/sources/{academic,code}.py: skip blank-URI rows, not whole call
- synthesis/llm_synthesizer.py: wrap model-produced research_angle untrusted
- tool.py: repr(min_credibility) in run_id key to avoid lossy collapse
- triage/llm.py: validate batch_size at least one
- settings/definitions/research.py: mark master 'enabled' restart_required
- mcp/domains/research.py: derive _STATUSES from ResearchRunStatus enum
- mcp/handlers/research.py: created_at via app_state Clock seam
- api/app.py: wire research from live AppState settings, not captured arg
- evals/scoring/research.py: Unicode-aware tokeniser
- tests: real ReplayRetrievalSource assertions; protocol-shaped research_runs fake
CI fixes:
- web settings Records and NAMESPACE_ORDER gain 'research' (TS build/lighthouse)
- tool-count pin 216 becomes 219 (research adds run/get/list)
- starlette 1.0.0 to 1.0.1 (PYSEC-2026-161)
- regenerated error-codes/openapi types for RESEARCH_BUDGET_EXCEEDED
Skipped: CR positional_relevance one-based fix (callers pass zero-based
enumerate; current code already scores first result 1.0).
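To make the reasoning behind this skip concrete: with a zero-based index from `enumerate`, a reciprocal-rank style score already gives the first result 1.0, so shifting to a one-based index inside the function would be wrong. The formula below is an assumption chosen for illustration; the actual `positional_relevance` implementation is not shown in this PR text.

```python
def positional_relevance(index: int) -> float:
    """Hypothetical rank-decay score for a ZERO-based result position."""
    if index < 0:
        raise ValueError("index must be zero-based and non-negative")
    # index 0 -> 1.0, index 1 -> 0.5, index 2 -> 0.333...
    return 1.0 / (1 + index)


# Callers pass the zero-based position straight from enumerate().
results = ["first hit", "second hit", "third hit"]
scores = [positional_relevance(i) for i, _ in enumerate(results)]
```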
Pre-push mypy caught dict-invariance: build_replay_sources returns ReplayRetrievalSource values which don't fit _build_service's dict[ResearchSourceType, RetrievalSource] parameter. Add the wider annotation explicitly, matching what test_research_eval.py already does.
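The dict-invariance problem can be reproduced in a few lines. The classes below are simplified stand-ins, not the real `RetrievalSource` / `ReplayRetrievalSource`; the `retrieve` method, the `str` key type (instead of `ResearchSourceType`), and `build_service` are assumptions for the sketch.

```python
from typing import Protocol


class RetrievalSource(Protocol):
    def retrieve(self, query: str) -> list[str]: ...


class ReplayRetrievalSource:
    """Stand-in that satisfies the protocol by serving recorded items."""

    def __init__(self, recorded: list[str]) -> None:
        self._recorded = recorded

    def retrieve(self, query: str) -> list[str]:
        return list(self._recorded)


def build_service(sources: dict[str, RetrievalSource]) -> list[str]:
    return sorted(sources)


# Inferred as dict[str, ReplayRetrievalSource]. Because dict is invariant in
# its value type, a type checker rejects build_service(narrow) even though
# every value satisfies RetrievalSource.
narrow = {"web": ReplayRetrievalSource(["recorded item"])}

# The explicit wider annotation makes the same literal acceptable.
wide: dict[str, RetrievalSource] = {"web": ReplayRetrievalSource(["recorded item"])}
names = build_service(wide)
```

The invariance is there for a reason: if the narrow dict were accepted, the callee could legally insert a non-replay source into it behind the caller's back.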
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2041      +/-   ##
==========================================
+ Coverage   84.98%   85.03%   +0.04%
==========================================
  Files        2157     2193      +36
  Lines      126065   127351    +1286
  Branches    10530    10579      +49
==========================================
+ Hits       107142   108298    +1156
- Misses      16281    16391     +110
- Partials     2642     2662      +20
CodeRabbit's CHANGES_REQUESTED was for commit f491a9c; all 18 actionable findings have been addressed in 158c839 (one factually-wrong positional_relevance skip logged with disproof). CodeRabbit's rolling summary on the new head 3ffd6c8 confirms 'No actionable comments were generated in the recent review.' Dismissing the stale review.
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._

### What you'll notice

- New brownfield codebase intake mode supports merger and acquisition scenarios.
- Added deep CEO interview feature to improve project charter creation.
- Introduced mission control and flight recorder operator cockpit for better operational oversight.
- Research mode added for enhanced exploratory work.
- Runtime services now log safety-spine state at boot for clearer diagnostics.

### What's new

- Research mode feature enables deeper data exploration.
- CEO interview integration helps shape project charters.
- Mission control and flight recorder cockpit introduced for operational tracking.

### Under the hood

- Improved codebase modularity with module-size gates and lint tightening.
- Added `__init__.py` to 21 test directories for better test discovery.
- Promoted six transitive dependencies to direct dependencies for clarity.
- Split codespell ignore list into vocabulary and source renames.
- Decomposed oversized web utilities, hooks, and libraries for maintainability.
- Enhanced CI with Lychee link checker integration and retry logic for cosign signing.
- Sharded unit and integration tests and added Postgres service container in CI.
- Updated infrastructure and web dependencies; maintained lock files.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*

---

## [0.8.8](v0.8.7...v0.8.8) (2026-05-24)

### Features

* brownfield codebase intake (merger/acquisition entry mode) ([#2042](#2042)) ([e287621](e287621)), closes [#1975](#1975)
* deep CEO interview to project charter ([#2045](#2045)) ([904f2fb](904f2fb))
* mission control + flight recorder operator cockpit ([#2044](#2044)) ([1c2660b](1c2660b))
* research mode ([#2041](#2041)) ([f81a5ac](f81a5ac)), closes [#1989](#1989)
* surface safety-spine state in runtime-services boot log (closes [#2096](#2096)) ([#2097](#2097)) ([f187b31](f187b31))

### Refactoring

* add `__init__.py` to 21 leaf test directories (INP001) ([#2081](#2081)) ([2592118](2592118)), closes [#2064](#2064)
* codebase modularity (1/4) - module-size gates + lint tightening + tools ([#2078](#2078)) ([556fbd9](556fbd9)), closes [#2047](#2047) [#2040](#2040)
* promote 6 transitive deps to direct deps ([#2083](#2083)) ([adedc6a](adedc6a))
* split codespell ignore-words-list into vocab + source renames ([#2085](#2085)) ([917d98a](917d98a)), closes [#2074](#2074)
* **web:** PR A foundation, decompose oversized utils/hooks/lib ([#2092](#2092)) ([#2098](#2098)) ([aedbba5](aedbba5))

### CI/CD

* exclude slsa.dev from lychee (transient timeout on canonical badge) ([#2090](#2090)) ([346c51d](346c51d))
* fix paths-filter shallow-clone race and scorecard allowlist ([#2089](#2089)) ([7cd7ce8](7cd7ce8))
* refresh .test_durations.{unit,integration} ([#2087](#2087)) ([ddf2d86](ddf2d86))
* retry cosign sign on transient GHCR/Rekor failures ([#2100](#2100)) ([da9422a](da9422a))
* shard test-unit + test-integration, sysmon coverage, Postgres service container ([#2080](#2080)) ([0768787](0768787))
* wire Lychee link-checker (workflow + installer + pre-push hook) ([#2084](#2084)) ([1c0694a](1c0694a))

### Maintenance

* Lock file maintenance ([#2086](#2086)) ([a78810a](a78810a))
* Update Infrastructure dependencies ([#2055](#2055)) ([041ad8b](041ad8b))
* Update Web dependencies ([#2054](#2054)) ([4d57b9a](4d57b9a))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Implements research mode (#1989): a real research subsystem for synthetic organisations. A research brief drives query planning -> multi-source retrieval (internal knowledge substrate + vendor-agnostic web / academic / code search) -> source-credibility triage -> deduplication -> citation-backed synthesis. Every run is recorded and deterministically replayable, and every claim in the deliverable resolves to a retrievable source.
New package `src/synthorg/research/` mirrors the knowledge-substrate conventions. Built on the CLOSED dependencies: the knowledge+provenance substrate (#2036) and governed external access (#1991).
What's included
- `QueryPlanner` (`LlmQueryPlanner`) decomposes the brief into source-targeted sub-queries.
- `RetrievalSource` x4 fanned out via `asyncio.TaskGroup`: knowledge (wraps `KnowledgeService`) + web / academic / code (vendor-agnostic provider protocols, no bundled impl, injected at runtime). Per-source failure is isolated.
- `CredibilityTriage` (`HybridCredibilityTriage` default: deterministic heuristic prefilter then LLM triage on survivors; pure heuristic / pure LLM also shipped).
- `Deduplicator` (`LexicalDeduplicator` default: content-hash + canonical-URL + token-shingle Jaccard; `EmbeddingDeduplicator` alternative).
- `Synthesizer` (`LlmSynthesizer` + `CitationBinder`): every claim cites sources by ref-id; an unsourced claim raises `ResearchSynthesisError`.
- `ReplayRetrievalSource` serves recorded items, so a run is byte-identical on replay.
- `research_runs` table (JSON blob + denormalised filter columns), `ResearchRunRepository` over the generic categories, SQLite + Postgres impls, yoyo revisions per backend, dual-backend conformance test.
- `ResearchTool` agent tool + MCP domain `research:run` / `research:get` / `research:list`; boot wiring via `_wire_research_engine` behind `research.enabled` + a configured provider/model; ghost-wiring manifest entries (ENFORCED).
- `kind="research"` brief + `ResearchBriefSpec`, deterministic `grade_research_run` grader (claim coverage, citation resolution, source credibility), and a spine test that records -> replays -> grades. Plus `docs/design/research-mode.md`.
- `wrap_untrusted()` where they enter the planning / triage / synthesis prompts.
Test plan
`tests/unit/research/**` (models, planning, retrieval, triage, synthesis, service, tool), `tests/unit/meta/mcp/test_research_handlers.py`, `tests/conformance/persistence/test_research_run_repository.py` (dual-backend), `tests/evals_spine/test_research_eval.py` (record + byte-identical replay + grade).
Review coverage
Pre-reviewed by 7 core agents (code-reviewer, python-reviewer, conventions-enforcer, security-reviewer, persistence-reviewer, test-quality-reviewer, issue-resolution-verifier). 5 findings addressed (SEC-1 prompt wrapping + test-assertion strengthening + a docstring); 7 rejected with reasons recorded in `_audit/pre-pr-review/triage.md`. issue-resolution-verifier returned PASS on all acceptance criteria.
Closes #1989
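For readers unfamiliar with the lexical deduplication signals named in this description (canonical-URL and token-shingle Jaccard), here is a rough standalone sketch. It is not the `LexicalDeduplicator` implementation; the function names, the shingle size of 3, and whitespace tokenisation are assumptions. The URL helper follows the fix noted in the commit message: only the host is case-folded, the path case is preserved.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lower-case the host only; path, query and fragment keep their case."""
    parts = urlsplit(url)
    # Caveat of this sketch: netloc.lower() also folds any userinfo in the URL.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping token n-grams (hypothetical whitespace tokeniser)."""
    tokens = text.lower().split()
    if len(tokens) < size:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i : i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a: set[tuple[str, ...]], b: set[tuple[str, ...]]) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Two items would then be treated as near-duplicates when their canonical URLs match or their shingle similarity exceeds a chosen threshold; that threshold is a tuning decision not stated in this PR text.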