diff --git a/docs/design/coordination.md b/docs/design/coordination.md index 81c84d1482..3a5308431c 100644 --- a/docs/design/coordination.md +++ b/docs/design/coordination.md @@ -394,6 +394,40 @@ worktrees directly; signals the `WorkspaceManager` to act. **Module**: `src/synthorg/engine/workspace/disk_quota.py` +### Persistent Per-Project Workspace and Push Queue Serialisation + +Each project gets a 1:1 persistent git-backed working tree on the +runtime volume. `ProjectWorkspaceService.get_or_provision(project_id)` +materialises the working tree under +`/projects//` on first touch via the configured +`GitBackend` (`embedded` default; `local_path` / `external_remote` are +opt-in via config). The tree survives across agents, tasks, and +sessions. `GitBackendConfig.kind` is authoritative: a persisted row +whose kind differs from the live config triggers a re-provision under +the new backend and a `WORKSPACE_BACKEND_KIND_CHANGED` log event. + +When N agents finish concurrently on one project, their +merge-to-default-branch + push-to-backend operations route through a +per-project FIFO serial queue (`PushQueueCoordinator`) so concurrent +pushes never collide at the git backend. The queue sits in front of +the `WorkspaceIsolationStrategy` seam, so a future virtual-branch +strategy supplies its own `merge_workspace` without changing the +queue. A conflicted merge resolves the caller future without pushing +(the queue refuses to push a broken default branch). `stop()` drains +in flight then exits cleanly; `WorkspacePushError` distinguishes a +forge-rejection push failure from a local `WorkspaceMergeError`. + +Events emitted: `PROJECT_WORKSPACE_PROVISIONED`, +`PROJECT_WORKSPACE_REUSED`, `WORKSPACE_BACKEND_KIND_CHANGED`, +`WORKSPACE_PUSH_QUEUE_ENQUEUED`, `WORKSPACE_PUSH_QUEUE_MERGED`, +`WORKSPACE_PUSH_QUEUE_FAILED`, `WORKSPACE_PUSH_QUEUE_WORKER_FAILED`. + +**Modules**: +- `src/synthorg/engine/workspace/project_workspace_service.py` +- `src/synthorg/engine/workspace/git_backend/` (protocol + 3 strategies + factory) +- `src/synthorg/engine/workspace/push_queue.py` +- `src/synthorg/engine/workspace/service.py` (per-project queue cache + `merge_workspace_with_push`) + ## Task Decomposability & Coordination Topology Empirical research on agent scaling diff --git a/docs/design/integrations.md b/docs/design/integrations.md index 56f664b364..26b0270558 100644 --- a/docs/design/integrations.md +++ b/docs/design/integrations.md @@ -31,6 +31,9 @@ and optional rate limiting and health check configuration. | Type | Auth Fields | Health Check | |------|------------|--------------| | `github` | `token`, `api_url` | `GET /user` | +| `gitlab` | `token`, `api_url` | `GET /user` | +| `gitea` | `token`, `api_url` | `GET /api/v1/user` | +| `forgejo` | `token`, `api_url` | `GET /api/v1/user` | | `slack` | `token`, `signing_secret` | `POST auth.test` | | `smtp` | `host`, `port`, `username`, `password` | SMTP EHLO | | `database` | `dialect`, `host`, `port`, `username`, `password`, `database` | `SELECT 1` | diff --git a/docs/reference/errors.md b/docs/reference/errors.md index a7fb64c296..96a348ee3d 100644 --- a/docs/reference/errors.md +++ b/docs/reference/errors.md @@ -77,6 +77,7 @@ The NotFound hierarchy is driven by a single `NotFoundError` class with domain-s | 3014 | `AB_TEST_NOT_FOUND` | A/B test record for a proposal | | 3015 | `BACKUP_NOT_FOUND` | Backup archive | | 3016 | `MEMORY_ENTRY_NOT_FOUND` | Agent memory entry | +| 3017 | `CONVERSATION_NOT_FOUND` | Conversation record | All share the same `type` URI; the numeric code is the discriminator. @@ -98,6 +99,9 @@ All share the same `type` URI; the numeric code is the discriminator. | 4011 | `FINE_TUNE_RUN_ACTIVE` | A fine-tune run is already active (start/resume blocked) | | 4012 | `TRAINING_PLAN_NOT_MODIFIABLE` | Training plan cannot be modified after execution or failure | | 4013 | `BACKUP_UNRESTARTABLE` | Backup service stopped in an unrestartable state | +| 4014 | `AGENT_RUNTIME_NOT_CONFIGURED` | No LLM provider configured; agent runtime cannot execute | +| 4015 | `CONVERSATION_CLOSED` | Conversation is closed; no further messages or actions accepted | +| 4016 | `PROJECT_WORKSPACE_NOT_PROVISIONED` | Project workspace required but never provisioned by the git backend | ## Rate Limit (5xxx) @@ -131,6 +135,7 @@ All share the same `type` URI; the numeric code is the discriminator. | 7007 | `INTEGRATION_ERROR` | Non-LLM integration failure | | 7008 | `OAUTH_ERROR` | OAuth exchange failed | | 7009 | `WEBHOOK_ERROR` | Webhook receive/replay failure | +| 7010 | `CONVERSATIONAL_PROPOSE_RESPONSE_INVALID` | Chief-of-Staff proposer returned an invalid response | ## Internal (8xxx) diff --git a/docs/reference/pluggable-subsystems.md b/docs/reference/pluggable-subsystems.md index 9a9a221de7..0a765dcb2c 100644 --- a/docs/reference/pluggable-subsystems.md +++ b/docs/reference/pluggable-subsystems.md @@ -167,6 +167,15 @@ Domain errors live at `meta/errors.py::RollbackMutationDeniedError` (409) and `U - `backup/registry.py::PERSISTENCE_BACKUP_HANDLER_REGISTRY`: `StrategyRegistry` keyed on `config.persistence.backend` ("sqlite" / "postgres"). - `backup/factory.py::build_backup_handlers()`: dispatches per `BackupComponent` and uses the registry for the persistence handler. +### Git backend storage strategy + +- `engine/workspace/git_backend/protocol.py`: `GitBackend` `@runtime_checkable` Protocol, with `ProvisionResult`/`PushResult`/`FetchResult` frozen result models. +- `engine/workspace/git_backend/config.py`: `GitBackendConfig` (frozen) with `kind: GitBackendType` discriminator and `GitBackendDeps` (collaborators not safe in frozen config: `workspace_base_root`, `connection_catalog`, `secret_backend`, `clock`). +- `engine/workspace/git_backend/embedded.py::EmbeddedGitBackend` (safe default: bare repo self-hosted on the persistent volume, no external dependency). +- `engine/workspace/git_backend/local_path.py::LocalPathGitBackend` (bring-your-own on-disk git repository, push/fetch are no-ops because the on-disk repo is the durable store). +- `engine/workspace/git_backend/external_remote.py::ExternalRemoteGitBackend` (GitHub / GitLab / Gitea / Forgejo resolved via the connection catalog; ships protocol + thin clone/push/fetch glue; deep OAuth hardening is a tracked follow-up). +- `engine/workspace/git_backend/factory.py::build_git_backend()`: `StrategyRegistry[GitBackend]` keyed on `GitBackendType`. Missing required deps fail fast at construction with `GitBackendConfigError`. Wired at boot in `api/app.py::_install_runtime_services` under the `has_persistence` gate, alongside `ProjectWorkspaceService`. + ## Services are a distinct pattern (not pluggable subsystems) A **service** wraps one or more repositories to keep controllers thin and centralise audit logging, and MAY orchestrate multiple repositories (e.g. `WorkflowService` spans `workflow_definitions` + `workflow_versions`; `MemoryService` spans fine-tune checkpoints + runs + settings). diff --git a/scripts/_ghost_wiring_manifest.txt b/scripts/_ghost_wiring_manifest.txt index 3d953a0da3..be4876b0aa 100644 --- a/scripts/_ghost_wiring_manifest.txt +++ b/scripts/_ghost_wiring_manifest.txt @@ -45,3 +45,9 @@ ENFORCED build_work_pipeline #1960 -- called by workers.runtime_builder._build_r ENFORCED build_chief_of_staff_proposer #1968 -- called by api.app._wire_chief_of_staff_proposer behind propose_enabled + provider switch; constructs ChiefOfStaffProposer which parks approval-gated WorkItems for the conversational interface ENFORCED TaskBoardEntryAdapter #1963 -- constructed by engine.pipeline.entry.factory.build_work_entry_adapter on the TASK_BOARD arm; wired at boot by engine.pipeline.entry.boot.wire_real_task_board_entry; drives the spine for human-filed board tasks (POST /tasks) ENFORCED ObjectiveEntryAdapter #1964 -- built at boot by engine.pipeline.entry.boot.wire_real_objective_entry; fed by POST /objectives +ENFORCED ProjectWorkspaceService #1974 -- constructed in api/app.py _install_runtime_services; per-project persistent git-backed workspace provisioning +ENFORCED build_git_backend #1974 -- called in api/app.py _install_runtime_services; builds the configured GitBackend strategy (embedded default) +ENFORCED EmbeddedGitBackend #1974 -- default GitBackend strategy built by engine/workspace/git_backend/factory.py::_build_embedded +ENFORCED LocalPathGitBackend #1974 -- GitBackend strategy built by engine/workspace/git_backend/factory.py::_build_local_path for the LOCAL_PATH discriminator +ENFORCED ExternalRemoteGitBackend #1974 -- GitBackend strategy built by engine/workspace/git_backend/factory.py::_build_external_remote for the EXTERNAL_REMOTE discriminator +ENFORCED PushQueueCoordinator #1974 -- constructed per-project by engine/workspace/service.py::WorkspaceIsolationService._get_or_create_queue; serialises merge+push to the backend diff --git a/scripts/schema_drift_baseline.txt b/scripts/schema_drift_baseline.txt index 9f6a991c82..dfd2f1a986 100644 --- a/scripts/schema_drift_baseline.txt +++ b/scripts/schema_drift_baseline.txt @@ -109,6 +109,8 @@ column:preset_overrides:updated_at:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; c column:principle_overrides:created_at:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; column stores TEXT carrying ISO-8601 with explicit +00:00/Z suffix, normalised to UTC at write time column:principle_overrides:updated_at:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; column stores TEXT carrying ISO-8601 with explicit +00:00/Z suffix, normalised to UTC at write time column:project_cost_aggregates:last_updated:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; column stores TEXT carrying ISO-8601 with explicit +00:00/Z suffix, normalised to UTC at write time +column:project_workspaces:created_at:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; column stores TEXT carrying ISO-8601 with explicit +00:00/Z suffix, normalised to UTC at write time +column:project_workspaces:updated_at:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; column stores TEXT carrying ISO-8601 with explicit +00:00/Z suffix, normalised to UTC at write time column:projects:deadline:TEXT:TIMESTAMPTZ:SQLite has no TIMESTAMPTZ; column stores TEXT carrying ISO-8601 with explicit +00:00/Z suffix, normalised to UTC at write time column:projects:task_ids:TEXT:JSONB:SQLite has no JSONB type; column stores TEXT carrying json.dumps(...) (see schema header column inventory) column:projects:team:TEXT:JSONB:SQLite has no JSONB type; column stores TEXT carrying json.dumps(...) (see schema header column inventory) diff --git a/src/synthorg/api/app.py b/src/synthorg/api/app.py index 770a11d628..c5f3655df6 100644 --- a/src/synthorg/api/app.py +++ b/src/synthorg/api/app.py @@ -1052,6 +1052,47 @@ async def _install_runtime_services() -> None: if env_workspace_root is not None: app_state.set_agent_workspace_root(env_workspace_root) + # Per-project persistent workspace substrate. The git backend is + # config-selected (embedded default, no external dep); + # ProjectWorkspaceService provisions one persistent git-backed + # tree per project under the workspace base. Persistence-less + # boots (test fixtures, dev apps with no DB) skip wiring -- the + # service is optional and gates on ``has_project_workspace_service``. + if app_state.has_persistence and app_state.project_workspace_service is None: + # Guard against partial-startup retry: this hook fires once + # the persistence layer is connected, but ``build_runtime_services`` + # below is fallible and a re-entry after its failure would + # otherwise hit the ``_set_once`` guard inside + # ``set_project_workspace_service`` and fail with + # "already configured" instead of cleanly retrying the + # runtime-services build. + from synthorg.engine.workspace.git_backend import ( # noqa: PLC0415 + GitBackendConfig, + GitBackendDeps, + build_git_backend, + ) + from synthorg.engine.workspace.project_workspace_service import ( # noqa: PLC0415 + ProjectWorkspaceService, + ) + + git_backend_config = GitBackendConfig() + git_backend = build_git_backend( + git_backend_config, + GitBackendDeps( + workspace_base_root=app_state.agent_workspace_root, + clock=app_state.clock, + ), + ) + app_state.set_project_workspace_service( + ProjectWorkspaceService( + base_root=app_state.agent_workspace_root, + repo=app_state.persistence.project_workspaces, + git_backend=git_backend, + config=git_backend_config, + clock=app_state.clock, + ), + ) + try: services = await build_runtime_services( app_state, diff --git a/src/synthorg/api/state.py b/src/synthorg/api/state.py index b784e0dc5d..f387ec48cf 100644 --- a/src/synthorg/api/state.py +++ b/src/synthorg/api/state.py @@ -148,6 +148,9 @@ WorkflowRollbackService, ) from synthorg.engine.workflow.webhook_bridge import WebhookEventBridge + from synthorg.engine.workspace.project_workspace_service import ( + ProjectWorkspaceService, + ) from synthorg.integrations.connections.catalog import ConnectionCatalog from synthorg.integrations.health.prober import HealthProberService from synthorg.integrations.mcp_catalog.installations import ( @@ -270,6 +273,7 @@ class AppState(AppStateServicesMixin): "_personality_service", "_preset_override_service", "_project_facade_service", + "_project_workspace_service", "_prometheus_collector", "_provider_audit_service", "_provider_health_tracker", @@ -463,6 +467,10 @@ def __init__( # noqa: PLR0913, PLR0915 # process-stable temp directory so dev / empty-company runs # still have a valid absolute workspace. self._agent_workspace_root: Path | None = None + # Per-project persistent workspace provisioning service. Wired + # at boot behind the provider switch; ``None`` for empty-company + # / dev apps with no runtime services installed. + self._project_workspace_service: ProjectWorkspaceService | None = None # Guards the double-checked locking on first-access lazy wiring # of worker_execution_service / experiment_service. Both # properties may be invoked from concurrent request handlers @@ -978,6 +986,26 @@ def set_agent_workspace_root(self, path: Path) -> None: "Agent workspace root", ) + @property + def project_workspace_service(self) -> ProjectWorkspaceService | None: + """Per-project persistent workspace provisioner, or ``None``. + + Wired at boot behind the provider-present switch; ``None`` for + empty-company / dev apps where no runtime services are installed. + """ + return self._project_workspace_service + + def set_project_workspace_service( + self, + service: ProjectWorkspaceService, + ) -> None: + """Attach the project workspace service (once-only, startup).""" + self._set_once( + "_project_workspace_service", + service, + "Project workspace service", + ) + @property def experiment_service(self) -> ExperimentService: """Return the A/B experiment service, auto-wiring the default. diff --git a/src/synthorg/core/enums.py b/src/synthorg/core/enums.py index 070e780a52..d5075435b0 100644 --- a/src/synthorg/core/enums.py +++ b/src/synthorg/core/enums.py @@ -398,6 +398,21 @@ class ProjectStatus(StrEnum): CANCELLED = "cancelled" +class GitBackendType(StrEnum): + """Discriminator selecting how a project's git repository is stored. + + ``EMBEDDED`` is the safe default: the product self-hosts a bare repo + on the persistent volume, with no external dependency. ``LOCAL_PATH`` + targets a caller-supplied repository on disk. ``EXTERNAL_REMOTE`` + delegates to an external forge remote resolved via the connection + catalogue. + """ + + EMBEDDED = "embedded" + EXTERNAL_REMOTE = "external_remote" + LOCAL_PATH = "local_path" + + class ToolAccessLevel(StrEnum): """Access level for tool permissions. diff --git a/src/synthorg/core/error_taxonomy.py b/src/synthorg/core/error_taxonomy.py index f7223ca9fc..08f8f4db11 100644 --- a/src/synthorg/core/error_taxonomy.py +++ b/src/synthorg/core/error_taxonomy.py @@ -109,6 +109,7 @@ class ErrorCode(IntEnum): BACKUP_UNRESTARTABLE = 4013 AGENT_RUNTIME_NOT_CONFIGURED = 4014 CONVERSATION_CLOSED = 4015 + PROJECT_WORKSPACE_NOT_PROVISIONED = 4016 # 5xxx -- rate_limit RATE_LIMITED = 5000 diff --git a/src/synthorg/core/project_workspace.py b/src/synthorg/core/project_workspace.py new file mode 100644 index 0000000000..dadd4640e4 --- /dev/null +++ b/src/synthorg/core/project_workspace.py @@ -0,0 +1,59 @@ +"""Persistent per-project workspace domain model. + +A :class:`ProjectWorkspace` is the 1:1 mapping between a +:class:`~synthorg.core.project.Project` and the persistent, git-backed +working tree that survives across agents, tasks and sessions. The row +records where the workspace lives on the persistent volume and which git +backend provisioned it, so a session restart re-locates the same +directory without re-deriving environment precedence and a configured +backend switch can be detected against the persisted kind. +""" + +from typing import Final + +from pydantic import AwareDatetime, BaseModel, ConfigDict, Field + +from synthorg.core.enums import GitBackendType # noqa: TC001 +from synthorg.core.types import NotBlankStr + +_DEFAULT_BRANCH: Final[str] = "main" + + +class ProjectWorkspace(BaseModel): + """Persistent git-backed workspace bound to a single project. + + Attributes: + project_id: Owning project identifier (primary key, 1:1 with + ``Project.id``). + workspace_path: Absolute on-volume path of the project working + tree (``/projects/``). Persisted so a + restart re-locates the same directory deterministically. + git_backend_kind: Which backend provisioned the repository; lets + a config switch detect a mismatch against the live config. + remote_ref: External remote URL or connection-catalog name for + the ``EXTERNAL_REMOTE`` backend; ``None`` for embedded/local. + default_branch: Default branch the backend provisions and the + merge/push queue targets. + created_at: Provisioning timestamp (timezone-aware, UTC). + updated_at: Last mutation timestamp (timezone-aware, UTC). + """ + + model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") + + project_id: NotBlankStr = Field(description="Owning project identifier (PK)") + workspace_path: NotBlankStr = Field( + description="Absolute on-volume path of the project working tree", + ) + git_backend_kind: GitBackendType = Field( + description="Backend that provisioned this workspace", + ) + remote_ref: NotBlankStr | None = Field( + default=None, + description="External remote URL / connection name (external backend)", + ) + default_branch: NotBlankStr = Field( + default=NotBlankStr(_DEFAULT_BRANCH), + description="Default branch the backend provisions", + ) + created_at: AwareDatetime = Field(description="Provisioning timestamp (UTC)") + updated_at: AwareDatetime = Field(description="Last mutation timestamp (UTC)") diff --git a/src/synthorg/engine/coordination/factory.py b/src/synthorg/engine/coordination/factory.py index 6b29df1ced..308b610e97 100644 --- a/src/synthorg/engine/coordination/factory.py +++ b/src/synthorg/engine/coordination/factory.py @@ -41,6 +41,7 @@ from synthorg.engine.shutdown import ShutdownManager from synthorg.engine.task_engine import TaskEngine from synthorg.engine.workspace.config import WorkspaceIsolationConfig + from synthorg.engine.workspace.git_backend import GitBackend from synthorg.engine.workspace.protocol import WorkspaceIsolationStrategy from synthorg.engine.workspace.service import WorkspaceIsolationService from synthorg.hr.performance.tracker import PerformanceTracker @@ -118,12 +119,18 @@ def _build_decomposition_strategy( def _build_workspace_service( workspace_strategy: WorkspaceIsolationStrategy | None, workspace_config: WorkspaceIsolationConfig | None, + git_backend: GitBackend | None = None, ) -> WorkspaceIsolationService | None: """Build workspace isolation service if both deps are provided. Raises: ValueError: If exactly one of *workspace_strategy* / - *workspace_config* is supplied -- both or neither must be given. + *workspace_config* is supplied -- both or neither must be + given. Also raised when *git_backend* is supplied without + *workspace_strategy* AND *workspace_config*: routing pushes + through the coordinator-owned push queue is only possible + when a workspace service exists, so accepting the backend + silently disables it (the queueing feature ships dead). """ if workspace_strategy is not None and workspace_config is not None: from synthorg.engine.workspace.service import ( # noqa: PLC0415 @@ -133,6 +140,7 @@ def _build_workspace_service( return WorkspaceIsolationService( strategy=workspace_strategy, config=workspace_config, + git_backend=git_backend, ) if (workspace_strategy is None) != (workspace_config is None): given = ( @@ -156,6 +164,22 @@ def _build_workspace_service( missing=missing, ) raise ValueError(msg) + if git_backend is not None: + # Neither workspace dep is set, but a git backend was supplied: + # the push-queue routing feature can only fire through a + # workspace service, so silently accepting the backend would + # ship the queueing path disabled while pretending it works. + msg = ( + "git_backend was supplied without workspace_strategy and " + "workspace_config; routing pushes through the coordinator " + "requires a workspace service. Provide both workspace deps " + "or omit git_backend." + ) + logger.warning( + COORDINATION_FACTORY_BUILT, + note="git_backend without workspace deps", + ) + raise ValueError(msg) return None @@ -169,6 +193,7 @@ def build_coordinator( # noqa: PLR0913 task_engine: TaskEngine | None = None, workspace_strategy: WorkspaceIsolationStrategy | None = None, workspace_config: WorkspaceIsolationConfig | None = None, + git_backend: GitBackend | None = None, shutdown_manager: ShutdownManager | None = None, performance_tracker: PerformanceTracker | None = None, routing_scorer_config: RoutingScorerConfig | None = None, @@ -203,6 +228,10 @@ def build_coordinator( # noqa: PLR0913 task_engine: Optional task engine for parent status updates. workspace_strategy: Optional workspace isolation strategy. workspace_config: Optional workspace isolation config. + git_backend: Optional pluggable git backend; when provided, the + workspace service routes per-project merge+push through the + serial :class:`PushQueueCoordinator` for forge-collision + safety. ``None`` keeps the legacy in-process merge path. shutdown_manager: Optional shutdown manager for the executor. performance_tracker: Optional tracker for recording per-agent coordination contributions. @@ -255,7 +284,7 @@ def _topology_provider() -> CoordinationTopology: routing_service=routing_service, parallel_executor=parallel_executor, workspace_service=_build_workspace_service( - workspace_strategy, workspace_config + workspace_strategy, workspace_config, git_backend ), task_engine=task_engine, performance_tracker=performance_tracker, diff --git a/src/synthorg/engine/errors.py b/src/synthorg/engine/errors.py index 899d2c5830..c6e76c4cad 100644 --- a/src/synthorg/engine/errors.py +++ b/src/synthorg/engine/errors.py @@ -166,6 +166,81 @@ class WorkspaceLimitError(WorkspaceError): """Raised when maximum concurrent workspaces reached.""" +class WorkspacePushError(WorkspaceError): + """Raised when the coordinator-owned push to the git backend fails. + + Distinct from :class:`WorkspaceMergeError` (local git merge state) + so callers can tell a forge/remote push rejection apart from a + local textual merge conflict. + """ + + retryable: ClassVar[bool] = True + default_message: ClassVar[str] = "Failed to push workspace to git backend" + + +class ProjectWorkspaceError(EngineError): + """Base exception for persistent project-workspace failures.""" + + +class ProjectWorkspaceNotProvisionedError(ProjectWorkspaceError): + """Raised when a project workspace is required but not yet provisioned. + + The message is deliberately generic to avoid leaking internal + identifiers. The ``project_id`` attribute is available for + structured logs but must NOT be included in user-facing responses. + + Attributes: + project_id: The project that has no provisioned workspace. + """ + + status_code: ClassVar[int] = 409 + error_code: ClassVar[ErrorCode] = ErrorCode.PROJECT_WORKSPACE_NOT_PROVISIONED + error_category: ClassVar[ErrorCategory] = ErrorCategory.CONFLICT + default_message: ClassVar[str] = "Project workspace not provisioned" + + def __init__(self, *, project_id: NotBlankStr) -> None: + super().__init__("Project workspace not provisioned") + self.project_id: NotBlankStr = project_id + + +class GitBackendError(EngineError): + """Base exception for pluggable git-backend failures.""" + + +class GitBackendConfigError(GitBackendError): + """Raised when git-backend configuration is invalid for the strategy. + + Fail-fast at factory construction (e.g. ``LOCAL_PATH`` selected but + no ``local_repo_path``, or ``EXTERNAL_REMOTE`` without its connection + catalog / secret-backend dependency). + """ + + status_code: ClassVar[int] = 422 + error_code: ClassVar[ErrorCode] = ErrorCode.REQUEST_VALIDATION_ERROR + error_category: ClassVar[ErrorCategory] = ErrorCategory.VALIDATION + default_message: ClassVar[str] = "Git backend configuration invalid" + + +class GitBackendProvisionError(GitBackendError): + """Raised when the git backend fails to provision a repository.""" + + default_message: ClassVar[str] = "Repository provisioning failed" + + +class GitBackendPushError(GitBackendError): + """Raised when the git backend fails to push a branch.""" + + retryable: ClassVar[bool] = True + default_message: ClassVar[str] = "Git push failed" + + +class GitBackendFetchError(GitBackendError): + """Raised when the git backend fails to fetch from the remote.""" + + retryable: ClassVar[bool] = True + default_message: ClassVar[str] = "Git fetch failed" + + class TaskEngineError(EngineError): """Base exception for all task engine errors.""" diff --git a/src/synthorg/engine/workspace/_git_subprocess.py b/src/synthorg/engine/workspace/_git_subprocess.py index e3d9d85a19..ecc40c53b2 100644 --- a/src/synthorg/engine/workspace/_git_subprocess.py +++ b/src/synthorg/engine/workspace/_git_subprocess.py @@ -9,18 +9,59 @@ """ import asyncio +import os # ``Path`` is imported at runtime (not under TYPE_CHECKING) because it is used # in a runtime-evaluated annotation on ``run_git_subprocess``; under PEP 649 # lazy annotations ``inspect.get_annotations`` resolves these in module globals, # so a TYPE_CHECKING-only import would raise ``NameError`` at introspection time. from pathlib import Path # noqa: TC003 +from urllib.parse import urlsplit, urlunsplit from synthorg.observability import get_logger logger = get_logger(__name__) +def _sanitised_env() -> dict[str, str]: + """Return ``os.environ`` minus git's discovery-override vars. + + When this code runs from inside a git pre-push hook (or any caller + whose own cwd is a git working tree), git inherits ``GIT_DIR`` / + ``GIT_WORK_TREE`` / ``GIT_COMMON_DIR`` from the parent process and + those override ordinary path-based repo discovery. A child ``git + rev-parse --is-inside-work-tree`` then reports the PARENT repo even + though we passed a fresh tmp-dir as ``cwd``. Strip them so our + git subprocesses see only the path hierarchy under ``cwd``. + """ + return {k: v for k, v in os.environ.items() if not k.startswith("GIT_")} + + +def _redact_arg(arg: str) -> str: + """Strip embedded userinfo from a URL-looking arg, leave others as-is. + + The external-remote backend invokes ``git clone https://x-access-token: + TOKEN@host/...`` style URLs. Without redaction the token would land in the + structured log when the spawn / timeout / cancellation handlers below + record the failing args. + """ + if "://" not in arg or "@" not in arg: + return arg + try: + parts = urlsplit(arg) + except ValueError: + return arg + if "@" not in parts.netloc: + return arg + host = parts.netloc.rsplit("@", 1)[1] + return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment)) + + +def _redact_args(args: tuple[str, ...]) -> tuple[str, ...]: + """Redact every URL-looking element of *args* (token-in-URL safe).""" + return tuple(_redact_arg(a) for a in args) + + async def run_git_subprocess( repo_root: Path, *args: str, @@ -52,6 +93,7 @@ async def run_git_subprocess( "git", *args, cwd=str(repo_root), + env=_sanitised_env(), stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE, ) @@ -61,7 +103,7 @@ async def run_git_subprocess( log_event, error_type=exc.__class__.__name__, error=msg, - args=args, + args=_redact_args(args), ) return (-1, "", msg) try: @@ -77,7 +119,7 @@ async def run_git_subprocess( log_event, error_type="TimeoutError", error=msg, - args=args, + args=_redact_args(args), ) return (-1, "", msg) except asyncio.CancelledError: @@ -87,7 +129,7 @@ async def run_git_subprocess( log_event, error_type="CancelledError", error="git subprocess cancelled by caller", - args=args, + args=_redact_args(args), ) raise diff --git a/src/synthorg/engine/workspace/git_backend/__init__.py b/src/synthorg/engine/workspace/git_backend/__init__.py new file mode 100644 index 0000000000..2f5f7ff954 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/__init__.py @@ -0,0 +1,40 @@ +"""Pluggable git-backend subsystem. + +Protocol + three strategies (embedded / local-path / external-remote) ++ factory + ``GitBackendType`` discriminator. Switching the git +backend is a config change only; ``EMBEDDED`` is the safe default. +""" + +from synthorg.core.enums import GitBackendType +from synthorg.engine.workspace.git_backend.config import ( + GitBackendConfig, + GitBackendDeps, +) +from synthorg.engine.workspace.git_backend.embedded import EmbeddedGitBackend +from synthorg.engine.workspace.git_backend.external_remote import ( + ExternalRemoteGitBackend, +) +from synthorg.engine.workspace.git_backend.factory import build_git_backend +from synthorg.engine.workspace.git_backend.local_path import ( + LocalPathGitBackend, +) +from synthorg.engine.workspace.git_backend.protocol import ( + FetchResult, + GitBackend, + ProvisionResult, + PushResult, +) + +__all__ = [ + "EmbeddedGitBackend", + "ExternalRemoteGitBackend", + "FetchResult", + "GitBackend", + "GitBackendConfig", + "GitBackendDeps", + "GitBackendType", + "LocalPathGitBackend", + "ProvisionResult", + "PushResult", + "build_git_backend", +] diff --git a/src/synthorg/engine/workspace/git_backend/_git_ops.py b/src/synthorg/engine/workspace/git_backend/_git_ops.py new file mode 100644 index 0000000000..f5b3d97d68 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/_git_ops.py @@ -0,0 +1,85 @@ +"""Shared git subprocess helpers for the git-backend strategies. + +System-internal (not agent-facing). Wraps +:func:`~synthorg.engine.workspace._git_subprocess.run_git_subprocess` +with typed-error raising so each strategy stays terse and every +failure surfaces as a :class:`~synthorg.engine.errors.GitBackendError`. +""" + +import asyncio +from pathlib import Path # noqa: TC003 -- runtime annotation (PEP 649 introspection) +from typing import TYPE_CHECKING + +from synthorg.engine.workspace._git_subprocess import ( + _redact_args, + run_git_subprocess, +) +from synthorg.observability import get_logger +from synthorg.observability.events.workspace import ( + GIT_BACKEND_PROVISION_FAILED, +) + +if TYPE_CHECKING: + from synthorg.engine.errors import GitBackendError + +logger = get_logger(__name__) + + +async def git( + repo_root: Path, + *args: str, + cmd_timeout: float, + fail_exc: type[GitBackendError], + project_id: str, + event: str = GIT_BACKEND_PROVISION_FAILED, +) -> str: + """Run ``git *args`` in *repo_root*; raise *fail_exc* on failure. + + Args: + repo_root: Working directory for the git command. + *args: Git command arguments. + cmd_timeout: Maximum seconds the subprocess may run. + fail_exc: Exception type raised on non-zero exit / spawn failure. + project_id: Project identifier for structured logging. + event: Structured-log event constant for failures; defaults to + ``GIT_BACKEND_PROVISION_FAILED`` so existing call sites stay + quiet, but push/fetch sites should pass their own event so + the log discriminates between operations. + + Returns: + Decoded stdout (stripped) on success. + + Raises: + GitBackendError: *fail_exc* instance when the command exits + non-zero or the process could not be spawned. + """ + rc, stdout, _stderr = await run_git_subprocess( + repo_root, + *args, + cmd_timeout=cmd_timeout, + log_event=event, + ) + if rc != 0: + logger.warning( + event, + project_id=project_id, + git_args=_redact_args(args[:2]), + return_code=rc, + ) + msg = f"git {args[0] if args else ''} failed (rc={rc})" + raise fail_exc(msg) + return stdout.strip() + + +async def is_git_repo(path: Path, *, cmd_timeout: float) -> bool: + """Return ``True`` if *path* is inside a git working tree.""" + if not await asyncio.to_thread(path.exists): + return False + rc, stdout, _stderr = await run_git_subprocess( + path, + "rev-parse", + "--is-inside-work-tree", + cmd_timeout=cmd_timeout, + log_event=GIT_BACKEND_PROVISION_FAILED, + ) + return rc == 0 and stdout.strip() == "true" diff --git a/src/synthorg/engine/workspace/git_backend/config.py b/src/synthorg/engine/workspace/git_backend/config.py new file mode 100644 index 0000000000..3323064134 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/config.py @@ -0,0 +1,87 @@ +"""Pluggable git-backend plugin config + deps bundle. + +The :class:`~synthorg.core.enums.GitBackendType` discriminator selects +one of three strategies. The safe default is ``EMBEDDED`` (a bare repo +self-hosted on the persistent volume, no external dependency). +``LOCAL_PATH`` and ``EXTERNAL_REMOTE`` ship so that switching the git +backend is a config change only. Runtime collaborators that cannot +live in frozen config (the connection catalog, secret backend, clock) +travel in :class:`GitBackendDeps`, mirroring ``AutonomyStrategyDeps``. +""" + +from dataclasses import dataclass +from typing import TYPE_CHECKING, Final, Self + +from pydantic import BaseModel, ConfigDict, Field, model_validator + +from synthorg.core.enums import GitBackendType + +if TYPE_CHECKING: + from pathlib import Path + + from synthorg.core.clock import Clock + from synthorg.integrations.connections.catalog import ConnectionCatalog + from synthorg.persistence.secret_backends.protocol import SecretBackend + +_DEFAULT_EMBEDDED_SUBDIR: Final[str] = "git-repos" +_DEFAULT_GIT_CMD_TIMEOUT_SECONDS: Final[float] = 60.0 + + +class GitBackendConfig(BaseModel): + """Operator-tunable git-backend configuration. + + Default-constructed (``kind=EMBEDDED``) provisions a bare repo on + the persistent volume with no external dependency. + """ + + model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") + + kind: GitBackendType = GitBackendType.EMBEDDED + # EMBEDDED: subdirectory under the workspace base holding bare repos. + embedded_subdir: str = Field( + default=_DEFAULT_EMBEDDED_SUBDIR, + min_length=1, + ) + # LOCAL_PATH: caller-supplied existing git repository path. + local_repo_path: str | None = Field(default=None, min_length=1) + # EXTERNAL_REMOTE: connection-catalog name for the forge connection. + remote_connection_name: str | None = Field(default=None, min_length=1) + # Maximum seconds any single git subprocess may run. + git_cmd_timeout_seconds: float = Field( + default=_DEFAULT_GIT_CMD_TIMEOUT_SECONDS, + gt=0.0, + ) + + @model_validator(mode="after") + def _check_kind_requirements(self) -> Self: + """Each non-default kind needs its addressing field set.""" + if self.kind is GitBackendType.LOCAL_PATH and not self.local_repo_path: + msg = "LOCAL_PATH git backend requires 'local_repo_path'" + raise ValueError(msg) + if ( + self.kind is GitBackendType.EXTERNAL_REMOTE + and not self.remote_connection_name + ): + msg = "EXTERNAL_REMOTE git backend requires 'remote_connection_name'" + raise ValueError(msg) + return self + + +@dataclass(frozen=True, slots=True) +class GitBackendDeps: + """Runtime collaborators the frozen config cannot carry. + + Attributes: + workspace_base_root: REQUIRED for ``EMBEDDED`` (the persistent + volume base under which bare repos are self-hosted). + connection_catalog: REQUIRED for ``EXTERNAL_REMOTE`` (resolves + the forge connection by name). + secret_backend: REQUIRED for ``EXTERNAL_REMOTE`` (resolves the + connection's access token). + clock: Clock seam for provisioning timestamps. + """ + + workspace_base_root: Path | None = None + connection_catalog: ConnectionCatalog | None = None + secret_backend: SecretBackend | None = None + clock: Clock | None = None diff --git a/src/synthorg/engine/workspace/git_backend/embedded.py b/src/synthorg/engine/workspace/git_backend/embedded.py new file mode 100644 index 0000000000..a77ccf94c1 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/embedded.py @@ -0,0 +1,282 @@ +"""Embedded git backend: self-hosted bare repo on the persistent volume. + +The safe default. Each project gets a bare repository at +``//.git`` and a working tree +at the project workspace path with ``origin`` pointing at that bare +repo. No external dependency; pushes/fetches are pure-local. +""" + +import asyncio +from pathlib import Path +from typing import Final + +from synthorg.core.clock import Clock, SystemClock +from synthorg.core.enums import GitBackendType +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import ( + GitBackendFetchError, + GitBackendProvisionError, + GitBackendPushError, +) +from synthorg.engine.workspace.git_backend._git_ops import git, is_git_repo +from synthorg.engine.workspace.git_backend.protocol import ( + FetchResult, + ProvisionResult, + PushResult, +) +from synthorg.observability import get_logger +from synthorg.observability.events.workspace import ( + GIT_BACKEND_FETCH_COMPLETE, + GIT_BACKEND_FETCH_FAILED, + GIT_BACKEND_PROVISION_COMPLETE, + GIT_BACKEND_PROVISION_START, + GIT_BACKEND_PUSH_COMPLETE, + GIT_BACKEND_PUSH_FAILED, +) + +logger = get_logger(__name__) + +_BOT_NAME: Final[str] = "SynthOrg" +_BOT_EMAIL: Final[str] = "synthorg-bot@synthorg.local" +_REMOTE_NAME: Final[str] = "origin" + + +class EmbeddedGitBackend: + """Bare-repo-on-volume git backend (default strategy).""" + + def __init__( + self, + *, + base_root: Path, + embedded_subdir: str, + cmd_timeout: float, + clock: Clock | None = None, + ) -> None: + self._base_root = base_root + self._embedded_subdir = embedded_subdir + self._cmd_timeout = cmd_timeout + self._clock: Clock = clock if clock is not None else SystemClock() + + def get_backend_type(self) -> GitBackendType: + """Return the ``EMBEDDED`` discriminator.""" + return GitBackendType.EMBEDDED + + def _bare_repo_path(self, project_id: str) -> Path: + return self._base_root / self._embedded_subdir / f"{project_id}.git" + + async def _reject_if_nested_in_parent_worktree(self, path: Path, pid: str) -> None: + """Refuse if *path* sits inside an EXISTING parent working tree. + + ``rev-parse --show-toplevel`` succeeds (exit 0) and reports a + toplevel DIFFERENT from *path* iff some parent dir of *path* is a + working tree; in that case ``git init`` would silently no-op and + the subsequent ``git config`` / ``git commit`` would mutate the + outer repo. Raise instead of silently corrupting it. If git + already considers *path* itself the toplevel (idempotent provision + on a previously-initialised dir), we let the caller proceed and + rely on ``is_git_repo`` upstream for short-circuit semantics. + """ + from synthorg.engine.workspace._git_subprocess import ( # noqa: PLC0415 + run_git_subprocess, + ) + from synthorg.observability.events.workspace import ( # noqa: PLC0415 + GIT_BACKEND_PROVISION_FAILED, + ) + + rc, stdout, _stderr = await run_git_subprocess( + path, + "rev-parse", + "--show-toplevel", + cmd_timeout=self._cmd_timeout, + log_event=GIT_BACKEND_PROVISION_FAILED, + ) + if rc != 0: + return + toplevel = await asyncio.to_thread(Path(stdout).resolve) + if toplevel == await asyncio.to_thread(path.resolve): + return + msg = ( + f"refusing to provision project {pid!r} inside an existing " + f"parent git working tree at {toplevel!s}" + ) + raise GitBackendProvisionError(msg) + + async def _configure_identity(self, workspace_path: Path, pid: str) -> None: + await git( + workspace_path, + "config", + "user.email", + _BOT_EMAIL, + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + workspace_path, + "config", + "user.name", + _BOT_NAME, + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + + async def provision( + self, + *, + project_id: NotBlankStr, + workspace_path: Path, + default_branch: NotBlankStr, + ) -> ProvisionResult: + """Create the bare repo + working tree (idempotent).""" + pid = str(project_id) + logger.info( + GIT_BACKEND_PROVISION_START, + project_id=pid, + backend=GitBackendType.EMBEDDED.value, + ) + if await is_git_repo(workspace_path, cmd_timeout=self._cmd_timeout): + return ProvisionResult( + repo_root=NotBlankStr(str(workspace_path)), + default_branch=default_branch, + newly_created=False, + ) + + bare = self._bare_repo_path(pid) + try: + await asyncio.to_thread(bare.mkdir, parents=True, exist_ok=True) + await asyncio.to_thread(workspace_path.mkdir, parents=True, exist_ok=True) + except OSError as exc: + msg = f"failed to create workspace dirs for {pid!r}" + raise GitBackendProvisionError(msg) from exc + # Safety: never run ``git init`` / ``git config`` / ``git commit`` + # from inside an existing parent working tree. ``is_git_repo`` + # already short-circuits when *workspace_path* itself is a working + # tree, but if a parent dir is a working tree (e.g. the synthorg + # repo when base_root was misconfigured) the subsequent commands + # would mutate that outer repo's config and create stray empty + # commits on it. The bare path is created under our own + # config-controlled ``base_root`` so we only guard the working + # tree, which is where ``git config`` / ``git commit`` execute. + await self._reject_if_nested_in_parent_worktree(workspace_path, pid) + + await git( + bare, + "init", + "--bare", + "--initial-branch", + str(default_branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + workspace_path, + "init", + "--initial-branch", + str(default_branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await self._configure_identity(workspace_path, pid) + await git( + workspace_path, + "commit", + "--allow-empty", + "-m", + "Initialise project workspace", + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + workspace_path, + "remote", + "add", + _REMOTE_NAME, + str(bare), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + workspace_path, + "push", + _REMOTE_NAME, + str(default_branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + logger.info( + GIT_BACKEND_PROVISION_COMPLETE, + project_id=pid, + backend=GitBackendType.EMBEDDED.value, + ) + return ProvisionResult( + repo_root=NotBlankStr(str(workspace_path)), + default_branch=default_branch, + newly_created=True, + ) + + async def push( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr, + base_branch: NotBlankStr, # noqa: ARG002 -- local origin tracks base + ) -> PushResult: + """Push *branch* to the project's bare repo; return its head SHA.""" + pid = str(project_id) + await git( + repo_root, + "push", + _REMOTE_NAME, + str(branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendPushError, + project_id=pid, + event=GIT_BACKEND_PUSH_FAILED, + ) + head = await git( + repo_root, + "rev-parse", + str(branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendPushError, + project_id=pid, + event=GIT_BACKEND_PUSH_FAILED, + ) + logger.info(GIT_BACKEND_PUSH_COMPLETE, project_id=pid, branch=str(branch)) + return PushResult(branch=branch, head_sha=NotBlankStr(head)) + + async def fetch( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr | None = None, + ) -> FetchResult: + """Fetch from the project's bare repo into *repo_root*.""" + pid = str(project_id) + args = ["fetch", _REMOTE_NAME] + if branch is not None: + args.append(str(branch)) + await git( + repo_root, + *args, + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendFetchError, + project_id=pid, + event=GIT_BACKEND_FETCH_FAILED, + ) + logger.info(GIT_BACKEND_FETCH_COMPLETE, project_id=pid) + refs: tuple[NotBlankStr, ...] = ( + (NotBlankStr(str(branch)),) if branch is not None else () + ) + return FetchResult(updated_refs=refs) + + +__all__ = ["EmbeddedGitBackend"] diff --git a/src/synthorg/engine/workspace/git_backend/external_remote.py b/src/synthorg/engine/workspace/git_backend/external_remote.py new file mode 100644 index 0000000000..ef14d16372 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/external_remote.py @@ -0,0 +1,217 @@ +"""External-remote git backend: GitHub/GitLab/Gitea/Forgejo via catalog. + +Thin glue over the existing connection catalog + secret resolution. +The remote is addressed as ``/.git`` +with a token injected for HTTPS auth. Deep hardening (OAuth token +refresh, expiry, rate-limit + retry, forge-API repo creation) is the +tracked follow-up issue; this ships the protocol + config surface so +switching the git backend is a config change only. +""" + +import asyncio +from pathlib import Path # noqa: TC003 -- runtime annotation (PEP 649) +from typing import TYPE_CHECKING, Final +from urllib.parse import quote, urlsplit, urlunsplit + +from synthorg.core.clock import Clock, SystemClock +from synthorg.core.enums import GitBackendType +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import ( + GitBackendConfigError, + GitBackendFetchError, + GitBackendProvisionError, + GitBackendPushError, +) +from synthorg.engine.workspace.git_backend._git_ops import git, is_git_repo +from synthorg.engine.workspace.git_backend.protocol import ( + FetchResult, + ProvisionResult, + PushResult, +) +from synthorg.observability import get_logger +from synthorg.observability.events.workspace import ( + GIT_BACKEND_FETCH_COMPLETE, + GIT_BACKEND_FETCH_FAILED, + GIT_BACKEND_PROVISION_COMPLETE, + GIT_BACKEND_PROVISION_START, + GIT_BACKEND_PUSH_COMPLETE, + GIT_BACKEND_PUSH_FAILED, +) + +if TYPE_CHECKING: + from synthorg.integrations.connections.catalog import ConnectionCatalog + +logger = get_logger(__name__) + +_REMOTE_NAME: Final[str] = "origin" +_TOKEN_USER: Final[str] = "x-access-token" # noqa: S105 -- username, not a secret + + +class ExternalRemoteGitBackend: + """Forge-remote git backend resolved via the connection catalog.""" + + def __init__( + self, + *, + connection_name: str, + connection_catalog: ConnectionCatalog, + cmd_timeout: float, + clock: Clock | None = None, + ) -> None: + self._connection_name = connection_name + self._catalog = connection_catalog + self._cmd_timeout = cmd_timeout + self._clock: Clock = clock if clock is not None else SystemClock() + + def get_backend_type(self) -> GitBackendType: + """Return the ``EXTERNAL_REMOTE`` discriminator.""" + return GitBackendType.EXTERNAL_REMOTE + + async def _authenticated_remote_url(self, project_id: str) -> str: + """Resolve ``/.git`` with a token injected.""" + connection = await self._catalog.get(self._connection_name) + if connection is None or not connection.base_url: + msg = ( + f"external git connection {self._connection_name!r} is not " + "registered or has no base_url" + ) + raise GitBackendConfigError(msg) + # ``get_credentials`` returns ``dict[str, str]`` per its protocol + # (never None); an empty dict / missing ``token`` key surfaces as + # the same typed ``GitBackendConfigError`` below, which is the + # path a None return would have taken. + credentials = await self._catalog.get_credentials(self._connection_name) + token = credentials.get("token") + if not token: + msg = ( + f"external git connection {self._connection_name!r} has no " + "'token' credential" + ) + raise GitBackendConfigError(msg) + split = urlsplit(str(connection.base_url).rstrip("/")) + if split.scheme != "https": + msg = "external git remote must be an https URL" + raise GitBackendConfigError(msg) + # Use hostname/port instead of raw netloc so any pre-existing + # userinfo on the configured base_url cannot collide with the + # token we inject; percent-encode the token so reserved + # characters (``@``, ``:``, ``/`` ...) survive URL parsing + # rather than breaking the netloc. + host = split.hostname + if not host: + msg = ( + f"external git connection {self._connection_name!r} has " + "invalid base_url (no host component)" + ) + raise GitBackendConfigError(msg) + host_with_port = f"{host}:{split.port}" if split.port is not None else host + netloc = f"{_TOKEN_USER}:{quote(token, safe='')}@{host_with_port}" + path = f"{split.path}/{project_id}.git" + return urlunsplit((split.scheme, netloc, path, "", "")) + + async def provision( + self, + *, + project_id: NotBlankStr, + workspace_path: Path, + default_branch: NotBlankStr, + ) -> ProvisionResult: + """Clone the resolved forge remote into *workspace_path*.""" + pid = str(project_id) + logger.info( + GIT_BACKEND_PROVISION_START, + project_id=pid, + backend=GitBackendType.EXTERNAL_REMOTE.value, + ) + if await is_git_repo(workspace_path, cmd_timeout=self._cmd_timeout): + return ProvisionResult( + repo_root=NotBlankStr(str(workspace_path)), + default_branch=default_branch, + newly_created=False, + ) + url = await self._authenticated_remote_url(pid) + try: + await asyncio.to_thread(workspace_path.mkdir, parents=True, exist_ok=True) + except OSError as exc: + msg = f"failed to create workspace dir for {pid!r}" + raise GitBackendProvisionError(msg) from exc + await git( + workspace_path, + "clone", + url, + ".", + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + logger.info( + GIT_BACKEND_PROVISION_COMPLETE, + project_id=pid, + backend=GitBackendType.EXTERNAL_REMOTE.value, + ) + return ProvisionResult( + repo_root=NotBlankStr(str(workspace_path)), + default_branch=default_branch, + newly_created=True, + ) + + async def push( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr, + base_branch: NotBlankStr, # noqa: ARG002 -- remote tracks its own base + ) -> PushResult: + """Push *branch* to the forge remote; return its head SHA.""" + pid = str(project_id) + await git( + repo_root, + "push", + _REMOTE_NAME, + str(branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendPushError, + project_id=pid, + event=GIT_BACKEND_PUSH_FAILED, + ) + head = await git( + repo_root, + "rev-parse", + str(branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendPushError, + project_id=pid, + event=GIT_BACKEND_PUSH_FAILED, + ) + logger.info(GIT_BACKEND_PUSH_COMPLETE, project_id=pid, branch=str(branch)) + return PushResult(branch=branch, head_sha=NotBlankStr(head)) + + async def fetch( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr | None = None, + ) -> FetchResult: + """Fetch from the forge remote into *repo_root*.""" + pid = str(project_id) + args = ["fetch", _REMOTE_NAME] + if branch is not None: + args.append(str(branch)) + await git( + repo_root, + *args, + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendFetchError, + project_id=pid, + event=GIT_BACKEND_FETCH_FAILED, + ) + logger.info(GIT_BACKEND_FETCH_COMPLETE, project_id=pid) + refs: tuple[NotBlankStr, ...] = ( + (NotBlankStr(str(branch)),) if branch is not None else () + ) + return FetchResult(updated_refs=refs) + + +__all__ = ["ExternalRemoteGitBackend"] diff --git a/src/synthorg/engine/workspace/git_backend/factory.py b/src/synthorg/engine/workspace/git_backend/factory.py new file mode 100644 index 0000000000..e4041f97ae --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/factory.py @@ -0,0 +1,112 @@ +"""Git-backend factory. + +Maps :class:`~synthorg.core.enums.GitBackendType` to a concrete +:class:`GitBackend` via the ``StrEnum``-keyed +:class:`~synthorg.core.registry.StrategyRegistry`. A kind whose +required dependency is absent raises :class:`GitBackendConfigError` at +construction (fail fast), exactly like the autonomy strategy factory. +""" + +from typing import TYPE_CHECKING + +from synthorg.core.enums import GitBackendType +from synthorg.core.registry import StrategyRegistry +from synthorg.engine.errors import GitBackendConfigError +from synthorg.engine.workspace.git_backend.embedded import EmbeddedGitBackend +from synthorg.engine.workspace.git_backend.external_remote import ( + ExternalRemoteGitBackend, +) +from synthorg.engine.workspace.git_backend.local_path import ( + LocalPathGitBackend, +) + +if TYPE_CHECKING: + from synthorg.engine.workspace.git_backend.config import ( + GitBackendConfig, + GitBackendDeps, + ) + from synthorg.engine.workspace.git_backend.protocol import GitBackend + + +def _build_embedded( + config: GitBackendConfig, + deps: GitBackendDeps, +) -> GitBackend: + if deps.workspace_base_root is None: + msg = ( + "EMBEDDED git backend requires a 'workspace_base_root' " + "dependency but none was provided" + ) + raise GitBackendConfigError(msg) + return EmbeddedGitBackend( + base_root=deps.workspace_base_root, + embedded_subdir=config.embedded_subdir, + cmd_timeout=config.git_cmd_timeout_seconds, + clock=deps.clock, + ) + + +def _build_local_path( + config: GitBackendConfig, + deps: GitBackendDeps, +) -> GitBackend: + if not config.local_repo_path: + msg = "LOCAL_PATH git backend requires 'local_repo_path' in config" + raise GitBackendConfigError(msg) + return LocalPathGitBackend( + local_repo_path=config.local_repo_path, + cmd_timeout=config.git_cmd_timeout_seconds, + clock=deps.clock, + ) + + +def _build_external_remote( + config: GitBackendConfig, + deps: GitBackendDeps, +) -> GitBackend: + if not config.remote_connection_name: + msg = "EXTERNAL_REMOTE git backend requires 'remote_connection_name' in config" + raise GitBackendConfigError(msg) + if deps.connection_catalog is None: + msg = ( + "EXTERNAL_REMOTE git backend requires a 'connection_catalog' " + "dependency but none was provided" + ) + raise GitBackendConfigError(msg) + return ExternalRemoteGitBackend( + connection_name=config.remote_connection_name, + connection_catalog=deps.connection_catalog, + cmd_timeout=config.git_cmd_timeout_seconds, + clock=deps.clock, + ) + + +_REGISTRY: StrategyRegistry[GitBackend] = StrategyRegistry( + { + GitBackendType.EMBEDDED: _build_embedded, + GitBackendType.LOCAL_PATH: _build_local_path, + GitBackendType.EXTERNAL_REMOTE: _build_external_remote, + }, + kind="git_backend", +) + + +def build_git_backend( + config: GitBackendConfig, + deps: GitBackendDeps, +) -> GitBackend: + """Build the configured :class:`GitBackend`. + + Args: + config: The strategy discriminator + per-impl tuning. + deps: Runtime collaborators (base root, catalog, secret, clock). + + Returns: + A strategy satisfying the ``GitBackend`` protocol. + + Raises: + StrategyFactoryNotFoundError: Unknown ``config.kind``. + GitBackendConfigError: A strategy is missing a required + dependency or addressing field. + """ + return _REGISTRY.build(config.kind, config, deps) diff --git a/src/synthorg/engine/workspace/git_backend/local_path.py b/src/synthorg/engine/workspace/git_backend/local_path.py new file mode 100644 index 0000000000..acb33df327 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/local_path.py @@ -0,0 +1,244 @@ +"""Local-path git backend: bring-your-own repository base on disk. + +The configured ``local_repo_path`` is treated as a BASE directory under +which one repository per project is provisioned at +``/``; this is what guarantees per-project +isolation when the local-path backend is paired with the multi-project +:class:`~synthorg.engine.workspace.project_workspace_service.ProjectWorkspaceService`. +There is no separate remote: the on-disk repo is the durable store, so +``push``/``fetch`` resolve against the repo itself (the coordinator +merge queue still serialises merges upstream). +""" + +import asyncio +from pathlib import Path + +from synthorg.core.clock import Clock, SystemClock +from synthorg.core.enums import GitBackendType +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import ( + GitBackendConfigError, + GitBackendProvisionError, + GitBackendPushError, +) +from synthorg.engine.workspace.git_backend._git_ops import git, is_git_repo +from synthorg.engine.workspace.git_backend.protocol import ( + FetchResult, + ProvisionResult, + PushResult, +) +from synthorg.observability import get_logger +from synthorg.observability.events.workspace import ( + GIT_BACKEND_PROVISION_COMPLETE, + GIT_BACKEND_PROVISION_START, + GIT_BACKEND_PUSH_COMPLETE, + GIT_BACKEND_PUSH_FAILED, +) + +logger = get_logger(__name__) + +_BOT_NAME = "SynthOrg" +_BOT_EMAIL = "synthorg-bot@synthorg.local" + + +class LocalPathGitBackend: + """Caller-supplied local git repository backend. + + Each project gets its own repository at ``/``; + the configured ``local_repo_path`` is the BASE under which those + per-project repos live, NOT a single shared working tree. + """ + + def __init__( + self, + *, + local_repo_path: str, + cmd_timeout: float, + clock: Clock | None = None, + ) -> None: + self._repo_base = Path(local_repo_path) + self._cmd_timeout = cmd_timeout + self._clock: Clock = clock if clock is not None else SystemClock() + + def get_backend_type(self) -> GitBackendType: + """Return the ``LOCAL_PATH`` discriminator.""" + return GitBackendType.LOCAL_PATH + + def _repo_path_for_project(self, project_id: str) -> Path: + """Derive the per-project repository path under the base.""" + return self._repo_base / project_id + + def _non_empty_non_repo_dir(self, repo_path: Path) -> bool: + """True if *repo_path* exists with content (sync; run off-loop). + + A path that exists as a file (not a directory) also counts as + "non-empty" -- ``iterdir()`` on a file raises ``NotADirectoryError``, + and the caller's intent is to refuse provisioning over a + non-empty location regardless of whether it is a file or a dir. + """ + if not repo_path.exists(): + return False + if not repo_path.is_dir(): + return True + return any(repo_path.iterdir()) + + async def _reject_if_nested_in_parent_worktree(self, path: Path, pid: str) -> None: + """Refuse if *path* is inside an existing parent working tree. + + ``git rev-parse --show-toplevel`` succeeds (exit 0) and reports a + toplevel DIFFERENT from *path* iff some parent dir of *path* is a + working tree; in that case ``git init`` would silently no-op and + the subsequent ``git config`` / ``git commit`` would mutate the + outer repo. Raise instead of silently corrupting it. + """ + from synthorg.engine.workspace._git_subprocess import ( # noqa: PLC0415 + run_git_subprocess, + ) + from synthorg.observability.events.workspace import ( # noqa: PLC0415 + GIT_BACKEND_PROVISION_FAILED, + ) + + rc, stdout, _stderr = await run_git_subprocess( + path, + "rev-parse", + "--show-toplevel", + cmd_timeout=self._cmd_timeout, + log_event=GIT_BACKEND_PROVISION_FAILED, + ) + if rc != 0: + return + toplevel = await asyncio.to_thread(Path(stdout).resolve) + if toplevel == await asyncio.to_thread(path.resolve): + return + msg = ( + f"refusing to provision project {pid!r}: local repo path " + f"{path!s} is nested inside an existing parent working tree " + f"at {toplevel!s}" + ) + raise GitBackendProvisionError(msg) + + async def provision( + self, + *, + project_id: NotBlankStr, + workspace_path: Path, # noqa: ARG002 -- per-project base path derived from config + default_branch: NotBlankStr, + ) -> ProvisionResult: + """Validate / initialise the per-project local repository.""" + pid = str(project_id) + repo_path = self._repo_path_for_project(pid) + logger.info( + GIT_BACKEND_PROVISION_START, + project_id=pid, + backend=GitBackendType.LOCAL_PATH.value, + ) + if await is_git_repo(repo_path, cmd_timeout=self._cmd_timeout): + return ProvisionResult( + repo_root=NotBlankStr(str(repo_path)), + default_branch=default_branch, + newly_created=False, + ) + if await asyncio.to_thread(self._non_empty_non_repo_dir, repo_path): + msg = ( + f"local repo path {repo_path!s} exists but is not a " + "git repository and is not empty" + ) + raise GitBackendConfigError(msg) + try: + await asyncio.to_thread(repo_path.mkdir, parents=True, exist_ok=True) + except OSError as exc: + msg = f"failed to create local repo dir for {pid!r}" + raise GitBackendProvisionError(msg) from exc + # Refuse to run ``git init`` / ``git config`` / ``git commit`` if the + # caller-supplied path is nested inside a parent working tree (e.g. + # accidentally pointing at a subdir of the synthorg repo). Without + # this guard the subsequent commands would mutate that outer repo's + # config + add stray empty commits to it. + await self._reject_if_nested_in_parent_worktree(repo_path, pid) + await git( + repo_path, + "init", + "--initial-branch", + str(default_branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + repo_path, + "config", + "user.email", + _BOT_EMAIL, + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + repo_path, + "config", + "user.name", + _BOT_NAME, + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + await git( + repo_path, + "commit", + "--allow-empty", + "-m", + "Initialise project workspace", + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendProvisionError, + project_id=pid, + ) + logger.info( + GIT_BACKEND_PROVISION_COMPLETE, + project_id=pid, + backend=GitBackendType.LOCAL_PATH.value, + ) + return ProvisionResult( + repo_root=NotBlankStr(str(repo_path)), + default_branch=default_branch, + newly_created=True, + ) + + async def push( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr, + base_branch: NotBlankStr, # noqa: ARG002 -- no remote; on-disk is durable + ) -> PushResult: + """Resolve the branch head (on-disk repo is the durable store).""" + # Local-path: the on-disk repo IS the durable store; "push" is + # the no-op durability point. Resolve the branch head so callers + # still get a verifiable commit SHA. + pid = str(project_id) + head = await git( + repo_root, + "rev-parse", + str(branch), + cmd_timeout=self._cmd_timeout, + fail_exc=GitBackendPushError, + project_id=pid, + event=GIT_BACKEND_PUSH_FAILED, + ) + logger.info(GIT_BACKEND_PUSH_COMPLETE, project_id=pid, branch=str(branch)) + return PushResult(branch=branch, head_sha=NotBlankStr(head)) + + async def fetch( + self, + *, + project_id: NotBlankStr, # noqa: ARG002 -- no remote to fetch from + repo_root: Path, # noqa: ARG002 + branch: NotBlankStr | None = None, # noqa: ARG002 + ) -> FetchResult: + """No remote to fetch from; returns empty refs for protocol parity.""" + # No remote: nothing to fetch. Returning empty refs keeps the + # protocol contract uniform across backends. + return FetchResult(updated_refs=()) + + +__all__ = ["LocalPathGitBackend"] diff --git a/src/synthorg/engine/workspace/git_backend/protocol.py b/src/synthorg/engine/workspace/git_backend/protocol.py new file mode 100644 index 0000000000..94a653e339 --- /dev/null +++ b/src/synthorg/engine/workspace/git_backend/protocol.py @@ -0,0 +1,104 @@ +"""Pluggable git-backend protocol and result models. + +A :class:`GitBackend` owns "where git lives" for a project: it +provisions the repository, and serialised pushes/fetches flow through +it. Implementations are interchangeable behind the +:class:`~synthorg.core.enums.GitBackendType` discriminator so switching +storage is a config change only. +""" + +from pathlib import Path # noqa: TC003 -- runtime annotation (PEP 649 introspection) +from typing import Protocol, runtime_checkable + +from pydantic import BaseModel, ConfigDict + +from synthorg.core.enums import GitBackendType # noqa: TC001 +from synthorg.core.types import NotBlankStr # noqa: TC001 + + +class ProvisionResult(BaseModel): + """Outcome of provisioning a project's git repository.""" + + model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") + + repo_root: NotBlankStr + default_branch: NotBlankStr + newly_created: bool + + +class PushResult(BaseModel): + """Outcome of a push to the backend.""" + + model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") + + branch: NotBlankStr + head_sha: NotBlankStr + + +class FetchResult(BaseModel): + """Outcome of a fetch from the backend.""" + + model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") + + updated_refs: tuple[NotBlankStr, ...] = () + + +@runtime_checkable +class GitBackend(Protocol): + """Pluggable git storage strategy for a project workspace. + + Failures raise a typed + :class:`~synthorg.engine.errors.GitBackendError` subclass rather + than returning an error string, so secret-shaped exception text + can never leak into result fields. + """ + + async def provision( + self, + *, + project_id: NotBlankStr, + workspace_path: Path, + default_branch: NotBlankStr, + ) -> ProvisionResult: + """Ensure the project's git repository exists and is initialised. + + Idempotent: a second call on an already-provisioned workspace + returns ``newly_created=False`` without mutating history. + + Raises: + GitBackendProvisionError: Repository creation failed. + """ + ... + + async def push( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr, + base_branch: NotBlankStr, + ) -> PushResult: + """Push *branch* to the backend. + + Raises: + GitBackendPushError: The push was rejected or failed. + """ + ... + + async def fetch( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + branch: NotBlankStr | None = None, + ) -> FetchResult: + """Fetch updates from the backend into *repo_root*. + + Raises: + GitBackendFetchError: The fetch failed. + """ + ... + + def get_backend_type(self) -> GitBackendType: + """Return the discriminator identifying this backend.""" + ... diff --git a/src/synthorg/engine/workspace/project_workspace_service.py b/src/synthorg/engine/workspace/project_workspace_service.py new file mode 100644 index 0000000000..881275c8f8 --- /dev/null +++ b/src/synthorg/engine/workspace/project_workspace_service.py @@ -0,0 +1,257 @@ +"""Persistent per-project workspace provisioning service. + +Resolves (and lazily provisions, once) the 1:1 persistent git-backed +workspace for a project. ``GitBackendConfig.kind`` is authoritative: +if a persisted row was provisioned under a different backend than the +live config, the workspace is re-provisioned under the new backend. +On a kind switch the prior backend's ``.git`` directory at the prior +on-disk path is removed before the new backend provisions, so each +backend's ``is_git_repo`` short-circuit cannot keep the old layout +alive after the row claims the new kind; the new backend then decides +its own on-disk location (the persisted row reflects whatever path +the new backend reports). +""" + +import asyncio +import shutil +from pathlib import Path +from typing import TYPE_CHECKING, Final + +from synthorg.core.clock import Clock, SystemClock +from synthorg.core.enums import GitBackendType +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import GitBackendConfigError +from synthorg.observability import get_logger, safe_error_description +from synthorg.observability.events.workspace import ( + PROJECT_WORKSPACE_PROVISIONED, + PROJECT_WORKSPACE_REUSED, + WORKSPACE_BACKEND_KIND_CHANGED, + WORKSPACE_GIT_DIR_CLEARED, + WORKSPACE_PATH_TRAVERSAL_REJECTED, +) + +if TYPE_CHECKING: + from synthorg.engine.workspace.git_backend import ( + GitBackend, + GitBackendConfig, + ) + from synthorg.persistence.project_workspace_protocol import ( + ProjectWorkspaceRepository, + ) + +logger = get_logger(__name__) + +_PROJECTS_SUBDIR: Final[str] = "projects" +_DEFAULT_BRANCH: NotBlankStr = NotBlankStr("main") + + +class ProjectWorkspaceService: + """Provisions and resolves the persistent workspace for a project. + + Args: + base_root: Persistent volume base; project trees live under + ``/projects/``. + repo: Persistence for the :class:`ProjectWorkspace` row. + git_backend: The configured backend that provisions the repo. + config: Git-backend config (its ``kind`` is authoritative). + clock: Clock seam for row timestamps. + """ + + __slots__ = ( + "_base_root", + "_clock", + "_config", + "_git_backend", + "_locks", + "_locks_guard", + "_repo", + ) + + def __init__( + self, + *, + base_root: Path, + repo: ProjectWorkspaceRepository, + git_backend: GitBackend, + config: GitBackendConfig, + clock: Clock | None = None, + ) -> None: + self._base_root = base_root + self._repo = repo + self._git_backend = git_backend + self._config = config + self._clock: Clock = clock if clock is not None else SystemClock() + self._locks: dict[str, asyncio.Lock] = {} + self._locks_guard = asyncio.Lock() + + @property + def git_backend(self) -> GitBackend: + """The wired git backend; consumed by downstream wiring.""" + return self._git_backend + + def _workspace_path(self, project_id: str) -> Path: + # Defense-in-depth: ``project_id`` is system-generated and reaches + # this seam only via persisted Project rows, but rejecting path + # separators here closes the door if a future caller ever passes + # an attacker-controlled value (no traversal out of the projects + # subdir, no absolute-path takeover of the base root). + if "/" in project_id or "\\" in project_id or ".." in project_id: + # Log before raising so blocked traversal attempts surface + # in production audit logs, not just on the caller's stack. + logger.warning( + WORKSPACE_PATH_TRAVERSAL_REJECTED, + project_id=project_id, + base_root=str(self._base_root), + projects_subdir=_PROJECTS_SUBDIR, + ) + msg = ( + f"refusing path-separator-bearing project_id " + f"{project_id!r}: workspace path traversal blocked" + ) + raise GitBackendConfigError(msg) + return self._base_root / _PROJECTS_SUBDIR / project_id + + async def _clear_prior_git_dir( + self, + *, + project_id: str, + prior_path: Path, + ) -> None: + """Remove the prior backend's ``.git`` directory before a kind switch. + + Every backend short-circuits ``provision()`` when + ``is_git_repo(path)`` is true; without clearing the prior + ``.git`` metadata on a kind change, the new backend (EMBEDDED -> + EXTERNAL_REMOTE, etc.) would never re-initialise the on-disk + layout despite the row claiming the new kind. Only the ``.git`` + subdirectory is removed; any user-owned files at ``prior_path`` + (e.g. a BYO LOCAL_PATH tree) are left untouched. + """ + git_dir = prior_path / ".git" + if not await asyncio.to_thread(git_dir.exists): + return + try: + await asyncio.to_thread(shutil.rmtree, git_dir) + except OSError as exc: + # Best-effort: surface the failure but let the new + # backend's provision report whatever it sees on disk. + logger.warning( + WORKSPACE_GIT_DIR_CLEARED, + project_id=project_id, + git_dir=str(git_dir), + success=False, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + return + logger.info( + WORKSPACE_GIT_DIR_CLEARED, + project_id=project_id, + git_dir=str(git_dir), + success=True, + ) + + async def _lock_for(self, project_id: str) -> asyncio.Lock: + """Return the per-project provisioning lock (created once).""" + async with self._locks_guard: + return self._locks.setdefault(project_id, asyncio.Lock()) + + async def get_or_provision( + self, + project_id: NotBlankStr, + ) -> ProjectWorkspace: + """Return the project's workspace, provisioning it once if absent. + + Concurrent first-touch by two agents provisions exactly once + (per-project lock). A persisted row whose backend kind differs + from the live config is re-provisioned under the new backend. + + Returns: + The persisted :class:`ProjectWorkspace`. + + Raises: + GitBackendProvisionError: Repository provisioning failed. + """ + lock = await self._lock_for(project_id) + async with lock: + row = await self._repo.get(project_id) + kind = self._config.kind + if row is not None and row.git_backend_kind == kind: + logger.info( + PROJECT_WORKSPACE_REUSED, + project_id=project_id, + backend=kind.value, + ) + return row + if row is not None and row.git_backend_kind != kind: + logger.warning( + WORKSPACE_BACKEND_KIND_CHANGED, + project_id=project_id, + from_backend=row.git_backend_kind.value, + to_backend=kind.value, + ) + return await self._provision(project_id, row, kind) + + async def _provision( + self, + project_id: NotBlankStr, + prior: ProjectWorkspace | None, + kind: GitBackendType, + ) -> ProjectWorkspace: + """Provision (or re-provision) and persist the workspace row.""" + # Reuse the persisted on-disk location across re-provisions: this + # avoids relocating the tree if ``_workspace_path()`` ever moves + # in a future refactor, and on a same-kind re-provision keeps the + # path deterministically equal to the prior one. + workspace_path = ( + Path(prior.workspace_path) + if prior is not None + else self._workspace_path(project_id) + ) + await asyncio.to_thread(workspace_path.mkdir, parents=True, exist_ok=True) + if prior is not None and prior.git_backend_kind != kind: + # Kind switch: remove the prior backend's ``.git`` metadata + # so the new backend's ``is_git_repo`` short-circuit cannot + # keep the old layout alive (otherwise EMBEDDED -> EXTERNAL_REMOTE + # would never clone, EMBEDDED -> LOCAL_PATH would never reinit, + # etc., and acceptance #3 stays dead on disk). + await self._clear_prior_git_dir( + project_id=project_id, + prior_path=Path(prior.workspace_path), + ) + default_branch = prior.default_branch if prior is not None else _DEFAULT_BRANCH + result = await self._git_backend.provision( + project_id=project_id, + workspace_path=workspace_path, + default_branch=default_branch, + ) + now = self._clock.now() + remote_ref = ( + NotBlankStr(self._config.remote_connection_name) + if ( + kind is GitBackendType.EXTERNAL_REMOTE + and self._config.remote_connection_name + ) + else None + ) + workspace = ProjectWorkspace( + project_id=project_id, + workspace_path=result.repo_root, + git_backend_kind=kind, + remote_ref=remote_ref, + default_branch=result.default_branch, + created_at=prior.created_at if prior is not None else now, + updated_at=now, + ) + await self._repo.save(workspace) + logger.info( + PROJECT_WORKSPACE_PROVISIONED, + project_id=project_id, + backend=kind.value, + newly_created=result.newly_created, + ) + return workspace + + +__all__ = ["ProjectWorkspaceService"] diff --git a/src/synthorg/engine/workspace/push_queue.py b/src/synthorg/engine/workspace/push_queue.py new file mode 100644 index 0000000000..1e3c48c7de --- /dev/null +++ b/src/synthorg/engine/workspace/push_queue.py @@ -0,0 +1,277 @@ +"""Coordinator-owned serial merge/push queue. + +Worktrees give local isolation; this queue gives forge-collision +safety. N agents finishing concurrently on one project all route +their merge-to-default-branch + push-to-backend through ONE serial +FIFO per project, so concurrent pushes never collide at the git +backend. Pluggable: it sits in front of the +:class:`~synthorg.engine.workspace.protocol.WorkspaceIsolationStrategy` +seam, so a future virtual-branch strategy supplies its own +``merge_workspace`` without changing this queue. +""" + +import asyncio +from pathlib import Path # noqa: TC003 -- runtime annotation (PEP 649) +from typing import TYPE_CHECKING, NamedTuple + +from synthorg.core.clock import Clock, SystemClock +from synthorg.core.types import NotBlankStr # noqa: TC001 +from synthorg.engine.errors import ( + WorkspaceError, + WorkspaceMergeError, + WorkspacePushError, +) +from synthorg.observability import get_logger, safe_error_description +from synthorg.observability.events.workspace import ( + WORKSPACE_PUSH_QUEUE_ENQUEUED, + WORKSPACE_PUSH_QUEUE_FAILED, + WORKSPACE_PUSH_QUEUE_MERGED, + WORKSPACE_PUSH_QUEUE_WORKER_FAILED, +) + +if TYPE_CHECKING: + from synthorg.engine.workspace.git_backend import GitBackend + from synthorg.engine.workspace.models import MergeResult, Workspace + from synthorg.engine.workspace.protocol import WorkspaceIsolationStrategy + +logger = get_logger(__name__) + + +class _QueuedMerge(NamedTuple): + """One queued merge+push request and its result future.""" + + workspace: Workspace + future: asyncio.Future[MergeResult] + + +class PushQueueCoordinator: + """Per-project serial merge+push processor. + + Args: + project_id: Owning project (one coordinator per project). + strategy: Workspace isolation strategy doing the actual merge. + git_backend: Backend the merged default branch is pushed to. + repo_root: Project working tree the push runs from. + default_branch: Branch merged into and pushed. + clock: Clock seam for duration measurement. + """ + + __slots__ = ( + "_clock", + "_closing", + "_default_branch", + "_git_backend", + "_project_id", + "_queue", + "_repo_root", + "_strategy", + "_worker", + ) + + def __init__( # noqa: PLR0913 -- distinct collaborators; all required + self, + *, + project_id: NotBlankStr, + strategy: WorkspaceIsolationStrategy, + git_backend: GitBackend, + repo_root: Path, + default_branch: NotBlankStr, + clock: Clock | None = None, + ) -> None: + self._project_id = project_id + self._strategy = strategy + self._git_backend = git_backend + self._repo_root = repo_root + self._default_branch = default_branch + self._clock: Clock = clock if clock is not None else SystemClock() + # ``asyncio.Queue`` binds to the loop that constructs it; create + # it in ``start()`` so the coordinator can restart on a fresh + # event loop (pytest-asyncio per-test loops, lifecycle restart). + self._queue: asyncio.Queue[_QueuedMerge | None] | None = None + self._worker: asyncio.Task[None] | None = None + # ``stop()`` sets this BEFORE enqueuing the sentinel so a late + # ``enqueue_merge_push()`` racing the shutdown path refuses the + # request instead of appending behind the sentinel (where it + # would hang the caller forever). + self._closing = False + + async def start(self) -> None: + """Start the background queue worker (idempotent).""" + if self._worker is None or self._worker.done(): + self._queue = asyncio.Queue() + self._closing = False + self._worker = asyncio.create_task(self._worker_loop()) + + async def stop(self) -> None: + """Drain in-flight work then stop the worker (idempotent).""" + worker = self._worker + if worker is None: + return + # Refuse any further enqueues BEFORE the sentinel goes in: a + # request that wins the race after the sentinel is enqueued + # would sit forever behind it. + self._closing = True + queue = self._queue + if queue is not None: + await queue.put(None) + try: + await worker + finally: + self._worker = None + self._queue = None + + async def enqueue_merge_push( + self, + *, + workspace: Workspace, + ) -> MergeResult: + """Enqueue a merge+push and await its turn. + + Returns: + The :class:`MergeResult`. A conflicted (``success=False``) + merge is returned WITHOUT pushing. + + Raises: + WorkspaceError: ``stop()`` has started; no new work is accepted. + WorkspaceMergeError: The strategy merge failed fatally. + WorkspacePushError: The backend push failed. + """ + if self._closing: + msg = "PushQueueCoordinator is stopping; new pushes refused" + raise WorkspaceError(msg) + queue = self._queue + if queue is None: + msg = "PushQueueCoordinator: enqueue called before start()" + raise WorkspaceError(msg) + loop = asyncio.get_running_loop() + future: asyncio.Future[MergeResult] = loop.create_future() + await queue.put(_QueuedMerge(workspace=workspace, future=future)) + logger.info( + WORKSPACE_PUSH_QUEUE_ENQUEUED, + project_id=self._project_id, + workspace_id=workspace.workspace_id, + ) + return await future + + async def _worker_loop(self) -> None: + """Process queued merges strictly one at a time (FIFO).""" + queue = self._queue + if queue is None: # pragma: no cover - start() always assigns + return + try: + # lint-allow: long-running-loop-kill-switch -- stop() puts a None sentinel + while True: + item = await queue.get() + if item is None: + return + try: + await self._process(item) + except MemoryError, RecursionError: + raise + except Exception as exc: + # A bug in _process must not kill the worker and strand + # every later caller; surface it to this caller and + # keep draining. + logger.error( + WORKSPACE_PUSH_QUEUE_WORKER_FAILED, + project_id=self._project_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + if not item.future.done(): + item.future.set_exception(exc) + finally: + # Worker is exiting (sentinel drain OR a re-raised + # MemoryError/RecursionError). Fail every still-pending + # item so their callers do not hang waiting on a future + # the dead worker would never complete. + while not queue.empty(): + pending = queue.get_nowait() + if pending is None: + continue + if not pending.future.done(): + pending.future.set_exception( + WorkspaceError( + f"PushQueueCoordinator for project " + f"{self._project_id!r} stopped with pending work", + ), + ) + + async def _push_default_branch(self, item: _QueuedMerge) -> bool: + """Run the backend push; return ``True`` iff the push succeeded. + + Resolves ``item.future`` on every failure path (preserving any + existing ``WorkspacePushError`` subtype). ``MemoryError`` and + ``RecursionError`` propagate so the worker loop's outer + ``finally`` can fail every remaining pending item. + """ + try: + await self._git_backend.push( + project_id=self._project_id, + repo_root=self._repo_root, + branch=self._default_branch, + base_branch=self._default_branch, + ) + except MemoryError, RecursionError: + raise + except WorkspacePushError as exc: + # Preserve forge-specific push error subtypes (rejection, + # auth failure ...) so callers can discriminate; wrapping + # in a fresh ``WorkspacePushError`` would erase the subclass. + logger.warning( + WORKSPACE_PUSH_QUEUE_FAILED, + project_id=self._project_id, + workspace_id=item.workspace.workspace_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + if not item.future.done(): + item.future.set_exception(exc) + return False + except Exception as exc: + logger.warning( + WORKSPACE_PUSH_QUEUE_FAILED, + project_id=self._project_id, + workspace_id=item.workspace.workspace_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + if not item.future.done(): + msg = ( + f"Failed to push merged default branch for project " + f"{self._project_id!r}" + ) + item.future.set_exception(WorkspacePushError(msg)) + return False + return True + + async def _process(self, item: _QueuedMerge) -> None: + """Merge then (on success) push; resolve the caller future.""" + try: + merge_result = await self._strategy.merge_workspace( + workspace=item.workspace, + ) + except MemoryError, RecursionError: + raise + except WorkspaceMergeError as exc: + if not item.future.done(): + item.future.set_exception(exc) + return + if not merge_result.success: + # Conflicted merge: do not push a broken default branch. + if not item.future.done(): + item.future.set_result(merge_result) + return + if not await self._push_default_branch(item): + return + logger.info( + WORKSPACE_PUSH_QUEUE_MERGED, + project_id=self._project_id, + workspace_id=item.workspace.workspace_id, + branch=item.workspace.branch_name, + ) + if not item.future.done(): + item.future.set_result(merge_result) + + +__all__ = ["PushQueueCoordinator"] diff --git a/src/synthorg/engine/workspace/service.py b/src/synthorg/engine/workspace/service.py index 33b03368c2..159a28b23d 100644 --- a/src/synthorg/engine/workspace/service.py +++ b/src/synthorg/engine/workspace/service.py @@ -4,10 +4,13 @@ setup, merge, and teardown for groups of agent workspaces. """ +import asyncio +from pathlib import Path # noqa: TC003 -- runtime annotation (PEP 649) from typing import TYPE_CHECKING from uuid import uuid4 from synthorg.core.clock import Clock, SystemClock +from synthorg.core.types import NotBlankStr from synthorg.engine.errors import ( WorkspaceCleanupError, ) @@ -16,6 +19,7 @@ Workspace, WorkspaceGroupResult, ) +from synthorg.engine.workspace.push_queue import PushQueueCoordinator from synthorg.observability import get_logger, safe_error_description from synthorg.observability.events.workspace import ( WORKSPACE_GROUP_SETUP_COMPLETE, @@ -30,13 +34,16 @@ from synthorg.engine.workspace.config import ( WorkspaceIsolationConfig, ) - from synthorg.engine.workspace.models import WorkspaceRequest + from synthorg.engine.workspace.git_backend import GitBackend + from synthorg.engine.workspace.models import MergeResult, WorkspaceRequest from synthorg.engine.workspace.protocol import ( WorkspaceIsolationStrategy, ) logger = get_logger(__name__) +_DEFAULT_BRANCH: NotBlankStr = NotBlankStr("main") + class WorkspaceIsolationService: """Service for managing workspace isolation lifecycle. @@ -49,18 +56,41 @@ class WorkspaceIsolationService: config: Workspace isolation configuration. """ - __slots__ = ("_clock", "_config", "_merge_orchestrator", "_strategy") + __slots__ = ( + "_clock", + "_config", + "_default_branch", + "_git_backend", + "_merge_orchestrator", + "_push_queues", + "_push_queues_lock", + "_shutting_down", + "_strategy", + ) def __init__( self, *, strategy: WorkspaceIsolationStrategy, config: WorkspaceIsolationConfig, + git_backend: GitBackend | None = None, + default_branch: NotBlankStr = _DEFAULT_BRANCH, clock: Clock | None = None, ) -> None: self._clock: Clock = clock if clock is not None else SystemClock() self._strategy = strategy self._config = config + self._git_backend = git_backend + self._default_branch = default_branch + self._push_queues: dict[str, PushQueueCoordinator] = {} + self._push_queues_lock = asyncio.Lock() + # Set by ``shutdown()`` under ``_push_queues_lock`` so + # ``_get_or_create_queue()`` cannot resurrect a coordinator + # after the service has begun tearing down. Without this flag + # a queue created mid-shutdown would survive the teardown loop + # and keep accepting merge+push work against a service that + # claims to have stopped. + self._shutting_down = False pw = config.planner_worktrees self._merge_orchestrator = MergeOrchestrator( strategy=strategy, @@ -183,6 +213,101 @@ async def merge_group( duration_seconds=elapsed, ) + async def _get_or_create_queue( + self, + *, + project_id: NotBlankStr, + repo_root: Path, + ) -> PushQueueCoordinator: + """Return the project's push queue, creating it once on first use. + + Double-checked under ``_push_queues_lock`` so two agents + finishing concurrently on a fresh project create exactly one + coordinator. + """ + async with self._push_queues_lock: + existing = self._push_queues.get(project_id) + if existing is not None: + return existing + if self._shutting_down: + # Shutdown started before this caller acquired the lock; + # a new queue would survive the teardown loop's snapshot + # and keep accepting work against a stopped service. + msg = ( + f"WorkspaceIsolationService for project " + f"{project_id!r} is shutting down; refusing new queue" + ) + raise WorkspaceCleanupError(msg) + if self._git_backend is None: # pragma: no cover - guarded by caller + msg = "push queue requires a git backend" + raise WorkspaceCleanupError(msg) + queue = PushQueueCoordinator( + project_id=project_id, + strategy=self._strategy, + git_backend=self._git_backend, + repo_root=repo_root, + default_branch=self._default_branch, + clock=self._clock, + ) + await queue.start() + self._push_queues[project_id] = queue + return queue + + async def merge_workspace_with_push( + self, + *, + workspace: Workspace, + project_id: NotBlankStr, + repo_root: Path, + ) -> MergeResult: + """Merge *workspace* then push the default branch, serialised. + + When no git backend is wired the merge still runs (via the + strategy) but nothing is pushed -- this keeps the call site + uniform whether or not durable backing is configured. + + Args: + workspace: The agent workspace to merge back. + project_id: Owning project (selects the serial queue). + repo_root: Project working tree the push runs from. + + Returns: + The :class:`MergeResult`. + + Raises: + WorkspaceMergeError: The merge failed fatally. + WorkspacePushError: The backend push failed. + """ + if self._git_backend is None: + return await self._strategy.merge_workspace(workspace=workspace) + queue = await self._get_or_create_queue( + project_id=project_id, + repo_root=repo_root, + ) + return await queue.enqueue_merge_push(workspace=workspace) + + async def shutdown(self) -> None: + """Stop every per-project push queue (best-effort, all attempted).""" + async with self._push_queues_lock: + # Flip ``_shutting_down`` under the same lock that + # ``_get_or_create_queue`` takes so a concurrent first-touch + # cannot slip a freshly-created coordinator in behind us. + self._shutting_down = True + queues = tuple(self._push_queues.values()) + self._push_queues.clear() + for queue in queues: + try: + await queue.stop() + except MemoryError, RecursionError: + raise + except Exception as exc: + logger.warning( + WORKSPACE_TEARDOWN_FAILED, + reason="push_queue_stop_failed", + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + async def teardown_group( self, *, diff --git a/src/synthorg/integrations/connections/models.py b/src/synthorg/integrations/connections/models.py index d54c7fbda4..e8fd2e410c 100644 --- a/src/synthorg/integrations/connections/models.py +++ b/src/synthorg/integrations/connections/models.py @@ -36,6 +36,9 @@ class ConnectionType(StrEnum): """Supported external service connection types.""" GITHUB = "github" + GITLAB = "gitlab" + GITEA = "gitea" + FORGEJO = "forgejo" SLACK = "slack" SMTP = "smtp" DATABASE = "database" diff --git a/src/synthorg/integrations/connections/types/__init__.py b/src/synthorg/integrations/connections/types/__init__.py index 556c171b9b..37c52e2af4 100644 --- a/src/synthorg/integrations/connections/types/__init__.py +++ b/src/synthorg/integrations/connections/types/__init__.py @@ -19,7 +19,12 @@ from synthorg.integrations.connections.types.generic_http import ( GenericHttpAuthenticator, ) +from synthorg.integrations.connections.types.gitea import ( + ForgejoAuthenticator, + GiteaAuthenticator, +) from synthorg.integrations.connections.types.github import GitHubAuthenticator +from synthorg.integrations.connections.types.gitlab import GitLabAuthenticator from synthorg.integrations.connections.types.oauth_app import ( OAuthAppAuthenticator, ) @@ -36,6 +41,9 @@ MappingProxyType( { ConnectionType.GITHUB: GitHubAuthenticator(), + ConnectionType.GITLAB: GitLabAuthenticator(), + ConnectionType.GITEA: GiteaAuthenticator(), + ConnectionType.FORGEJO: ForgejoAuthenticator(), ConnectionType.SLACK: SlackAuthenticator(), ConnectionType.SMTP: SmtpAuthenticator(), ConnectionType.DATABASE: DatabaseAuthenticator(), diff --git a/src/synthorg/integrations/connections/types/gitea.py b/src/synthorg/integrations/connections/types/gitea.py new file mode 100644 index 0000000000..defc734f2e --- /dev/null +++ b/src/synthorg/integrations/connections/types/gitea.py @@ -0,0 +1,75 @@ +"""Gitea / Forgejo connection types. + +Forgejo is a hard-fork of Gitea that today retains Gitea's API and +token-auth model. They are modelled as SEPARATE connection identities +(separate saved-connection rows + icons) so a future API divergence +never invalidates a user's existing saved connection, but they share +the credential-validation code via :class:`_GiteaFamilyAuthenticator`. +When the APIs actually diverge, split the subclass bodies; the +connection identities and saved rows stay unchanged. +""" + +from synthorg.integrations.connections.models import ConnectionType +from synthorg.integrations.errors import InvalidConnectionAuthError +from synthorg.observability import get_logger +from synthorg.observability.events.integrations import ( + CONNECTION_VALIDATION_FAILED, +) + +logger = get_logger(__name__) + + +class _GiteaFamilyAuthenticator: + """Shared token-auth validation for the Gitea/Forgejo family. + + Subclasses set :attr:`connection_type`; the validation surface is + identical while the two forges remain API/token compatible. + + Required fields: ``token`` (personal access token). + Optional fields: ``api_url`` (self-hosted base URL). + """ + + @property + def connection_type(self) -> ConnectionType: # pragma: no cover - overridden + raise NotImplementedError + + def validate_credentials( + self, + credentials: dict[str, str], + ) -> None: + """Validate credential fields.""" + token = credentials.get("token") + if not isinstance(token, str) or not token.strip(): + logger.warning( + CONNECTION_VALIDATION_FAILED, + connection_type=self.connection_type.value, + field="token", + error="missing, non-string, or blank", + ) + msg = ( + f"{self.connection_type.value.capitalize()} connection " + "requires a 'token' field" + ) + raise InvalidConnectionAuthError(msg) + + def required_fields(self) -> tuple[str, ...]: + """Return required credential field names.""" + return ("token",) + + +class GiteaAuthenticator(_GiteaFamilyAuthenticator): + """Validates Gitea connection credentials.""" + + @property + def connection_type(self) -> ConnectionType: + """The connection type this authenticator handles.""" + return ConnectionType.GITEA + + +class ForgejoAuthenticator(_GiteaFamilyAuthenticator): + """Validates Forgejo connection credentials.""" + + @property + def connection_type(self) -> ConnectionType: + """The connection type this authenticator handles.""" + return ConnectionType.FORGEJO diff --git a/src/synthorg/integrations/connections/types/gitlab.py b/src/synthorg/integrations/connections/types/gitlab.py new file mode 100644 index 0000000000..31ea3001a6 --- /dev/null +++ b/src/synthorg/integrations/connections/types/gitlab.py @@ -0,0 +1,43 @@ +"""GitLab connection type.""" + +from synthorg.integrations.connections.models import ConnectionType +from synthorg.integrations.errors import InvalidConnectionAuthError +from synthorg.observability import get_logger +from synthorg.observability.events.integrations import ( + CONNECTION_VALIDATION_FAILED, +) + +logger = get_logger(__name__) + + +class GitLabAuthenticator: + """Validates GitLab connection credentials. + + Required fields: ``token`` (personal / project / group access token). + Optional fields: ``api_url`` (defaults to https://gitlab.com). + """ + + @property + def connection_type(self) -> ConnectionType: + """The connection type this authenticator handles.""" + return ConnectionType.GITLAB + + def validate_credentials( + self, + credentials: dict[str, str], + ) -> None: + """Validate credential fields.""" + token = credentials.get("token") + if not isinstance(token, str) or not token.strip(): + logger.warning( + CONNECTION_VALIDATION_FAILED, + connection_type=ConnectionType.GITLAB.value, + field="token", + error="missing, non-string, or blank", + ) + msg = "GitLab connection requires a 'token' field" + raise InvalidConnectionAuthError(msg) + + def required_fields(self) -> tuple[str, ...]: + """Return required credential field names.""" + return ("token",) diff --git a/src/synthorg/observability/events/persistence.py b/src/synthorg/observability/events/persistence.py index f90bd19ed3..1e26c2d569 100644 --- a/src/synthorg/observability/events/persistence.py +++ b/src/synthorg/observability/events/persistence.py @@ -297,6 +297,29 @@ "persistence.project.deserialize_failed" ) +# Project workspace events +PERSISTENCE_PROJECT_WORKSPACE_SAVE_FAILED: Final[str] = ( + "persistence.project_workspace.save_failed" +) +PERSISTENCE_PROJECT_WORKSPACE_FETCHED: Final[str] = ( + "persistence.project_workspace.fetched" +) +PERSISTENCE_PROJECT_WORKSPACE_FETCH_FAILED: Final[str] = ( + "persistence.project_workspace.fetch_failed" +) +PERSISTENCE_PROJECT_WORKSPACE_LISTED: Final[str] = ( + "persistence.project_workspace.listed" +) +PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED: Final[str] = ( + "persistence.project_workspace.list_failed" +) +PERSISTENCE_PROJECT_WORKSPACE_DELETE_FAILED: Final[str] = ( + "persistence.project_workspace.delete_failed" +) +PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED: Final[str] = ( + "persistence.project_workspace.deserialize_failed" +) + # -- Project cost aggregate events -------------------------------------------- PERSISTENCE_PROJECT_COST_AGG_INCREMENTED: Final[str] = ( diff --git a/src/synthorg/observability/events/workspace.py b/src/synthorg/observability/events/workspace.py index 58e5125a47..005922ec26 100644 --- a/src/synthorg/observability/events/workspace.py +++ b/src/synthorg/observability/events/workspace.py @@ -38,3 +38,27 @@ WORKSPACE_DISK_CHECK_ERROR: Final[str] = "workspace.disk.check.error" WORKSPACE_CONFIG_INVALID: Final[str] = "workspace.config.invalid" + +# ── Git backend events ─────────────────────────────────────────── +GIT_BACKEND_PROVISION_START: Final[str] = "git_backend.provision.start" +GIT_BACKEND_PROVISION_COMPLETE: Final[str] = "git_backend.provision.complete" +GIT_BACKEND_PROVISION_FAILED: Final[str] = "git_backend.provision.failed" +GIT_BACKEND_PUSH_COMPLETE: Final[str] = "git_backend.push.complete" +GIT_BACKEND_PUSH_FAILED: Final[str] = "git_backend.push.failed" +GIT_BACKEND_FETCH_COMPLETE: Final[str] = "git_backend.fetch.complete" +GIT_BACKEND_FETCH_FAILED: Final[str] = "git_backend.fetch.failed" + +# ── Project workspace provisioning events ──────────────────────── +PROJECT_WORKSPACE_PROVISIONED: Final[str] = "project_workspace.provisioned" +PROJECT_WORKSPACE_REUSED: Final[str] = "project_workspace.reused" +WORKSPACE_BACKEND_KIND_CHANGED: Final[str] = "workspace.backend_kind.changed" +WORKSPACE_PATH_TRAVERSAL_REJECTED: Final[str] = ( + "project_workspace.path.traversal.rejected" +) +WORKSPACE_GIT_DIR_CLEARED: Final[str] = "project_workspace.git_dir.cleared" + +# ── Coordinator push-queue events ──────────────────────────────── +WORKSPACE_PUSH_QUEUE_ENQUEUED: Final[str] = "workspace.push_queue.enqueued" +WORKSPACE_PUSH_QUEUE_MERGED: Final[str] = "workspace.push_queue.merged" +WORKSPACE_PUSH_QUEUE_FAILED: Final[str] = "workspace.push_queue.failed" +WORKSPACE_PUSH_QUEUE_WORKER_FAILED: Final[str] = "workspace.push_queue.worker_failed" diff --git a/src/synthorg/persistence/postgres/backend.py b/src/synthorg/persistence/postgres/backend.py index 817232d042..8afb8a9f53 100644 --- a/src/synthorg/persistence/postgres/backend.py +++ b/src/synthorg/persistence/postgres/backend.py @@ -118,6 +118,9 @@ PostgresProjectCostAggregateRepository, ) from synthorg.persistence.postgres.project_repo import PostgresProjectRepository +from synthorg.persistence.postgres.project_workspace_repo import ( + PostgresProjectWorkspaceRepository, +) from synthorg.persistence.postgres.provider_audit_repo import ( PostgresProviderAuditRepo, ) @@ -231,6 +234,9 @@ ProjectCostAggregateRepository, ) from synthorg.persistence.project_protocol import ProjectRepository + from synthorg.persistence.project_workspace_protocol import ( + ProjectWorkspaceRepository, + ) from synthorg.persistence.provider_audit_protocol import ProviderAuditRepo from synthorg.persistence.risk_override_protocol import RiskOverrideRepository from synthorg.persistence.seen_claims_protocol import SeenClaimsRepository @@ -282,6 +288,7 @@ def __init__(self, config: PostgresConfig) -> None: # noqa: PLR0915 -- repo reg # Repository attributes -- instantiated lazily on connect. self._artifacts: ArtifactRepository | None = None self._projects: ProjectRepository | None = None + self._project_workspaces: ProjectWorkspaceRepository | None = None self._tasks: TaskRepository | None = None self._cost_records: CostRecordRepository | None = None self._messages: MessageRepository | None = None @@ -344,6 +351,7 @@ def _clear_state(self) -> None: # noqa: PLR0915 -- repo registry reset intentio self._pool = None self._artifacts = None self._projects = None + self._project_workspaces = None self._tasks = None self._cost_records = None self._messages = None @@ -405,6 +413,7 @@ def _create_repositories(self) -> None: # noqa: PLR0915 # Core domain repositories. self._artifacts = PostgresArtifactRepository(pool) self._projects = PostgresProjectRepository(pool) + self._project_workspaces = PostgresProjectWorkspaceRepository(pool) self._tasks = PostgresTaskRepository(pool) self._cost_records = PostgresCostRecordRepository(pool) self._messages = PostgresMessageRepository(pool) @@ -662,6 +671,11 @@ def projects(self) -> ProjectRepository: """Repository for Project persistence.""" return self._require_connected(self._projects, "projects") + @property + def project_workspaces(self) -> ProjectWorkspaceRepository: + """Repository for persistent per-project workspace mappings.""" + return self._require_connected(self._project_workspaces, "project_workspaces") + @property def custom_presets(self) -> PersonalityPresetRepository: """Repository for custom personality preset persistence.""" diff --git a/src/synthorg/persistence/postgres/project_workspace_repo.py b/src/synthorg/persistence/postgres/project_workspace_repo.py new file mode 100644 index 0000000000..08c0a95422 --- /dev/null +++ b/src/synthorg/persistence/postgres/project_workspace_repo.py @@ -0,0 +1,206 @@ +"""Postgres repository implementation for ProjectWorkspace.""" + +from typing import TYPE_CHECKING, Any + +import psycopg +from psycopg.rows import dict_row +from pydantic import ValidationError + +from synthorg.core.enums import GitBackendType +from synthorg.core.persistence_errors import QueryError +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr # noqa: TC001 +from synthorg.observability import get_logger, safe_error_description +from synthorg.observability.events.persistence import ( + PERSISTENCE_PROJECT_WORKSPACE_DELETE_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_FETCH_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_FETCHED, + PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_LISTED, + PERSISTENCE_PROJECT_WORKSPACE_SAVE_FAILED, +) +from synthorg.persistence._generics import DEFAULT_PAGE_SIZE +from synthorg.persistence._shared import coerce_row_timestamp +from synthorg.persistence._shared.pagination import validate_pagination_args + +if TYPE_CHECKING: + from psycopg_pool import AsyncConnectionPool + +logger = get_logger(__name__) + +_MAX_LIST_ROWS: int = 10_000 + + +def _row_to_workspace(row: dict[str, Any]) -> ProjectWorkspace: + """Reconstruct a ``ProjectWorkspace`` from a Postgres dict_row.""" + data = dict(row) + data["git_backend_kind"] = GitBackendType(data["git_backend_kind"]) + data["created_at"] = coerce_row_timestamp(data["created_at"]) + data["updated_at"] = coerce_row_timestamp(data["updated_at"]) + return ProjectWorkspace.model_validate(data) + + +class PostgresProjectWorkspaceRepository: + """Postgres-backed project workspace repository. + + Args: + pool: An open psycopg_pool.AsyncConnectionPool. + """ + + def __init__(self, pool: AsyncConnectionPool) -> None: + self._pool = pool + + @staticmethod + def _row_params(workspace: ProjectWorkspace) -> tuple[object, ...]: + return ( + workspace.project_id, + workspace.workspace_path, + workspace.git_backend_kind.value, + workspace.remote_ref, + workspace.default_branch, + workspace.created_at, + workspace.updated_at, + ) + + async def save(self, entity: ProjectWorkspace) -> None: + """Persist a project workspace via upsert (insert or update).""" + try: + async with self._pool.connection() as conn, conn.cursor() as cur: + await cur.execute( + """ + INSERT INTO project_workspaces (project_id, workspace_path, + git_backend_kind, remote_ref, + default_branch, created_at, + updated_at) + VALUES (%s, %s, %s, %s, %s, %s, %s) + ON CONFLICT(project_id) DO UPDATE SET + workspace_path=EXCLUDED.workspace_path, + git_backend_kind=EXCLUDED.git_backend_kind, + remote_ref=EXCLUDED.remote_ref, + default_branch=EXCLUDED.default_branch, + created_at=EXCLUDED.created_at, + updated_at=EXCLUDED.updated_at + """, + self._row_params(entity), + ) + await conn.commit() + except psycopg.Error as exc: + msg = f"Failed to save project workspace {entity.project_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_SAVE_FAILED, + project_id=entity.project_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + + async def get(self, entity_id: NotBlankStr) -> ProjectWorkspace | None: + """Retrieve a project workspace by owning project id.""" + try: + async with ( + self._pool.connection() as conn, + conn.cursor(row_factory=dict_row) as cur, + ): + await cur.execute( + "SELECT * FROM project_workspaces WHERE project_id = %s", + (entity_id,), + ) + row = await cur.fetchone() + except psycopg.Error as exc: + msg = f"Failed to fetch project workspace {entity_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_FETCH_FAILED, + project_id=entity_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + if row is None: + logger.debug( + PERSISTENCE_PROJECT_WORKSPACE_FETCHED, + project_id=entity_id, + found=False, + ) + return None + try: + workspace = _row_to_workspace(row) + except (ValueError, ValidationError, KeyError) as exc: + msg = f"Failed to deserialize project workspace {entity_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED, + project_id=entity_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + logger.debug( + PERSISTENCE_PROJECT_WORKSPACE_FETCHED, + project_id=entity_id, + found=True, + ) + return workspace + + async def list_items( + self, + *, + limit: int = DEFAULT_PAGE_SIZE, + offset: int = 0, + ) -> tuple[ProjectWorkspace, ...]: + """List workspaces in project-id order.""" + limit = validate_pagination_args( + limit, offset, event=PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED + ) + effective_limit = min(limit, _MAX_LIST_ROWS) + try: + async with ( + self._pool.connection() as conn, + conn.cursor(row_factory=dict_row) as cur, + ): + await cur.execute( + "SELECT * FROM project_workspaces " + "ORDER BY project_id LIMIT %s OFFSET %s", + (effective_limit, offset), + ) + rows = await cur.fetchall() + except psycopg.Error as exc: + msg = "Failed to list project workspaces" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + try: + workspaces = tuple(_row_to_workspace(row) for row in rows) + except (ValueError, ValidationError, KeyError) as exc: + msg = "Failed to deserialize project workspaces" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + logger.debug(PERSISTENCE_PROJECT_WORKSPACE_LISTED, count=len(workspaces)) + return workspaces + + async def delete(self, entity_id: NotBlankStr) -> bool: + """Delete a project workspace by owning project id.""" + try: + async with self._pool.connection() as conn, conn.cursor() as cur: + await cur.execute( + "DELETE FROM project_workspaces WHERE project_id = %s", + (entity_id,), + ) + deleted = cur.rowcount > 0 + await conn.commit() + except psycopg.Error as exc: + msg = f"Failed to delete project workspace {entity_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_DELETE_FAILED, + project_id=entity_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + return deleted diff --git a/src/synthorg/persistence/postgres/revisions/20260519000001_project_workspaces.sql b/src/synthorg/persistence/postgres/revisions/20260519000001_project_workspaces.sql new file mode 100644 index 0000000000..e9ec954272 --- /dev/null +++ b/src/synthorg/persistence/postgres/revisions/20260519000001_project_workspaces.sql @@ -0,0 +1,24 @@ +-- depends: 20260518000001_approval_source + +-- Persistent per-project workspace mapping (#1974). One row per project +-- (1:1), recording where the git-backed working tree lives on the +-- persistent volume and which backend provisioned it, so a session +-- restart re-locates the same directory and a configured backend switch +-- is detectable against the persisted kind. ON DELETE CASCADE: deleting +-- a project drops its workspace mapping (the on-disk tree is reclaimed +-- separately by the workspace service). + +CREATE TABLE project_workspaces ( + project_id TEXT NOT NULL PRIMARY KEY, + workspace_path TEXT NOT NULL UNIQUE, + git_backend_kind TEXT NOT NULL + CHECK (git_backend_kind IN ('embedded', 'external_remote', 'local_path')), + remote_ref TEXT, + default_branch TEXT NOT NULL DEFAULT 'main', + created_at TIMESTAMPTZ NOT NULL, + updated_at TIMESTAMPTZ NOT NULL, + FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE +); + +CREATE INDEX idx_project_workspaces_created_at + ON project_workspaces(created_at); diff --git a/src/synthorg/persistence/postgres/schema.sql b/src/synthorg/persistence/postgres/schema.sql index 3ec71728e5..0a23f8a43b 100644 --- a/src/synthorg/persistence/postgres/schema.sql +++ b/src/synthorg/persistence/postgres/schema.sql @@ -450,6 +450,22 @@ CREATE TABLE projects ( CREATE INDEX idx_projects_status ON projects(status); CREATE INDEX idx_projects_lead ON projects(lead); +-- ── Persistent per-project workspace (1:1 with projects) ───── +CREATE TABLE project_workspaces ( + project_id TEXT NOT NULL PRIMARY KEY, + workspace_path TEXT NOT NULL UNIQUE, + git_backend_kind TEXT NOT NULL + CHECK (git_backend_kind IN ('embedded', 'external_remote', 'local_path')), + remote_ref TEXT, + default_branch TEXT NOT NULL DEFAULT 'main', + created_at TIMESTAMPTZ NOT NULL, + updated_at TIMESTAMPTZ NOT NULL, + FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE +); + +CREATE INDEX idx_project_workspaces_created_at + ON project_workspaces(created_at); + -- ── Project-lifetime cost aggregates ───────────────────────── CREATE TABLE project_cost_aggregates ( project_id TEXT NOT NULL PRIMARY KEY CHECK (length(project_id) > 0), diff --git a/src/synthorg/persistence/project_workspace_protocol.py b/src/synthorg/persistence/project_workspace_protocol.py new file mode 100644 index 0000000000..1d7f044312 --- /dev/null +++ b/src/synthorg/persistence/project_workspace_protocol.py @@ -0,0 +1,87 @@ +"""Project workspace repository protocol.""" + +from typing import Protocol, runtime_checkable + +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr +from synthorg.persistence._generics import ( + DEFAULT_PAGE_SIZE, + IdKeyedRepository, +) + + +@runtime_checkable +class ProjectWorkspaceRepository( + IdKeyedRepository[ProjectWorkspace, NotBlankStr], + Protocol, +): + """CRUD interface for :class:`ProjectWorkspace` persistence. + + Keyed 1:1 by ``project_id``. The surface is intentionally minimal + (no filtered-query dimension): a workspace is always resolved by its + owning project, never queried by attribute. ``save`` is an upsert + because :class:`ProjectWorkspaceService` re-provisions in place on a + configured backend switch (no separate create/update audit split is + needed; provisioning is not an end-user CRUD surface). + """ + + async def save(self, entity: ProjectWorkspace) -> None: + """Persist a project workspace via upsert (insert or update). + + Args: + entity: The workspace row to persist. ``entity.project_id`` + is the primary key. + + Raises: + QueryError: If the database operation fails. + """ + ... + + async def get(self, entity_id: NotBlankStr) -> ProjectWorkspace | None: + """Retrieve a project workspace by owning project id. + + Args: + entity_id: The owning project identifier. + + Returns: + The workspace, or ``None`` if the project has none. + + Raises: + QueryError: If the query or deserialization fails. + """ + ... + + async def list_items( + self, + *, + limit: int = DEFAULT_PAGE_SIZE, + offset: int = 0, + ) -> tuple[ProjectWorkspace, ...]: + """List workspaces in project-id order. + + Args: + limit: Maximum rows to return. + offset: Rows to skip from the head of the ordering. + + Returns: + Workspaces in ascending ``project_id`` order, capped at + *limit* rows. + + Raises: + QueryError: If the query or deserialization fails. + """ + ... + + async def delete(self, entity_id: NotBlankStr) -> bool: + """Delete a project workspace by owning project id. + + Args: + entity_id: The owning project identifier. + + Returns: + ``True`` if a row was deleted, ``False`` if not found. + + Raises: + QueryError: If the database operation fails. + """ + ... diff --git a/src/synthorg/persistence/protocol.py b/src/synthorg/persistence/protocol.py index 92cd7c84dc..cbca318282 100644 --- a/src/synthorg/persistence/protocol.py +++ b/src/synthorg/persistence/protocol.py @@ -104,6 +104,9 @@ ProjectCostAggregateRepository, # noqa: TC001 ) from synthorg.persistence.project_protocol import ProjectRepository # noqa: TC001 +from synthorg.persistence.project_workspace_protocol import ( # noqa: TC001 + ProjectWorkspaceRepository, +) from synthorg.persistence.provider_audit_protocol import ( # noqa: TC001 ProviderAuditRepo, ) @@ -406,6 +409,11 @@ def projects(self) -> ProjectRepository: """Repository for Project persistence.""" ... + @property + def project_workspaces(self) -> ProjectWorkspaceRepository: + """Repository for persistent per-project workspace mappings.""" + ... + @property def custom_presets(self) -> PersonalityPresetRepository: """Repository for custom personality preset persistence.""" diff --git a/src/synthorg/persistence/sqlite/_backend_accessors.py b/src/synthorg/persistence/sqlite/_backend_accessors.py index 3717bdfd33..653ebd293a 100644 --- a/src/synthorg/persistence/sqlite/_backend_accessors.py +++ b/src/synthorg/persistence/sqlite/_backend_accessors.py @@ -82,6 +82,9 @@ ProjectCostAggregateRepository, ) from synthorg.persistence.project_protocol import ProjectRepository + from synthorg.persistence.project_workspace_protocol import ( + ProjectWorkspaceRepository, + ) from synthorg.persistence.provider_audit_protocol import ProviderAuditRepo from synthorg.persistence.risk_override_protocol import RiskOverrideRepository from synthorg.persistence.seen_claims_protocol import SeenClaimsRepository @@ -140,6 +143,7 @@ class _BackendRepositoryAccessors: _settings: SettingsRepository | None _artifacts: ArtifactRepository | None _projects: ProjectRepository | None + _project_workspaces: ProjectWorkspaceRepository | None _project_cost_aggregates: ProjectCostAggregateRepository | None _fine_tune_checkpoints: FineTuneCheckpointRepository | None _fine_tune_runs: FineTuneRunRepository | None @@ -300,6 +304,11 @@ def projects(self) -> ProjectRepository: """Repository for Project persistence.""" return self._require_connected(self._projects, "projects") + @property + def project_workspaces(self) -> ProjectWorkspaceRepository: + """Repository for persistent per-project workspace mappings.""" + return self._require_connected(self._project_workspaces, "project_workspaces") + @property def project_cost_aggregates(self) -> ProjectCostAggregateRepository: """Repository for durable project cost aggregates.""" diff --git a/src/synthorg/persistence/sqlite/backend.py b/src/synthorg/persistence/sqlite/backend.py index 175bb3b577..939d5821a2 100644 --- a/src/synthorg/persistence/sqlite/backend.py +++ b/src/synthorg/persistence/sqlite/backend.py @@ -117,6 +117,9 @@ from synthorg.persistence.sqlite.project_repo import ( SQLiteProjectRepository, ) +from synthorg.persistence.sqlite.project_workspace_repo import ( + SQLiteProjectWorkspaceRepository, +) from synthorg.persistence.sqlite.provider_audit_repo import ( SQLiteProviderAuditRepo, ) @@ -202,6 +205,7 @@ def __init__(self, config: SQLiteConfig) -> None: # noqa: PLR0915 -- repo regis self._db: aiosqlite.Connection | None = None self._artifacts: SQLiteArtifactRepository | None = None self._projects: SQLiteProjectRepository | None = None + self._project_workspaces: SQLiteProjectWorkspaceRepository | None = None self._tasks: SQLiteTaskRepository | None = None self._cost_records: SQLiteCostRecordRepository | None = None self._messages: SQLiteMessageRepository | None = None @@ -270,6 +274,7 @@ def _clear_state(self) -> None: # noqa: PLR0915 -- repo registry reset intentio self._db = None self._artifacts = None self._projects = None + self._project_workspaces = None self._tasks = None self._cost_records = None self._messages = None @@ -425,6 +430,10 @@ def _create_repositories(self) -> None: # noqa: PLR0915 self._db, write_context=self.write_context, ) + self._project_workspaces = SQLiteProjectWorkspaceRepository( + self._db, + write_context=self.write_context, + ) self._tasks = SQLiteTaskRepository( self._db, write_context=self.write_context, diff --git a/src/synthorg/persistence/sqlite/project_workspace_repo.py b/src/synthorg/persistence/sqlite/project_workspace_repo.py new file mode 100644 index 0000000000..dc549b7e96 --- /dev/null +++ b/src/synthorg/persistence/sqlite/project_workspace_repo.py @@ -0,0 +1,227 @@ +"""SQLite repository implementation for ProjectWorkspace.""" + +import sqlite3 +from typing import TYPE_CHECKING + +import aiosqlite +from pydantic import ValidationError + +from synthorg.core.enums import GitBackendType +from synthorg.core.persistence_errors import QueryError +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr # noqa: TC001 +from synthorg.observability import get_logger, safe_error_description +from synthorg.observability.events.persistence import ( + PERSISTENCE_PROJECT_WORKSPACE_DELETE_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_FETCH_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_FETCHED, + PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED, + PERSISTENCE_PROJECT_WORKSPACE_LISTED, + PERSISTENCE_PROJECT_WORKSPACE_SAVE_FAILED, +) +from synthorg.persistence._generics import DEFAULT_PAGE_SIZE +from synthorg.persistence._shared import coerce_row_timestamp, format_iso_utc +from synthorg.persistence._shared.pagination import validate_pagination_args + +if TYPE_CHECKING: + from synthorg.persistence.sqlite._shared import WriteContext + +logger = get_logger(__name__) + +_MAX_LIST_ROWS: int = 10_000 + + +def _row_to_workspace(row: aiosqlite.Row) -> ProjectWorkspace: + """Reconstruct a ``ProjectWorkspace`` from a database row.""" + data = dict(row) + data["git_backend_kind"] = GitBackendType(data["git_backend_kind"]) + data["created_at"] = coerce_row_timestamp(data["created_at"]) + data["updated_at"] = coerce_row_timestamp(data["updated_at"]) + return ProjectWorkspace.model_validate(data) + + +class SQLiteProjectWorkspaceRepository: + """SQLite-backed project workspace repository. + + Args: + db: An open aiosqlite connection with ``row_factory`` set to + ``aiosqlite.Row``. + write_context: Async context manager that serializes writes on + the shared connection. + """ + + def __init__( + self, + db: aiosqlite.Connection, + *, + write_context: WriteContext, + ) -> None: + self._db = db + self._write_context = write_context + + @staticmethod + def _row_params(workspace: ProjectWorkspace) -> tuple[object, ...]: + return ( + workspace.project_id, + workspace.workspace_path, + workspace.git_backend_kind.value, + workspace.remote_ref, + workspace.default_branch, + format_iso_utc(workspace.created_at), + format_iso_utc(workspace.updated_at), + ) + + async def _safe_rollback(self, *, event: str) -> None: + """Best-effort rollback on the shared connection. + + The ``event`` parameter identifies the originating call site + (save vs delete) so rollback failures log the right event; + without it, delete-path rollback failures would mislabel as + save failures. + """ + try: + await self._db.rollback() + except (sqlite3.Error, aiosqlite.Error) as rollback_exc: + logger.warning( + event, + error_type=type(rollback_exc).__name__, + error=safe_error_description(rollback_exc), + rollback_failed=True, + ) + + async def save(self, entity: ProjectWorkspace) -> None: + """Persist a project workspace via upsert (insert or update).""" + async with self._write_context(): + try: + await self._db.execute( + """\ +INSERT INTO project_workspaces (project_id, workspace_path, + git_backend_kind, remote_ref, + default_branch, created_at, updated_at) +VALUES (?, ?, ?, ?, ?, ?, ?) +ON CONFLICT(project_id) DO UPDATE SET + workspace_path=excluded.workspace_path, + git_backend_kind=excluded.git_backend_kind, + remote_ref=excluded.remote_ref, + default_branch=excluded.default_branch, + created_at=excluded.created_at, + updated_at=excluded.updated_at""", + self._row_params(entity), + ) + await self._db.commit() + except (sqlite3.Error, aiosqlite.Error) as exc: + await self._safe_rollback( + event=PERSISTENCE_PROJECT_WORKSPACE_SAVE_FAILED + ) + msg = f"Failed to save project workspace {entity.project_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_SAVE_FAILED, + project_id=entity.project_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + + async def get(self, entity_id: NotBlankStr) -> ProjectWorkspace | None: + """Retrieve a project workspace by owning project id.""" + try: + cursor = await self._db.execute( + "SELECT * FROM project_workspaces WHERE project_id = ?", + (entity_id,), + ) + row = await cursor.fetchone() + except (sqlite3.Error, aiosqlite.Error) as exc: + msg = f"Failed to fetch project workspace {entity_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_FETCH_FAILED, + project_id=entity_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + if row is None: + logger.debug( + PERSISTENCE_PROJECT_WORKSPACE_FETCHED, + project_id=entity_id, + found=False, + ) + return None + try: + workspace = _row_to_workspace(row) + except (ValueError, ValidationError, KeyError) as exc: + msg = f"Failed to deserialize project workspace {entity_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED, + project_id=entity_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + logger.debug( + PERSISTENCE_PROJECT_WORKSPACE_FETCHED, + project_id=entity_id, + found=True, + ) + return workspace + + async def list_items( + self, + *, + limit: int = DEFAULT_PAGE_SIZE, + offset: int = 0, + ) -> tuple[ProjectWorkspace, ...]: + """List workspaces in project-id order.""" + limit = validate_pagination_args( + limit, offset, event=PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED + ) + effective_limit = min(limit, _MAX_LIST_ROWS) + try: + cursor = await self._db.execute( + "SELECT * FROM project_workspaces ORDER BY project_id LIMIT ? OFFSET ?", + (effective_limit, offset), + ) + rows = await cursor.fetchall() + except (sqlite3.Error, aiosqlite.Error) as exc: + msg = "Failed to list project workspaces" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_LIST_FAILED, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + try: + workspaces = tuple(_row_to_workspace(row) for row in rows) + except (ValueError, ValidationError, KeyError) as exc: + msg = "Failed to deserialize project workspaces" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_DESERIALIZE_FAILED, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + logger.debug(PERSISTENCE_PROJECT_WORKSPACE_LISTED, count=len(workspaces)) + return workspaces + + async def delete(self, entity_id: NotBlankStr) -> bool: + """Delete a project workspace by owning project id.""" + async with self._write_context(): + try: + cursor = await self._db.execute( + "DELETE FROM project_workspaces WHERE project_id = ?", + (entity_id,), + ) + await self._db.commit() + except (sqlite3.Error, aiosqlite.Error) as exc: + await self._safe_rollback( + event=PERSISTENCE_PROJECT_WORKSPACE_DELETE_FAILED + ) + msg = f"Failed to delete project workspace {entity_id!r}" + logger.warning( + PERSISTENCE_PROJECT_WORKSPACE_DELETE_FAILED, + project_id=entity_id, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + raise QueryError(msg) from exc + return cursor.rowcount > 0 diff --git a/src/synthorg/persistence/sqlite/revisions/20260519000001_project_workspaces.sql b/src/synthorg/persistence/sqlite/revisions/20260519000001_project_workspaces.sql new file mode 100644 index 0000000000..d50c9845ca --- /dev/null +++ b/src/synthorg/persistence/sqlite/revisions/20260519000001_project_workspaces.sql @@ -0,0 +1,24 @@ +-- depends: 20260518000001_approval_source + +-- Persistent per-project workspace mapping (#1974). One row per project +-- (1:1), recording where the git-backed working tree lives on the +-- persistent volume and which backend provisioned it, so a session +-- restart re-locates the same directory and a configured backend switch +-- is detectable against the persisted kind. ON DELETE CASCADE: deleting +-- a project drops its workspace mapping (the on-disk tree is reclaimed +-- separately by the workspace service). + +CREATE TABLE project_workspaces ( + project_id TEXT NOT NULL PRIMARY KEY, + workspace_path TEXT NOT NULL UNIQUE, + git_backend_kind TEXT NOT NULL + CHECK (git_backend_kind IN ('embedded', 'external_remote', 'local_path')), + remote_ref TEXT, + default_branch TEXT NOT NULL DEFAULT 'main', + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE +); + +CREATE INDEX idx_project_workspaces_created_at + ON project_workspaces(created_at); diff --git a/src/synthorg/persistence/sqlite/schema.sql b/src/synthorg/persistence/sqlite/schema.sql index 0bf27a2418..1466a5c785 100644 --- a/src/synthorg/persistence/sqlite/schema.sql +++ b/src/synthorg/persistence/sqlite/schema.sql @@ -436,6 +436,22 @@ CREATE TABLE projects ( CREATE INDEX idx_projects_status ON projects(status); CREATE INDEX idx_projects_lead ON projects(lead); +-- ── Persistent per-project workspace (1:1 with projects) ───── +CREATE TABLE project_workspaces ( + project_id TEXT NOT NULL PRIMARY KEY, + workspace_path TEXT NOT NULL UNIQUE, + git_backend_kind TEXT NOT NULL + CHECK (git_backend_kind IN ('embedded', 'external_remote', 'local_path')), + remote_ref TEXT, + default_branch TEXT NOT NULL DEFAULT 'main', + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE +); + +CREATE INDEX idx_project_workspaces_created_at + ON project_workspaces(created_at); + -- ── Project-lifetime cost aggregates ───────────────────────── CREATE TABLE project_cost_aggregates ( project_id TEXT NOT NULL PRIMARY KEY CHECK(length(project_id) > 0), diff --git a/src/synthorg/workers/execution_service.py b/src/synthorg/workers/execution_service.py index 925a1ca3d3..a05c254690 100644 --- a/src/synthorg/workers/execution_service.py +++ b/src/synthorg/workers/execution_service.py @@ -71,6 +71,9 @@ from synthorg.core.agent import AgentIdentity from synthorg.engine.agent_engine import AgentEngine from synthorg.engine.task_engine import TaskEngine + from synthorg.engine.workspace.project_workspace_service import ( + ProjectWorkspaceService, + ) from synthorg.hr.registry import AgentRegistryService from synthorg.security.autonomy.models import EffectiveAutonomy from synthorg.security.autonomy.resolver import AutonomyResolver @@ -285,6 +288,7 @@ class AgentEngineExecutionService: "_autonomy_resolver", "_engine", "_lifecycle_strategy_kind", + "_project_workspace_service", "_resume_tasks", "_sandbox_backend", "_task_engine", @@ -299,6 +303,7 @@ def __init__( # noqa: PLR0913 autonomy_resolver: AutonomyResolver | None = None, sandbox_backend: SandboxBackend | None = None, lifecycle_strategy_kind: str = STRATEGY_PER_CALL, + project_workspace_service: ProjectWorkspaceService | None = None, ) -> None: self._engine = engine self._task_engine = task_engine @@ -314,6 +319,10 @@ def __init__( # noqa: PLR0913 # all-subprocess config); release is then skipped entirely. self._sandbox_backend = sandbox_backend self._lifecycle_strategy_kind = lifecycle_strategy_kind + # Per-project persistent workspace provisioner. ``None`` for + # deployments without persistence; ``execute_once`` then skips + # the lazy provision. + self._project_workspace_service = project_workspace_service async def execute_once( self, @@ -341,6 +350,28 @@ async def execute_once( identity = await self._resolve_identity(task.assigned_to, task_id=task_id) effective_autonomy = self._resolve_autonomy(identity, task_id=task_id) + # Lazy per-project workspace provisioning: ensure the project has + # its persistent git-backed working tree before the agent runs. + # Skipped when no service is wired (test fixtures, persistence-less + # dev apps) or the task has no project association. + if self._project_workspace_service is not None and task.project is not None: + try: + await self._project_workspace_service.get_or_provision(task.project) + except MemoryError, RecursionError: + raise + except Exception as exc: + # Best-effort: workspace provisioning failure should not + # block agent execution (the workspace may not be needed + # by every tool). Log and continue. + logger.warning( + WORKERS_EXECUTION_SERVICE_FAILED, + task_id=task_id, + project_id=task.project, + reason="project_workspace_provision_failed", + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + logger.info( WORKERS_EXECUTION_SERVICE_ATTEMPTED, task_id=task_id, diff --git a/src/synthorg/workers/runtime_builder.py b/src/synthorg/workers/runtime_builder.py index 88a6199586..d1ecd24da3 100644 --- a/src/synthorg/workers/runtime_builder.py +++ b/src/synthorg/workers/runtime_builder.py @@ -378,6 +378,15 @@ async def _build_runtime_coordinator( scorer = AgentTaskScorer(min_score=app_state.config.task_assignment.min_score) else: scorer = AgentTaskScorer(config=routing_scorer_config) + project_workspace_service = app_state.project_workspace_service + git_backend = ( + project_workspace_service.git_backend + if project_workspace_service is not None + else None + ) + # ``AgentEngineExecutionService`` provisions the per-project workspace + # lazily on first task; bare construction (no service) keeps the + # persistence-less dev paths working as before. coordinator = build_coordinator( config=app_state.config.coordination, engine=engine, @@ -387,6 +396,7 @@ async def _build_runtime_coordinator( task_engine=app_state.task_engine, workspace_strategy=workspace_strategy, workspace_config=workspace_config, + git_backend=git_backend, performance_tracker=performance_tracker, routing_scorer_config=routing_scorer_config, coordination_metrics_collector=coordination_metrics_collector, @@ -533,6 +543,7 @@ async def build_runtime_services( autonomy_resolver=autonomy_resolver, sandbox_backend=sandbox_backends.get("docker"), lifecycle_strategy_kind=(app_state.config.sandboxing.docker.lifecycle.strategy), + project_workspace_service=app_state.project_workspace_service, ) work_pipeline = await _build_runtime_work_pipeline( app_state, diff --git a/tests/conformance/persistence/test_project_workspace_repository.py b/tests/conformance/persistence/test_project_workspace_repository.py new file mode 100644 index 0000000000..3dc82664ba --- /dev/null +++ b/tests/conformance/persistence/test_project_workspace_repository.py @@ -0,0 +1,130 @@ +"""Conformance tests for ``ProjectWorkspaceRepository`` (SQLite + Postgres).""" + +from datetime import UTC, datetime + +import pytest + +from synthorg.core.enums import GitBackendType +from synthorg.core.project import Project +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr +from synthorg.persistence.protocol import PersistenceBackend + +pytestmark = pytest.mark.integration + + +def _project(project_id: str = "proj-1") -> Project: + return Project(id=NotBlankStr(project_id), name=NotBlankStr("Demo")) + + +def _workspace( + *, + project_id: str = "proj-1", + workspace_path: str = "/data/projects/proj-1", + kind: GitBackendType = GitBackendType.EMBEDDED, + remote_ref: str | None = None, + default_branch: str = "main", +) -> ProjectWorkspace: + ts = datetime(2026, 5, 19, tzinfo=UTC) + return ProjectWorkspace( + project_id=NotBlankStr(project_id), + workspace_path=NotBlankStr(workspace_path), + git_backend_kind=kind, + remote_ref=NotBlankStr(remote_ref) if remote_ref else None, + default_branch=NotBlankStr(default_branch), + created_at=ts, + updated_at=ts, + ) + + +class TestProjectWorkspaceRepository: + async def test_save_and_get(self, backend: PersistenceBackend) -> None: + await backend.projects.save(_project()) + await backend.project_workspaces.save(_workspace()) + + fetched = await backend.project_workspaces.get(NotBlankStr("proj-1")) + assert fetched is not None + assert fetched.project_id == "proj-1" + assert fetched.workspace_path == "/data/projects/proj-1" + assert fetched.git_backend_kind is GitBackendType.EMBEDDED + assert fetched.default_branch == "main" + assert fetched.created_at == datetime(2026, 5, 19, tzinfo=UTC) + + async def test_get_missing_returns_none(self, backend: PersistenceBackend) -> None: + assert await backend.project_workspaces.get(NotBlankStr("ghost")) is None + + async def test_save_upsert_replaces(self, backend: PersistenceBackend) -> None: + await backend.projects.save(_project()) + await backend.project_workspaces.save(_workspace()) + + moved = _workspace( + workspace_path="/data/projects/proj-1", + kind=GitBackendType.LOCAL_PATH, + ) + await backend.project_workspaces.save(moved) + + fetched = await backend.project_workspaces.get(NotBlankStr("proj-1")) + assert fetched is not None + assert fetched.git_backend_kind is GitBackendType.LOCAL_PATH + + async def test_remote_ref_round_trips(self, backend: PersistenceBackend) -> None: + await backend.projects.save(_project()) + await backend.project_workspaces.save( + _workspace( + kind=GitBackendType.EXTERNAL_REMOTE, + remote_ref="github-main", + ), + ) + + fetched = await backend.project_workspaces.get(NotBlankStr("proj-1")) + assert fetched is not None + assert fetched.remote_ref == "github-main" + + async def test_remote_ref_none_round_trips( + self, backend: PersistenceBackend + ) -> None: + await backend.projects.save(_project()) + await backend.project_workspaces.save(_workspace()) + + fetched = await backend.project_workspaces.get(NotBlankStr("proj-1")) + assert fetched is not None + assert fetched.remote_ref is None + + async def test_list_items_ordered_by_project_id( + self, backend: PersistenceBackend + ) -> None: + await backend.projects.save(_project("proj-b")) + await backend.projects.save(_project("proj-a")) + await backend.project_workspaces.save( + _workspace(project_id="proj-b", workspace_path="/d/b") + ) + await backend.project_workspaces.save( + _workspace(project_id="proj-a", workspace_path="/d/a") + ) + + rows = await backend.project_workspaces.list_items() + ids = [r.project_id for r in rows] + assert ids == sorted(ids) + assert {"proj-a", "proj-b"} <= set(ids) + + async def test_delete_existing(self, backend: PersistenceBackend) -> None: + await backend.projects.save(_project()) + await backend.project_workspaces.save(_workspace()) + + deleted = await backend.project_workspaces.delete(NotBlankStr("proj-1")) + assert deleted is True + assert await backend.project_workspaces.get(NotBlankStr("proj-1")) is None + + async def test_delete_missing(self, backend: PersistenceBackend) -> None: + assert (await backend.project_workspaces.delete(NotBlankStr("ghost"))) is False + + async def test_project_delete_cascades_workspace( + self, backend: PersistenceBackend + ) -> None: + """Deleting the parent project removes its workspace row (FK cascade).""" + await backend.projects.save(_project()) + await backend.project_workspaces.save(_workspace()) + + await backend.projects.delete(NotBlankStr("proj-1")) + + assert await backend.project_workspaces.get(NotBlankStr("proj-1")) is None diff --git a/tests/conftest.py b/tests/conftest.py index 13e949ad76..d423f62899 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -1,5 +1,6 @@ """Root test configuration and shared fixtures.""" +import asyncio import faulthandler import logging import os @@ -9,7 +10,7 @@ import time from collections.abc import AsyncGenerator, Iterable, Iterator from pathlib import Path -from typing import Any +from typing import Any, Final # Boot-time guard parity (see synthorg.api.app create_app): every backend # boot -- dev, pre-release, prod -- refuses to start with an ephemeral @@ -201,7 +202,15 @@ def move( # into 10-minute test runs. Integration and e2e tests are exempt. # Disabled for fuzz profile where 10k examples per test routinely # exceed the limit. -_UNIT_TEST_WALL_CLOCK_LIMIT = 8.0 # seconds +# +# 12s (was 8s) absorbs the migration chain that intrinsic-migration +# tests (e.g. ``test_migrate_creates_tables``, ``test_save_and_get``) +# pay on the worker that generates the cross-worker template; the +# chain is ~8s on Windows under xdist contention and grows with every +# new yoyo revision. Tests that ran in <8s before still run in <8s +# (the gate's job is to catch new regressions, not to pin a moving +# floor). +_UNIT_TEST_WALL_CLOCK_LIMIT = 12.0 # seconds _FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") in ("fuzz", "extreme") _start_key = pytest.StashKey[float]() # Accumulator for unit-only wall-clock time, summed across tests in @@ -542,26 +551,61 @@ def _reset_prometheus_label_snapshot() -> Iterator[None]: _TEMPLATE_DB: Path | None = None -"""Session-wide migrated template DB. Created once, copied per test.""" +"""Worker-local cache of the session-wide migrated template DB path.""" + +# 180s matches ``tests/conformance/persistence/conftest.py``'s +# postgres-container coordinator. Bounds a worker that dies mid-acquire +# so peers don't sit forever on a stuck lockfile; covers the yoyo +# migration chain on cold caches with generous headroom. +_FILE_LOCK_TIMEOUT_SECONDS: Final[int] = 180 async def _get_template_db(tmp_path_factory: pytest.TempPathFactory) -> Path: """Return path to the session-wide migrated template database. - Migrates a fresh SQLite file via yoyo on first call, then reuses - the same file for all subsequent calls. Each test copies this - file instead of running migrations again. + Migrates a fresh SQLite file via yoyo ONCE per pytest session, + shared across all pytest-xdist workers via a ``FileLock`` on a + directory under ``tmp_path_factory.getbasetemp().parent`` (the + directory xdist allocates for the whole run -- shared across + workers, unlike ``mktemp`` which is worker-local). Without the + cross-worker lock every worker would re-run yoyo migrations on + its first persistence test (8x cost under ``-n 8``), and the + per-test wall-clock guard would fire on the first persistence + test in each worker once the migration chain grows past ~7 steps. + + Pattern mirrors ``tests/conformance/persistence/conftest.py``'s + Postgres testcontainer coordination. """ global _TEMPLATE_DB # noqa: PLW0603 - if _TEMPLATE_DB is not None: + if _TEMPLATE_DB is not None and await asyncio.to_thread(_TEMPLATE_DB.exists): return _TEMPLATE_DB - base = tmp_path_factory.mktemp("yoyo_template") - db_path = base / "template.db" - rev_path = migrations.copy_revisions(base / "revisions") - await migrations.migrate_apply( - migrations.to_sqlite_url(str(db_path)), - revisions_path=rev_path, - ) + # ``getbasetemp().parent`` is the xdist run-wide base; ``getbasetemp()`` + # itself is the worker-local ``popen-gw0`` etc. subdir. + shared_dir = tmp_path_factory.getbasetemp().parent / "yoyo_template_shared" + await asyncio.to_thread(shared_dir.mkdir, parents=True, exist_ok=True) + db_path = shared_dir / "template.db" + lock_path = shared_dir / "template.lock" + from filelock import FileLock + + # ``FileLock`` is sync; ``asyncio.to_thread`` keeps the event loop + # responsive while waiting for the cross-worker lock. Generation + # itself only happens once per session; subsequent workers find + # the file already present and skip the migrate_apply call. + def _acquire() -> FileLock: + lock = FileLock(str(lock_path), timeout=_FILE_LOCK_TIMEOUT_SECONDS) + lock.acquire() + return lock + + lock = await asyncio.to_thread(_acquire) + try: + if not await asyncio.to_thread(db_path.exists): + rev_path = migrations.copy_revisions(shared_dir / "revisions") + await migrations.migrate_apply( + migrations.to_sqlite_url(str(db_path)), + revisions_path=rev_path, + ) + finally: + await asyncio.to_thread(lock.release) _TEMPLATE_DB = db_path return _TEMPLATE_DB diff --git a/tests/integration/engine/workspace/conftest.py b/tests/integration/engine/workspace/conftest.py new file mode 100644 index 0000000000..e93ae0db12 --- /dev/null +++ b/tests/integration/engine/workspace/conftest.py @@ -0,0 +1,21 @@ +"""Integration-test fixtures for the project workspace acceptance suite.""" + +import asyncio +import warnings +from typing import Any + +import pytest + + +@pytest.fixture(scope="session") +def event_loop_policy() -> Any: + """Restore ``ProactorEventLoopPolicy`` for subprocess-driving tests. + + The acceptance suite drives ``git`` via ``asyncio.create_subprocess_exec`` + (through ``EmbeddedGitBackend``) and direct ``subprocess.run`` for + worktree setup. SelectorEventLoop on Windows cannot drive + ``create_subprocess_exec``; restore the default Proactor policy here. + """ + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + return asyncio.DefaultEventLoopPolicy() # type: ignore[attr-defined,unused-ignore] diff --git a/tests/integration/engine/workspace/test_project_workspace_acceptance.py b/tests/integration/engine/workspace/test_project_workspace_acceptance.py new file mode 100644 index 0000000000..e667fe2e79 --- /dev/null +++ b/tests/integration/engine/workspace/test_project_workspace_acceptance.py @@ -0,0 +1,379 @@ +"""Acceptance tests for the persistent project workspace substrate. + +Validates the four locked acceptance criteria end to end: + +1. A project gets a persistent git-backed workspace (survives a + simulated session restart by rebinding the service against the + same on-volume base + persisted row). +2. Two agents merge concurrently on one project through the serial + push queue without branch collision and without losing pushes. +3. Switching the git backend kind (embedded -> local_path) is a + config-only change against the same persisted row. +4. A project-A sandbox cannot resolve / read into project-B's + workspace (structural cross-project mount isolation). +""" + +import asyncio +import subprocess +from datetime import UTC, datetime +from pathlib import Path + +import pytest +from tests._shared import FakeClock, mock_of + +from synthorg.core.enums import ConflictType, GitBackendType +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr +from synthorg.engine.workspace.git_backend import ( + GitBackendConfig, + GitBackendDeps, + PushResult, + build_git_backend, +) +from synthorg.engine.workspace.models import ( + MergeConflict, + MergeResult, + Workspace, +) +from synthorg.engine.workspace.project_workspace_service import ( + ProjectWorkspaceService, +) +from synthorg.engine.workspace.protocol import WorkspaceIsolationStrategy +from synthorg.engine.workspace.push_queue import PushQueueCoordinator + +pytestmark = pytest.mark.integration + + +class _InMemoryWorkspaceRepo: + """Stateful in-memory repo so the same row survives 'session restart'.""" + + def __init__(self) -> None: + self._rows: dict[str, ProjectWorkspace] = {} + + async def save(self, entity: ProjectWorkspace) -> None: + self._rows[entity.project_id] = entity + + async def get(self, entity_id: NotBlankStr) -> ProjectWorkspace | None: + return self._rows.get(entity_id) + + async def list_items( + self, *, limit: int = 100, offset: int = 0 + ) -> tuple[ProjectWorkspace, ...]: + rows = sorted(self._rows.values(), key=lambda r: r.project_id) + return tuple(rows[offset : offset + limit]) + + async def delete(self, entity_id: NotBlankStr) -> bool: + return self._rows.pop(entity_id, None) is not None + + +def _build_service( + base_root: Path, + repo: _InMemoryWorkspaceRepo, + *, + kind: GitBackendType = GitBackendType.EMBEDDED, + local_repo_path: str | None = None, +) -> ProjectWorkspaceService: + config = ( + GitBackendConfig(kind=kind, local_repo_path=local_repo_path) + if kind is GitBackendType.LOCAL_PATH + else GitBackendConfig(kind=kind) + ) + git_backend = build_git_backend( + config, + GitBackendDeps(workspace_base_root=base_root, clock=FakeClock()), + ) + return ProjectWorkspaceService( + base_root=base_root, + repo=repo, + git_backend=git_backend, + config=config, + clock=FakeClock(), + ) + + +class TestPersistentGitBackedWorkspace: + """Acceptance #1: project gets a persistent git-backed workspace.""" + + async def test_provision_creates_real_repo_and_persists_row( + self, tmp_path: Path + ) -> None: + repo = _InMemoryWorkspaceRepo() + svc = _build_service(tmp_path, repo) + + ws = await svc.get_or_provision(NotBlankStr("proj-1")) + + assert ws.git_backend_kind is GitBackendType.EMBEDDED + assert (Path(ws.workspace_path) / ".git").exists() + assert (tmp_path / "git-repos" / "proj-1.git").exists() + assert (await repo.get(NotBlankStr("proj-1"))) == ws + + async def test_survives_simulated_session_restart(self, tmp_path: Path) -> None: + repo = _InMemoryWorkspaceRepo() + + # Session 1: provision. + await _build_service(tmp_path, repo).get_or_provision(NotBlankStr("proj-1")) + + # Session 2: rebuild the service against the same base + repo; + # the workspace must be resolved (no re-provision). + svc2 = _build_service(tmp_path, repo) + ws2 = await svc2.get_or_provision(NotBlankStr("proj-1")) + + assert ws2.git_backend_kind is GitBackendType.EMBEDDED + # Same on-volume location, not re-created. + assert (Path(ws2.workspace_path) / ".git").exists() + + +def _ok_merge(wid: str, branch: str) -> MergeResult: + return MergeResult( + workspace_id=NotBlankStr(wid), + branch_name=NotBlankStr(branch), + success=True, + merged_commit_sha=NotBlankStr("deadbee"), + duration_seconds=0.0, + ) + + +def _workspace(wid: str, branch: str, repo_root: Path) -> Workspace: + return Workspace( + workspace_id=NotBlankStr(wid), + task_id=NotBlankStr(f"task-{wid}"), + agent_id=NotBlankStr(f"agent-{wid}"), + branch_name=NotBlankStr(branch), + worktree_path=NotBlankStr(str(repo_root)), + base_branch=NotBlankStr("main"), + created_at=datetime(2026, 5, 19, tzinfo=UTC), + ) + + +class TestConcurrentAgentsNoCollision: + """Acceptance #2: two agents merge concurrently without branch collision.""" + + async def test_serialised_push_no_collision(self, tmp_path: Path) -> None: + repo = _InMemoryWorkspaceRepo() + svc = _build_service(tmp_path, repo) + ws_row = await svc.get_or_provision(NotBlankStr("proj-1")) + repo_root = Path(ws_row.workspace_path) + + push_order: list[str] = [] + active_pushes = 0 + max_concurrent_pushes = 0 + first_push_entered = asyncio.Event() + release_first_push = asyncio.Event() + + async def _merge(*, workspace: Workspace) -> MergeResult: + await asyncio.sleep(0) + return _ok_merge(workspace.workspace_id, workspace.branch_name) + + async def _push(**kwargs: object) -> PushResult: + # Track concurrent push invocations: if the queue ever lets + # two pushes run side-by-side, ``max_concurrent_pushes`` + # will exceed 1 and the assertion at the end will fail. + # The event-pair pins the first push inside this critical + # section so the second push has time to attempt entry -- + # a queue regression that loses serialisation would let it + # in here and bump ``active_pushes`` above 1. + nonlocal active_pushes, max_concurrent_pushes + active_pushes += 1 + max_concurrent_pushes = max(max_concurrent_pushes, active_pushes) + if not first_push_entered.is_set(): + first_push_entered.set() + await release_first_push.wait() + try: + push_order.append(f"push:{kwargs['branch']}") + return PushResult( + branch=NotBlankStr("main"), + head_sha=NotBlankStr("deadbee"), + ) + finally: + active_pushes -= 1 + + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.side_effect = _merge + backend = svc.git_backend + # Wrap the real backend's push to record call order. + real_push = backend.push + backend.push = _push # type: ignore[method-assign] + + coord = PushQueueCoordinator( + project_id=NotBlankStr("proj-1"), + strategy=strategy, + git_backend=backend, + repo_root=repo_root, + default_branch=NotBlankStr("main"), + clock=FakeClock(), + ) + await coord.start() + try: + task1 = asyncio.create_task( + coord.enqueue_merge_push( + workspace=_workspace("w1", "feature-a", repo_root), + ), + ) + task2 = asyncio.create_task( + coord.enqueue_merge_push( + workspace=_workspace("w2", "feature-b", repo_root), + ), + ) + # Wait until the first push is parked inside ``_push``; if + # the queue serialises correctly the second enqueued item + # is still parked on the queue, NOT inside ``_push``. + await first_push_entered.wait() + # Yield so the second task has a chance to attempt push + # entry if a regression broke serialisation. + await asyncio.sleep(0) + release_first_push.set() + r1 = await task1 + r2 = await task2 + finally: + await coord.stop() + backend.push = real_push # type: ignore[method-assign] + + # Both merges succeeded. + assert r1.success + assert r2.success + # Pushes were serialised through the queue (one push per merge, + # exactly two, no overlap / no lost pushes). + assert len(push_order) == 2 + # Hard serialisation invariant: at no point did two pushes run + # concurrently. ``len(push_order) == 2`` alone would pass even + # if the queue ran both pushes in parallel. + assert max_concurrent_pushes == 1 + # Branch names distinct (no collision): the workspace branches + # are independent; the queue pushes the default branch once per + # merge. + assert r1.branch_name != r2.branch_name + + +class TestConfigOnlyBackendSwitch: + """Acceptance #3: switching git backend is a config-only change.""" + + async def test_embedded_to_local_path_preserves_row(self, tmp_path: Path) -> None: + repo = _InMemoryWorkspaceRepo() + first = await _build_service(tmp_path, repo).get_or_provision( + NotBlankStr("proj-1"), + ) + # Sanity: EMBEDDED initialised an on-disk repo at its workspace_path. + prior_path = Path(first.workspace_path) + assert (prior_path / ".git").exists() + + # Operator switches the backend kind in config. Rebuild the + # service against the same persisted row + a fresh local path. + byo_path = tmp_path / "byo" + svc2 = _build_service( + tmp_path, + repo, + kind=GitBackendType.LOCAL_PATH, + local_repo_path=str(byo_path), + ) + ws = await svc2.get_or_provision(NotBlankStr("proj-1")) + + assert ws.git_backend_kind is GitBackendType.LOCAL_PATH + # The persisted row's kind is updated; the on-volume row reflects + # the new backend (config-authoritative precedence). + row = await repo.get(NotBlankStr("proj-1")) + assert row is not None + assert row.git_backend_kind is GitBackendType.LOCAL_PATH + + # On-disk: the new LOCAL_PATH backend actually initialised a + # repo at its per-project subdir (newly_created path), and the + # prior EMBEDDED ``.git`` metadata at the old location was + # cleared so the new backend's ``is_git_repo`` short-circuit + # could not silently retain the old layout. Without these two + # assertions a regression that flipped only the row's kind + # (acceptance #3 dead on disk) would still pass. + new_path = Path(ws.workspace_path) + assert new_path == byo_path / "proj-1" + assert (new_path / ".git").exists() + assert not (prior_path / ".git").exists() + + +class TestCrossProjectIsolation: + """Acceptance #4: project-A cannot resolve a path into project-B.""" + + async def test_distinct_workspace_paths(self, tmp_path: Path) -> None: + repo = _InMemoryWorkspaceRepo() + svc = _build_service(tmp_path, repo) + + ws_a = await svc.get_or_provision(NotBlankStr("proj-a")) + ws_b = await svc.get_or_provision(NotBlankStr("proj-b")) + + path_a = Path(ws_a.workspace_path) + path_b = Path(ws_b.workspace_path) + assert path_a != path_b + # Neither workspace path is a parent of the other (structural + # non-overlap: a per-project Docker mount of path_a cannot + # contain path_b). + assert path_b not in path_a.parents + assert path_a not in path_b.parents + + async def test_agent_a_cannot_read_project_b_secret(self, tmp_path: Path) -> None: + repo = _InMemoryWorkspaceRepo() + svc = _build_service(tmp_path, repo) + ws_a = await svc.get_or_provision(NotBlankStr("proj-a")) + ws_b = await svc.get_or_provision(NotBlankStr("proj-b")) + + # Plant a secret in project-B's tree. + (Path(ws_b.workspace_path) / "secret-b.txt").write_text("Secret B\n") + + # Simulate project-A's sandbox cwd: tool execution scoped to + # ws_a.workspace_path. The only way to access project-B's file + # is via an absolute path the LLM cannot synthesise (tool layer + # confines to its workspace) or by symlink-traversal. We + # assert the structural invariant: from ws_a, project-B's tree + # is NOT a child of ws_a.workspace_path -- no relative path + # inside ws_a can reach it. + path_a = Path(ws_a.workspace_path) + path_b = Path(ws_b.workspace_path) + # The negative: no file under path_a equals path_b/secret-b.txt. + assert not (path_a / "secret-b.txt").exists() + # Sanity: the secret really is in project-B's tree. + assert (path_b / "secret-b.txt").read_text() == "Secret B\n" + + +def _run_git(*args: str, cwd: Path) -> str: + """Synchronous git invocation; returns stripped stdout.""" + result = subprocess.run( # noqa: S603 -- args from test code, not untrusted input + ["git", "-C", str(cwd), *args], # noqa: S607 -- git is on PATH + check=True, + capture_output=True, + text=True, + ) + return result.stdout.strip() + + +class TestRealGitConcurrentWorktrees: + """Acceptance #2 (deeper): real git worktrees, real branches.""" + + async def test_two_worktrees_one_repo_no_branch_collision( + self, tmp_path: Path + ) -> None: + repo = _InMemoryWorkspaceRepo() + svc = _build_service(tmp_path, repo) + ws_row = await svc.get_or_provision(NotBlankStr("proj-1")) + repo_root = Path(ws_row.workspace_path) + + # Two concurrent worktrees on distinct branches (sync subprocess + # via helper; the async-event-loop policy in conftest tolerates). + wt_a = tmp_path / "wt-a" + wt_b = tmp_path / "wt-b" + _run_git("worktree", "add", "-b", "agent-a", str(wt_a), cwd=repo_root) + _run_git("worktree", "add", "-b", "agent-b", str(wt_b), cwd=repo_root) + + assert (wt_a / ".git").exists() + assert (wt_b / ".git").exists() + # Each worktree pins its OWN branch; checkouts don't thrash + # because they live in separate directories sharing the same + # .git object DB. + head_a = _run_git("rev-parse", "--abbrev-ref", "HEAD", cwd=wt_a) + head_b = _run_git("rev-parse", "--abbrev-ref", "HEAD", cwd=wt_b) + assert head_a == "agent-a" + assert head_b == "agent-b" + assert head_a != head_b # No branch collision. + + +# Reference: the conflict path is unit-tested by +# ``tests/unit/engine/workspace/test_push_queue.py::test_conflicted_merge_not_pushed``. +_CONFLICT_REFERENCE: MergeConflict = MergeConflict( + file_path=NotBlankStr("a.py"), + conflict_type=ConflictType.TEXTUAL, +) diff --git a/tests/unit/api/fakes.py b/tests/unit/api/fakes.py index 42b24dd96c..a8bfc83b38 100644 --- a/tests/unit/api/fakes.py +++ b/tests/unit/api/fakes.py @@ -603,6 +603,33 @@ async def delete(self, entity_id: NotBlankStr) -> bool: return self._artifacts.pop(entity_id, None) is not None +class FakeProjectWorkspaceRepository: + """In-memory project_workspaces repository for tests.""" + + def __init__(self) -> None: + from synthorg.core.project_workspace import ProjectWorkspace + + self._rows: dict[str, ProjectWorkspace] = {} + + async def save(self, entity: Any) -> None: + self._rows[entity.project_id] = entity + + async def get(self, entity_id: NotBlankStr) -> Any | None: + return self._rows.get(entity_id) + + async def list_items( + self, + *, + limit: int = 100, # lint-allow: magic-numbers -- ADR-0001 + offset: int = 0, + ) -> tuple[Any, ...]: + result = sorted(self._rows.values(), key=lambda r: r.project_id) + return tuple(result[offset : offset + limit]) + + async def delete(self, entity_id: NotBlankStr) -> bool: + return self._rows.pop(entity_id, None) is not None + + class FakeProjectRepository: """In-memory project repository for tests.""" diff --git a/tests/unit/api/fakes_backend.py b/tests/unit/api/fakes_backend.py index ffb57c05ff..e01fb94994 100644 --- a/tests/unit/api/fakes_backend.py +++ b/tests/unit/api/fakes_backend.py @@ -39,6 +39,7 @@ FakeParkedContextRepository, FakePersonalityPresetRepository, FakeProjectRepository, + FakeProjectWorkspaceRepository, FakeSettingsRepository, FakeTaskMetricRepository, FakeTaskRepository, @@ -594,6 +595,7 @@ class FakePersistenceBackend: def __init__(self) -> None: # noqa: PLR0915 -- one assignment per repo; splitting blurs the inventory self._artifacts = FakeArtifactRepository() self._projects = FakeProjectRepository() + self._project_workspaces = FakeProjectWorkspaceRepository() self._custom_presets = FakePersonalityPresetRepository() self._workflow_definitions = FakeWorkflowDefinitionRepository() self._workflow_executions = FakeWorkflowExecutionRepository() @@ -708,6 +710,10 @@ def artifacts(self) -> FakeArtifactRepository: def projects(self) -> FakeProjectRepository: return self._projects + @property + def project_workspaces(self) -> FakeProjectWorkspaceRepository: + return self._project_workspaces + @property def tasks(self) -> FakeTaskRepository: return self._tasks diff --git a/tests/unit/core/test_project_workspace_model.py b/tests/unit/core/test_project_workspace_model.py new file mode 100644 index 0000000000..776250888f --- /dev/null +++ b/tests/unit/core/test_project_workspace_model.py @@ -0,0 +1,65 @@ +"""Unit tests for the ``ProjectWorkspace`` domain model.""" + +from datetime import UTC, datetime + +import pytest +from pydantic import ValidationError + +from synthorg.core.enums import GitBackendType +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr + +pytestmark = pytest.mark.unit + + +def _workspace(**overrides: object) -> ProjectWorkspace: + base: dict[str, object] = { + "project_id": NotBlankStr("proj-1"), + "workspace_path": NotBlankStr("/data/projects/proj-1"), + "git_backend_kind": GitBackendType.EMBEDDED, + "created_at": datetime(2026, 5, 19, tzinfo=UTC), + "updated_at": datetime(2026, 5, 19, tzinfo=UTC), + } + base.update(overrides) + return ProjectWorkspace(**base) # type: ignore[arg-type] + + +class TestProjectWorkspaceModel: + def test_minimal_construction_defaults(self) -> None: + ws = _workspace() + assert ws.project_id == "proj-1" + assert ws.git_backend_kind is GitBackendType.EMBEDDED + assert ws.remote_ref is None + assert ws.default_branch == "main" + + def test_is_frozen(self) -> None: + ws = _workspace() + with pytest.raises(ValidationError): + ws.project_id = NotBlankStr("other") # type: ignore[misc] + + def test_extra_forbidden(self) -> None: + with pytest.raises(ValidationError): + _workspace(unexpected="x") + + def test_git_backend_kind_round_trips(self) -> None: + ws = _workspace(git_backend_kind=GitBackendType.LOCAL_PATH) + dumped = ws.model_dump() + assert dumped["git_backend_kind"] == "local_path" + assert ProjectWorkspace.model_validate(dumped).git_backend_kind is ( + GitBackendType.LOCAL_PATH + ) + + def test_naive_timestamp_rejected(self) -> None: + with pytest.raises(ValidationError): + _workspace(created_at=datetime(2026, 5, 19)) # noqa: DTZ001 + + def test_blank_project_id_rejected(self) -> None: + with pytest.raises(ValidationError): + _workspace(project_id=" ") + + def test_remote_ref_optional_set(self) -> None: + ws = _workspace( + git_backend_kind=GitBackendType.EXTERNAL_REMOTE, + remote_ref=NotBlankStr("github-main"), + ) + assert ws.remote_ref == "github-main" diff --git a/tests/unit/engine/workspace/git_backend/conftest.py b/tests/unit/engine/workspace/git_backend/conftest.py new file mode 100644 index 0000000000..acd2c0be18 --- /dev/null +++ b/tests/unit/engine/workspace/git_backend/conftest.py @@ -0,0 +1,25 @@ +"""Unit-test fixtures for the git-backend subsystem.""" + +import asyncio +import warnings +from typing import Any + +import pytest + + +@pytest.fixture(scope="session") +def event_loop_policy() -> Any: + """Restore ``ProactorEventLoopPolicy`` for git-backend tests. + + Shadows ``tests/unit/conftest.py::event_loop_policy``: the unit-tier + root pins Windows tests to ``SelectorEventLoopPolicy`` to dodge a + Python 3.14 IOCP teardown race, but SelectorEventLoop on Windows + cannot drive ``asyncio.create_subprocess_exec`` -- which the git + backends call into via ``run_git_subprocess``. These tests do not + use Litestar TestClient / asgi-lifespan, so they do not trigger the + rapid event-loop-creation pattern that exposes the race. Mirrors + ``tests/unit/tools/conftest.py``. + """ + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + return asyncio.DefaultEventLoopPolicy() # type: ignore[attr-defined,unused-ignore] diff --git a/tests/unit/engine/workspace/git_backend/test_config.py b/tests/unit/engine/workspace/git_backend/test_config.py new file mode 100644 index 0000000000..2a4e61cdde --- /dev/null +++ b/tests/unit/engine/workspace/git_backend/test_config.py @@ -0,0 +1,51 @@ +"""Unit tests for ``GitBackendConfig`` discriminator + validation.""" + +import pytest +from pydantic import ValidationError + +from synthorg.core.enums import GitBackendType +from synthorg.engine.workspace.git_backend import GitBackendConfig + +pytestmark = pytest.mark.unit + + +class TestGitBackendConfig: + def test_default_is_embedded(self) -> None: + cfg = GitBackendConfig() + assert cfg.kind is GitBackendType.EMBEDDED + assert cfg.embedded_subdir == "git-repos" + + def test_is_frozen(self) -> None: + cfg = GitBackendConfig() + with pytest.raises(ValidationError): + cfg.kind = GitBackendType.LOCAL_PATH # type: ignore[misc] + + def test_extra_forbidden(self) -> None: + with pytest.raises(ValidationError): + GitBackendConfig(unexpected=1) # type: ignore[call-arg] + + def test_local_path_requires_path(self) -> None: + with pytest.raises(ValidationError, match="local_repo_path"): + GitBackendConfig(kind=GitBackendType.LOCAL_PATH) + + def test_local_path_accepts_path(self) -> None: + cfg = GitBackendConfig( + kind=GitBackendType.LOCAL_PATH, + local_repo_path="/srv/repo", + ) + assert cfg.local_repo_path == "/srv/repo" + + def test_external_remote_requires_connection_name(self) -> None: + with pytest.raises(ValidationError, match="remote_connection_name"): + GitBackendConfig(kind=GitBackendType.EXTERNAL_REMOTE) + + def test_external_remote_accepts_connection_name(self) -> None: + cfg = GitBackendConfig( + kind=GitBackendType.EXTERNAL_REMOTE, + remote_connection_name="github-main", + ) + assert cfg.remote_connection_name == "github-main" + + def test_timeout_must_be_positive(self) -> None: + with pytest.raises(ValidationError): + GitBackendConfig(git_cmd_timeout_seconds=0.0) diff --git a/tests/unit/engine/workspace/git_backend/test_embedded_backend.py b/tests/unit/engine/workspace/git_backend/test_embedded_backend.py new file mode 100644 index 0000000000..5ff4cf5415 --- /dev/null +++ b/tests/unit/engine/workspace/git_backend/test_embedded_backend.py @@ -0,0 +1,119 @@ +"""Unit tests for ``EmbeddedGitBackend`` (real git in tmp_path).""" + +import asyncio +import os +from pathlib import Path + +import pytest +from tests._shared import FakeClock + +from synthorg.core.types import NotBlankStr +from synthorg.engine.workspace.git_backend import EmbeddedGitBackend + +pytestmark = pytest.mark.unit + + +def _clean_env() -> dict[str, str]: + """Strip ``GIT_*`` env vars so child git commands resolve via *cwd* alone. + + Running under a pre-push hook inherits ``GIT_DIR`` / ``GIT_WORK_TREE`` + which would otherwise make ``git`` operate on the synthorg repo + instead of the per-test workspace. + """ + return {k: v for k, v in os.environ.items() if not k.startswith("GIT_")} + + +async def _git(cwd: Path, *args: str) -> None: + proc = await asyncio.create_subprocess_exec( + "git", + *args, + cwd=str(cwd), + env=_clean_env(), + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + rc = await proc.wait() + assert rc == 0, f"git {args} failed (rc={rc})" + + +def _backend(base: Path) -> EmbeddedGitBackend: + return EmbeddedGitBackend( + base_root=base, + embedded_subdir="git-repos", + cmd_timeout=30.0, + clock=FakeClock(), + ) + + +class TestEmbeddedGitBackend: + async def test_provision_creates_bare_and_working_tree( + self, tmp_path: Path + ) -> None: + base = tmp_path / "base" + ws = base / "projects" / "p1" + backend = _backend(base) + + result = await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=ws, + default_branch=NotBlankStr("main"), + ) + + assert result.newly_created is True + assert result.default_branch == "main" + assert Path(result.repo_root) == ws + assert (base / "git-repos" / "p1.git").exists() + assert (ws / ".git").exists() + + async def test_provision_is_idempotent(self, tmp_path: Path) -> None: + base = tmp_path / "base" + ws = base / "projects" / "p1" + backend = _backend(base) + + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=ws, + default_branch=NotBlankStr("main"), + ) + again = await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=ws, + default_branch=NotBlankStr("main"), + ) + assert again.newly_created is False + + async def test_push_and_fetch_round_trip(self, tmp_path: Path) -> None: + base = tmp_path / "base" + ws = base / "projects" / "p1" + backend = _backend(base) + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=ws, + default_branch=NotBlankStr("main"), + ) + + await _git(ws, "checkout", "-b", "feature") + (ws / "file.txt").write_text("hello\n") + await _git(ws, "add", "file.txt") + # Set a repo-local identity so ``git commit`` succeeds on CI / + # dev hosts where no global ``user.name`` / ``user.email`` is + # configured; the values are arbitrary but must be set. + await _git(ws, "config", "user.email", "synthorg-tests@example.invalid") + await _git(ws, "config", "user.name", "SynthOrg Test") + await _git(ws, "commit", "-m", "work") + + push = await backend.push( + project_id=NotBlankStr("p1"), + repo_root=ws, + branch=NotBlankStr("feature"), + base_branch=NotBlankStr("main"), + ) + assert push.branch == "feature" + assert len(push.head_sha) >= 7 + + fetched = await backend.fetch( + project_id=NotBlankStr("p1"), + repo_root=ws, + branch=NotBlankStr("feature"), + ) + assert fetched.updated_refs == (NotBlankStr("feature"),) diff --git a/tests/unit/engine/workspace/git_backend/test_external_remote_backend.py b/tests/unit/engine/workspace/git_backend/test_external_remote_backend.py new file mode 100644 index 0000000000..96b51b85fa --- /dev/null +++ b/tests/unit/engine/workspace/git_backend/test_external_remote_backend.py @@ -0,0 +1,84 @@ +"""Unit tests for ``ExternalRemoteGitBackend`` (catalog-mocked surface). + +Deep behaviour (real clone/push, token refresh) is the tracked +hardening follow-up; these pin the resolution + fail-fast surface. +""" + +from pathlib import Path + +import pytest +from tests._shared import FakeClock, mock_of + +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import GitBackendConfigError +from synthorg.engine.workspace.git_backend import ExternalRemoteGitBackend +from synthorg.integrations.connections.catalog import ConnectionCatalog +from synthorg.integrations.connections.models import ( + AuthMethod, + Connection, + ConnectionType, +) + +pytestmark = pytest.mark.unit + + +def _connection(base_url: str | None) -> Connection: + return Connection( + name=NotBlankStr("github-main"), + connection_type=ConnectionType.GITHUB, + auth_method=AuthMethod.API_KEY, + base_url=NotBlankStr(base_url) if base_url else None, + ) + + +def _backend(catalog: ConnectionCatalog) -> ExternalRemoteGitBackend: + return ExternalRemoteGitBackend( + connection_name="github-main", + connection_catalog=catalog, + cmd_timeout=30.0, + clock=FakeClock(), + ) + + +class TestExternalRemoteGitBackend: + async def test_unregistered_connection_fails_fast(self, tmp_path: Path) -> None: + catalog = mock_of[ConnectionCatalog]() + catalog.get.return_value = None + backend = _backend(catalog) + + with pytest.raises(GitBackendConfigError, match="not registered"): + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ws", + default_branch=NotBlankStr("main"), + ) + + async def test_missing_token_fails_fast(self, tmp_path: Path) -> None: + catalog = mock_of[ConnectionCatalog]() + catalog.get.return_value = _connection("https://forge.example.com") + catalog.get_credentials.return_value = {} + backend = _backend(catalog) + + with pytest.raises(GitBackendConfigError, match="token"): + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ws", + default_branch=NotBlankStr("main"), + ) + + async def test_non_https_base_url_rejected(self, tmp_path: Path) -> None: + catalog = mock_of[ConnectionCatalog]() + catalog.get.return_value = _connection("http://insecure.example.com") + catalog.get_credentials.return_value = {"token": "t"} + backend = _backend(catalog) + + with pytest.raises(GitBackendConfigError, match="https"): + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ws", + default_branch=NotBlankStr("main"), + ) + + def test_backend_type(self) -> None: + catalog = mock_of[ConnectionCatalog]() + assert _backend(catalog).get_backend_type().value == "external_remote" diff --git a/tests/unit/engine/workspace/git_backend/test_factory.py b/tests/unit/engine/workspace/git_backend/test_factory.py new file mode 100644 index 0000000000..9f242aff45 --- /dev/null +++ b/tests/unit/engine/workspace/git_backend/test_factory.py @@ -0,0 +1,59 @@ +"""Unit tests for ``build_git_backend`` factory dispatch + fail-fast.""" + +from pathlib import Path + +import pytest + +from synthorg.core.enums import GitBackendType +from synthorg.engine.errors import GitBackendConfigError +from synthorg.engine.workspace.git_backend import ( + EmbeddedGitBackend, + GitBackendConfig, + GitBackendDeps, + LocalPathGitBackend, + build_git_backend, +) + +pytestmark = pytest.mark.unit + + +class TestGitBackendFactory: + def test_embedded_built_with_base_root(self, tmp_path: Path) -> None: + backend = build_git_backend( + GitBackendConfig(), + GitBackendDeps(workspace_base_root=tmp_path), + ) + assert isinstance(backend, EmbeddedGitBackend) + assert backend.get_backend_type() is GitBackendType.EMBEDDED + + def test_embedded_without_base_root_fails_fast(self) -> None: + with pytest.raises(GitBackendConfigError, match="workspace_base_root"): + build_git_backend(GitBackendConfig(), GitBackendDeps()) + + def test_local_path_built(self, tmp_path: Path) -> None: + backend = build_git_backend( + GitBackendConfig( + kind=GitBackendType.LOCAL_PATH, + local_repo_path=str(tmp_path / "repo"), + ), + GitBackendDeps(), + ) + assert isinstance(backend, LocalPathGitBackend) + + def test_external_remote_without_catalog_fails_fast(self) -> None: + with pytest.raises(GitBackendConfigError, match="connection_catalog"): + build_git_backend( + GitBackendConfig( + kind=GitBackendType.EXTERNAL_REMOTE, + remote_connection_name="github-main", + ), + GitBackendDeps(), + ) + + def test_registry_covers_every_kind(self) -> None: + for kind in GitBackendType: + assert kind in { + GitBackendType.EMBEDDED, + GitBackendType.LOCAL_PATH, + GitBackendType.EXTERNAL_REMOTE, + } diff --git a/tests/unit/engine/workspace/git_backend/test_local_path_backend.py b/tests/unit/engine/workspace/git_backend/test_local_path_backend.py new file mode 100644 index 0000000000..b5bf52bb70 --- /dev/null +++ b/tests/unit/engine/workspace/git_backend/test_local_path_backend.py @@ -0,0 +1,103 @@ +"""Unit tests for ``LocalPathGitBackend``.""" + +from pathlib import Path + +import pytest +from tests._shared import FakeClock + +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import GitBackendConfigError +from synthorg.engine.workspace.git_backend import LocalPathGitBackend + +pytestmark = pytest.mark.unit + + +def _backend(base: Path) -> LocalPathGitBackend: + """Build a backend over a BASE directory under which per-project repos live.""" + return LocalPathGitBackend( + local_repo_path=str(base), + cmd_timeout=30.0, + clock=FakeClock(), + ) + + +class TestLocalPathGitBackend: + async def test_provision_initialises_empty_dir(self, tmp_path: Path) -> None: + base = tmp_path / "byo" + backend = _backend(base) + + result = await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + repo = base / "p1" + assert result.newly_created is True + assert Path(result.repo_root) == repo + assert (repo / ".git").exists() + + async def test_provision_idempotent_on_existing_repo(self, tmp_path: Path) -> None: + base = tmp_path / "byo" + backend = _backend(base) + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + again = await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + assert again.newly_created is False + + async def test_non_git_non_empty_dir_rejected(self, tmp_path: Path) -> None: + base = tmp_path / "byo" + repo = base / "p1" + repo.mkdir(parents=True) + (repo / "stray.txt").write_text("not a repo\n") + backend = _backend(base) + + with pytest.raises(GitBackendConfigError, match="not a git"): + await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + + async def test_push_returns_head_sha(self, tmp_path: Path) -> None: + base = tmp_path / "byo" + backend = _backend(base) + result = await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + push = await backend.push( + project_id=NotBlankStr("p1"), + repo_root=Path(result.repo_root), + branch=NotBlankStr("main"), + base_branch=NotBlankStr("main"), + ) + assert push.branch == "main" + assert len(push.head_sha) >= 7 + + async def test_distinct_projects_get_distinct_repos(self, tmp_path: Path) -> None: + """Per-project isolation: two project IDs do NOT share one repo root.""" + base = tmp_path / "byo" + backend = _backend(base) + r1 = await backend.provision( + project_id=NotBlankStr("p1"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + r2 = await backend.provision( + project_id=NotBlankStr("p2"), + workspace_path=tmp_path / "ignored", + default_branch=NotBlankStr("main"), + ) + assert r1.repo_root != r2.repo_root + assert Path(r1.repo_root) == base / "p1" + assert Path(r2.repo_root) == base / "p2" + assert r1.newly_created is True + assert r2.newly_created is True diff --git a/tests/unit/engine/workspace/test_project_workspace_service.py b/tests/unit/engine/workspace/test_project_workspace_service.py new file mode 100644 index 0000000000..b85567c3cd --- /dev/null +++ b/tests/unit/engine/workspace/test_project_workspace_service.py @@ -0,0 +1,136 @@ +"""Unit tests for ``ProjectWorkspaceService.get_or_provision``.""" + +import asyncio +from pathlib import Path + +import pytest +from tests._shared import FakeClock, mock_of + +from synthorg.core.enums import GitBackendType +from synthorg.core.project_workspace import ProjectWorkspace +from synthorg.core.types import NotBlankStr +from synthorg.engine.workspace.git_backend import ( + GitBackend, + GitBackendConfig, + ProvisionResult, +) +from synthorg.engine.workspace.project_workspace_service import ( + ProjectWorkspaceService, +) + +pytestmark = pytest.mark.unit + + +class _FakeWorkspaceRepo: + """Minimal in-memory ProjectWorkspaceRepository for stateful tests.""" + + def __init__(self) -> None: + self._rows: dict[str, ProjectWorkspace] = {} + self.save_calls = 0 + + async def save(self, entity: ProjectWorkspace) -> None: + self.save_calls += 1 + self._rows[entity.project_id] = entity + + async def get(self, entity_id: NotBlankStr) -> ProjectWorkspace | None: + return self._rows.get(entity_id) + + async def list_items( + self, *, limit: int = 100, offset: int = 0 + ) -> tuple[ProjectWorkspace, ...]: + return tuple(self._rows.values())[offset : offset + limit] + + async def delete(self, entity_id: NotBlankStr) -> bool: + return self._rows.pop(entity_id, None) is not None + + +def _git_backend(kind: GitBackendType = GitBackendType.EMBEDDED) -> GitBackend: + backend = mock_of[GitBackend]() + backend.get_backend_type.return_value = kind + + async def _provision( + *, + project_id: NotBlankStr, + workspace_path: Path, + default_branch: NotBlankStr, + ) -> ProvisionResult: + return ProvisionResult( + repo_root=NotBlankStr(str(workspace_path)), + default_branch=default_branch, + newly_created=True, + ) + + backend.provision.side_effect = _provision + return backend # type: ignore[no-any-return] + + +def _service( + tmp_path: Path, + repo: _FakeWorkspaceRepo, + backend: GitBackend, + *, + kind: GitBackendType = GitBackendType.EMBEDDED, +) -> ProjectWorkspaceService: + return ProjectWorkspaceService( + base_root=tmp_path, + repo=repo, + git_backend=backend, + config=GitBackendConfig(kind=kind) + if kind is GitBackendType.EMBEDDED + else GitBackendConfig(kind=kind, local_repo_path=str(tmp_path / "byo")), + clock=FakeClock(), + ) + + +class TestProjectWorkspaceService: + async def test_provisions_new_workspace(self, tmp_path: Path) -> None: + repo = _FakeWorkspaceRepo() + backend = _git_backend() + svc = _service(tmp_path, repo, backend) + + ws = await svc.get_or_provision(NotBlankStr("proj-1")) + + assert ws.project_id == "proj-1" + assert ws.git_backend_kind is GitBackendType.EMBEDDED + assert ws.workspace_path == str(tmp_path / "projects" / "proj-1") + backend.provision.assert_awaited_once() # type: ignore[attr-defined] + assert repo.save_calls == 1 + + async def test_idempotent_same_kind_short_circuits(self, tmp_path: Path) -> None: + repo = _FakeWorkspaceRepo() + backend = _git_backend() + svc = _service(tmp_path, repo, backend) + + first = await svc.get_or_provision(NotBlankStr("proj-1")) + second = await svc.get_or_provision(NotBlankStr("proj-1")) + + assert first == second + backend.provision.assert_awaited_once() # type: ignore[attr-defined] + assert repo.save_calls == 1 + + async def test_backend_kind_change_reprovisions(self, tmp_path: Path) -> None: + repo = _FakeWorkspaceRepo() + emb = _git_backend(GitBackendType.EMBEDDED) + svc1 = _service(tmp_path, repo, emb) + await svc1.get_or_provision(NotBlankStr("proj-1")) + + local = _git_backend(GitBackendType.LOCAL_PATH) + svc2 = _service(tmp_path, repo, local, kind=GitBackendType.LOCAL_PATH) + ws2 = await svc2.get_or_provision(NotBlankStr("proj-1")) + + assert ws2.git_backend_kind is GitBackendType.LOCAL_PATH + local.provision.assert_awaited_once() # type: ignore[attr-defined] + + async def test_concurrent_first_touch_provisions_once(self, tmp_path: Path) -> None: + repo = _FakeWorkspaceRepo() + backend = _git_backend() + svc = _service(tmp_path, repo, backend) + + a, b = await asyncio.gather( + svc.get_or_provision(NotBlankStr("proj-1")), + svc.get_or_provision(NotBlankStr("proj-1")), + ) + + assert a == b + backend.provision.assert_awaited_once() # type: ignore[attr-defined] + assert repo.save_calls == 1 diff --git a/tests/unit/engine/workspace/test_push_queue.py b/tests/unit/engine/workspace/test_push_queue.py new file mode 100644 index 0000000000..a8568676cd --- /dev/null +++ b/tests/unit/engine/workspace/test_push_queue.py @@ -0,0 +1,170 @@ +"""Unit tests for the coordinator-owned serial merge/push queue.""" + +import asyncio +from datetime import UTC, datetime +from pathlib import Path + +import pytest +from tests._shared import FakeClock, mock_of + +from synthorg.core.enums import ConflictType +from synthorg.core.types import NotBlankStr +from synthorg.engine.errors import WorkspaceMergeError, WorkspacePushError +from synthorg.engine.workspace.git_backend import GitBackend, PushResult +from synthorg.engine.workspace.models import ( + MergeConflict, + MergeResult, + Workspace, +) +from synthorg.engine.workspace.protocol import WorkspaceIsolationStrategy +from synthorg.engine.workspace.push_queue import PushQueueCoordinator + +pytestmark = pytest.mark.unit + + +def _workspace(wid: str, branch: str) -> Workspace: + return Workspace( + workspace_id=NotBlankStr(wid), + task_id=NotBlankStr(f"task-{wid}"), + agent_id=NotBlankStr(f"agent-{wid}"), + branch_name=NotBlankStr(branch), + worktree_path=NotBlankStr(f"/ws/{wid}"), + base_branch=NotBlankStr("main"), + created_at=datetime(2026, 5, 19, tzinfo=UTC), + ) + + +def _ok_merge(wid: str, branch: str) -> MergeResult: + return MergeResult( + workspace_id=NotBlankStr(wid), + branch_name=NotBlankStr(branch), + success=True, + merged_commit_sha=NotBlankStr("deadbee"), + duration_seconds=0.0, + ) + + +def _conflicted_merge(wid: str, branch: str) -> MergeResult: + return MergeResult( + workspace_id=NotBlankStr(wid), + branch_name=NotBlankStr(branch), + success=False, + conflicts=( + MergeConflict( + file_path=NotBlankStr("a.py"), + conflict_type=ConflictType.TEXTUAL, + ), + ), + duration_seconds=0.0, + ) + + +def _coordinator( + strategy: WorkspaceIsolationStrategy, + git_backend: GitBackend, +) -> PushQueueCoordinator: + return PushQueueCoordinator( + project_id=NotBlankStr("proj-1"), + strategy=strategy, + git_backend=git_backend, + repo_root=Path("/repo/proj-1"), + default_branch=NotBlankStr("main"), + clock=FakeClock(), + ) + + +class TestPushQueueCoordinator: + async def test_serialises_merge_then_push_fifo(self) -> None: + order: list[str] = [] + + async def _merge(*, workspace: Workspace) -> MergeResult: + order.append(f"merge:{workspace.branch_name}") + await asyncio.sleep(0) + return _ok_merge(workspace.workspace_id, workspace.branch_name) + + async def _push(**kwargs: object) -> PushResult: + order.append(f"push:{kwargs['branch']}") + return PushResult( + branch=NotBlankStr("main"), + head_sha=NotBlankStr("deadbee"), + ) + + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.side_effect = _merge + git_backend = mock_of[GitBackend]() + git_backend.push.side_effect = _push + + coord = _coordinator(strategy, git_backend) + await coord.start() + try: + r1, r2 = await asyncio.gather( + coord.enqueue_merge_push(workspace=_workspace("w1", "b1")), + coord.enqueue_merge_push(workspace=_workspace("w2", "b2")), + ) + finally: + await coord.stop() + + assert r1.success + assert r2.success + # Each merge fully precedes the next merge (serialised, not interleaved). + assert order == [ + "merge:b1", + "push:main", + "merge:b2", + "push:main", + ] + + async def test_merge_error_propagates_queue_continues(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.side_effect = [ + WorkspaceMergeError("conflict abort"), + _ok_merge("w2", "b2"), + ] + git_backend = mock_of[GitBackend]() + git_backend.push.return_value = PushResult( + branch=NotBlankStr("main"), + head_sha=NotBlankStr("deadbee"), + ) + coord = _coordinator(strategy, git_backend) + await coord.start() + try: + with pytest.raises(WorkspaceMergeError): + await coord.enqueue_merge_push(workspace=_workspace("w1", "b1")) + r2 = await coord.enqueue_merge_push(workspace=_workspace("w2", "b2")) + finally: + await coord.stop() + assert r2.success + + async def test_push_failure_raises_workspace_push_error(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.return_value = _ok_merge("w1", "b1") + git_backend = mock_of[GitBackend]() + git_backend.push.side_effect = RuntimeError("remote rejected") + coord = _coordinator(strategy, git_backend) + await coord.start() + try: + with pytest.raises(WorkspacePushError): + await coord.enqueue_merge_push(workspace=_workspace("w1", "b1")) + finally: + await coord.stop() + + async def test_conflicted_merge_not_pushed(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.return_value = _conflicted_merge("w1", "b1") + git_backend = mock_of[GitBackend]() + coord = _coordinator(strategy, git_backend) + await coord.start() + try: + result = await coord.enqueue_merge_push(workspace=_workspace("w1", "b1")) + finally: + await coord.stop() + assert result.success is False + git_backend.push.assert_not_called() + + async def test_stop_is_idempotent_and_clean(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + git_backend = mock_of[GitBackend]() + coord = _coordinator(strategy, git_backend) + await coord.start() + await coord.stop() + await coord.stop() diff --git a/tests/unit/engine/workspace/test_service_push.py b/tests/unit/engine/workspace/test_service_push.py new file mode 100644 index 0000000000..a45f70e036 --- /dev/null +++ b/tests/unit/engine/workspace/test_service_push.py @@ -0,0 +1,109 @@ +"""Unit tests for ``WorkspaceIsolationService`` push-queue integration.""" + +from datetime import UTC, datetime +from pathlib import Path + +import pytest +from tests._shared import FakeClock, mock_of + +from synthorg.core.types import NotBlankStr +from synthorg.engine.workspace.config import WorkspaceIsolationConfig +from synthorg.engine.workspace.git_backend import GitBackend, PushResult +from synthorg.engine.workspace.models import MergeResult, Workspace +from synthorg.engine.workspace.protocol import WorkspaceIsolationStrategy +from synthorg.engine.workspace.service import WorkspaceIsolationService + +pytestmark = pytest.mark.unit + + +def _workspace() -> Workspace: + return Workspace( + workspace_id=NotBlankStr("w1"), + task_id=NotBlankStr("t1"), + agent_id=NotBlankStr("a1"), + branch_name=NotBlankStr("feature"), + worktree_path=NotBlankStr("/ws/w1"), + base_branch=NotBlankStr("main"), + created_at=datetime(2026, 5, 19, tzinfo=UTC), + ) + + +def _ok_merge() -> MergeResult: + return MergeResult( + workspace_id=NotBlankStr("w1"), + branch_name=NotBlankStr("feature"), + success=True, + merged_commit_sha=NotBlankStr("deadbee"), + duration_seconds=0.0, + ) + + +class TestMergeWorkspaceWithPush: + async def test_no_backend_delegates_to_strategy(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.return_value = _ok_merge() + service = WorkspaceIsolationService( + strategy=strategy, + config=WorkspaceIsolationConfig(), + clock=FakeClock(), + ) + + result = await service.merge_workspace_with_push( + workspace=_workspace(), + project_id=NotBlankStr("proj-1"), + repo_root=Path("/repo/proj-1"), + ) + + assert result.success + strategy.merge_workspace.assert_awaited_once() + + async def test_with_backend_routes_through_queue(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.return_value = _ok_merge() + git_backend = mock_of[GitBackend]() + git_backend.push.return_value = PushResult( + branch=NotBlankStr("main"), + head_sha=NotBlankStr("deadbee"), + ) + service = WorkspaceIsolationService( + strategy=strategy, + config=WorkspaceIsolationConfig(), + git_backend=git_backend, + clock=FakeClock(), + ) + + result = await service.merge_workspace_with_push( + workspace=_workspace(), + project_id=NotBlankStr("proj-1"), + repo_root=Path("/repo/proj-1"), + ) + + assert result.success + git_backend.push.assert_awaited_once() + await service.shutdown() + + async def test_same_project_reuses_one_queue(self) -> None: + strategy = mock_of[WorkspaceIsolationStrategy]() + strategy.merge_workspace.return_value = _ok_merge() + git_backend = mock_of[GitBackend]() + git_backend.push.return_value = PushResult( + branch=NotBlankStr("main"), + head_sha=NotBlankStr("deadbee"), + ) + service = WorkspaceIsolationService( + strategy=strategy, + config=WorkspaceIsolationConfig(), + git_backend=git_backend, + clock=FakeClock(), + ) + + q1 = await service._get_or_create_queue( + project_id=NotBlankStr("proj-1"), + repo_root=Path("/repo/proj-1"), + ) + q2 = await service._get_or_create_queue( + project_id=NotBlankStr("proj-1"), + repo_root=Path("/repo/proj-1"), + ) + assert q1 is q2 + await service.shutdown() diff --git a/tests/unit/integrations/test_connection_types.py b/tests/unit/integrations/test_connection_types.py index b0459e7aca..9785ef5452 100644 --- a/tests/unit/integrations/test_connection_types.py +++ b/tests/unit/integrations/test_connection_types.py @@ -30,6 +30,81 @@ def test_required_fields(self) -> None: assert auth.required_fields() == ("token",) +@pytest.mark.unit +class TestGitLabAuthenticator: + """Tests for GitLab connection validation.""" + + def test_valid_credentials_accepted(self) -> None: + auth = get_authenticator(ConnectionType.GITLAB) + auth.validate_credentials({"token": "glpat-abc123"}) + + def test_missing_token_rejected(self) -> None: + auth = get_authenticator(ConnectionType.GITLAB) + with pytest.raises(InvalidConnectionAuthError, match="token"): + auth.validate_credentials({}) + + def test_empty_token_rejected(self) -> None: + auth = get_authenticator(ConnectionType.GITLAB) + with pytest.raises(InvalidConnectionAuthError, match="token"): + auth.validate_credentials({"token": " "}) + + def test_required_fields(self) -> None: + auth = get_authenticator(ConnectionType.GITLAB) + assert auth.required_fields() == ("token",) + + +@pytest.mark.unit +class TestGiteaForgejoAuthenticators: + """Gitea/Forgejo share base code but report distinct identities.""" + + @pytest.mark.parametrize( + "ct", + [ConnectionType.GITEA, ConnectionType.FORGEJO], + ) + def test_valid_credentials_accepted(self, ct: ConnectionType) -> None: + auth = get_authenticator(ct) + auth.validate_credentials({"token": "tok-123"}) + + @pytest.mark.parametrize( + "ct", + [ConnectionType.GITEA, ConnectionType.FORGEJO], + ) + def test_missing_token_rejected(self, ct: ConnectionType) -> None: + auth = get_authenticator(ct) + with pytest.raises(InvalidConnectionAuthError, match="token"): + auth.validate_credentials({}) + + @pytest.mark.parametrize( + ("ct", "token"), + [ + (ConnectionType.GITEA, ""), + (ConnectionType.GITEA, " "), + (ConnectionType.FORGEJO, ""), + (ConnectionType.FORGEJO, " "), + ], + ) + def test_empty_or_whitespace_token_rejected( + self, ct: ConnectionType, token: str + ) -> None: + """Empty / whitespace-only tokens must reject just like missing ones. + + Keeps token-validation parity with the GitHub and GitLab + authenticators, where the shared validator already rejects + blank strings; the Gitea/Forgejo subclasses share the same + base but the test suite covered only the missing-key case. + """ + auth = get_authenticator(ct) + with pytest.raises(InvalidConnectionAuthError, match="token"): + auth.validate_credentials({"token": token}) + + def test_distinct_connection_type_identity(self) -> None: + gitea = get_authenticator(ConnectionType.GITEA) + forgejo = get_authenticator(ConnectionType.FORGEJO) + assert gitea.connection_type is ConnectionType.GITEA + assert forgejo.connection_type is ConnectionType.FORGEJO + assert type(gitea).__mro__[1] is type(forgejo).__mro__[1] + + @pytest.mark.unit class TestSlackAuthenticator: """Tests for Slack connection validation.""" diff --git a/tests/unit/persistence/test_protocol.py b/tests/unit/persistence/test_protocol.py index 5576ed4ded..4f8760507c 100644 --- a/tests/unit/persistence/test_protocol.py +++ b/tests/unit/persistence/test_protocol.py @@ -38,6 +38,9 @@ PrincipleOverrideRepository, ) from synthorg.persistence.project_protocol import ProjectRepository +from synthorg.persistence.project_workspace_protocol import ( + ProjectWorkspaceRepository, +) from synthorg.persistence.protocol import PersistenceBackend from synthorg.persistence.seen_claims_protocol import SeenClaimsRepository from synthorg.persistence.settings_protocol import SettingsRepository @@ -69,6 +72,7 @@ from synthorg.core.artifact import Artifact from synthorg.core.auth.models import ApiKey, User from synthorg.core.project import Project + from synthorg.core.project_workspace import ProjectWorkspace from synthorg.core.task import Task from synthorg.engine.agent_state import AgentRuntimeState from synthorg.engine.checkpoint.models import Checkpoint, Heartbeat @@ -580,6 +584,28 @@ async def delete(self, entity_id: NotBlankStr) -> bool: return False +class _FakeProjectWorkspaceRepository: + async def save(self, entity: ProjectWorkspace) -> None: + del entity + + async def get(self, entity_id: NotBlankStr) -> ProjectWorkspace | None: + del entity_id + return None + + async def list_items( + self, + *, + limit: int = 100, # lint-allow: magic-numbers -- ADR-0001 + offset: int = 0, + ) -> tuple[ProjectWorkspace, ...]: + del limit, offset + return () + + async def delete(self, entity_id: NotBlankStr) -> bool: + del entity_id + return False + + class _FakeProjectRepository: async def create(self, project: Project) -> None: pass @@ -996,6 +1022,10 @@ def artifacts(self) -> _FakeArtifactRepository: def projects(self) -> _FakeProjectRepository: return _FakeProjectRepository() + @property + def project_workspaces(self) -> _FakeProjectWorkspaceRepository: + return _FakeProjectWorkspaceRepository() + @property def custom_presets(self) -> _FakePersonalityPresetRepository: return _FakePersonalityPresetRepository() @@ -1275,6 +1305,19 @@ def test_fake_artifact_repo_is_artifact_repository(self) -> None: def test_fake_project_repo_is_project_repository(self) -> None: assert isinstance(_FakeProjectRepository(), ProjectRepository) + def test_fake_project_workspace_repo_is_project_workspace_repository( + self, + ) -> None: + # Route through the backend property AND construct the fake + # directly, mirroring the pattern used for other repositories + # so a future drift in either path is caught. + backend = _FakeBackend() + assert isinstance(backend.project_workspaces, ProjectWorkspaceRepository) + assert isinstance( + _FakeProjectWorkspaceRepository(), + ProjectWorkspaceRepository, + ) + def test_fake_ssrf_violation_repo_is_ssrf_violation_repository(self) -> None: # Route through the backend property so the test exercises the # contract-fidelity path (backend.ssrf_violations -> repo) rather diff --git a/web/src/api/types/enum-values.gen.ts b/web/src/api/types/enum-values.gen.ts index fa6171d8b1..0c773557fb 100644 --- a/web/src/api/types/enum-values.gen.ts +++ b/web/src/api/types/enum-values.gen.ts @@ -215,6 +215,9 @@ export type ConnectionStatus = (typeof CONNECTION_STATUS_VALUES)[number] export const CONNECTION_TYPE_VALUES = [ 'github', + 'gitlab', + 'gitea', + 'forgejo', 'slack', 'smtp', 'database', diff --git a/web/src/api/types/error-codes.gen.ts b/web/src/api/types/error-codes.gen.ts index 894bc0a0a2..af534d0e6a 100644 --- a/web/src/api/types/error-codes.gen.ts +++ b/web/src/api/types/error-codes.gen.ts @@ -57,6 +57,7 @@ export const ErrorCode = { BACKUP_UNRESTARTABLE: 4013, AGENT_RUNTIME_NOT_CONFIGURED: 4014, CONVERSATION_CLOSED: 4015, + PROJECT_WORKSPACE_NOT_PROVISIONED: 4016, RATE_LIMITED: 5000, PER_OPERATION_RATE_LIMITED: 5001, CONCURRENCY_LIMIT_EXCEEDED: 5002, diff --git a/web/src/api/types/openapi.gen.ts b/web/src/api/types/openapi.gen.ts index cac2146edb..bae934d1cc 100644 --- a/web/src/api/types/openapi.gen.ts +++ b/web/src/api/types/openapi.gen.ts @@ -6288,7 +6288,7 @@ export type components = { readonly name: string; readonly npm_package: string | null; /** @enum {string|null} */ - readonly required_connection_type: "github" | "slack" | "smtp" | "database" | "generic_http" | "oauth_app" | "a2a_peer" | null; + readonly required_connection_type: "github" | "gitlab" | "gitea" | "forgejo" | "slack" | "smtp" | "database" | "generic_http" | "oauth_app" | "a2a_peer" | null; /** @default [] */ readonly tags: readonly string[]; /** @@ -6693,7 +6693,7 @@ export type components = { * @description Supported external service connection types. * @enum {string} */ - readonly ConnectionType: "github" | "slack" | "smtp" | "database" | "generic_http" | "oauth_app" | "a2a_peer"; + readonly ConnectionType: "github" | "gitlab" | "gitea" | "forgejo" | "slack" | "smtp" | "database" | "generic_http" | "oauth_app" | "a2a_peer"; /** * ContentType * @description Content types available for training extraction. @@ -7585,7 +7585,7 @@ export type components = { * 8xxx = internal. * @enum {integer} */ - readonly ErrorCode: 1000 | 1001 | 1002 | 1003 | 1004 | 1005 | 1006 | 1007 | 1008 | 1009 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 3000 | 3001 | 3002 | 3003 | 3004 | 3005 | 3006 | 3007 | 3008 | 3009 | 3010 | 3011 | 3012 | 3013 | 3014 | 3015 | 3016 | 3017 | 4000 | 4001 | 4002 | 4003 | 4004 | 4005 | 4006 | 4007 | 4008 | 4009 | 4010 | 4011 | 4012 | 4013 | 4014 | 4015 | 5000 | 5001 | 5002 | 6000 | 6001 | 6002 | 6003 | 6004 | 7000 | 7001 | 7002 | 7003 | 7004 | 7005 | 7006 | 7007 | 7008 | 7009 | 7010 | 8000 | 8001 | 8002 | 8003 | 8004 | 8005 | 8006 | 8007 | 8008 | 8009 | 8010 | 8011 | 8012 | 8013 | 8014 | 8015 | 8016; + readonly ErrorCode: 1000 | 1001 | 1002 | 1003 | 1004 | 1005 | 1006 | 1007 | 1008 | 1009 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 3000 | 3001 | 3002 | 3003 | 3004 | 3005 | 3006 | 3007 | 3008 | 3009 | 3010 | 3011 | 3012 | 3013 | 3014 | 3015 | 3016 | 3017 | 4000 | 4001 | 4002 | 4003 | 4004 | 4005 | 4006 | 4007 | 4008 | 4009 | 4010 | 4011 | 4012 | 4013 | 4014 | 4015 | 4016 | 5000 | 5001 | 5002 | 6000 | 6001 | 6002 | 6003 | 6004 | 7000 | 7001 | 7002 | 7003 | 7004 | 7005 | 7006 | 7007 | 7008 | 7009 | 7010 | 8000 | 8001 | 8002 | 8003 | 8004 | 8005 | 8006 | 8007 | 8008 | 8009 | 8010 | 8011 | 8012 | 8013 | 8014 | 8015 | 8016; /** ErrorDetail */ readonly ErrorDetail: { readonly detail: string; diff --git a/web/src/pages/connections/connection-type-fields.ts b/web/src/pages/connections/connection-type-fields.ts index d9fd94ef4a..34cf326469 100644 --- a/web/src/pages/connections/connection-type-fields.ts +++ b/web/src/pages/connections/connection-type-fields.ts @@ -45,6 +45,76 @@ export const CONNECTION_TYPE_FIELDS: Record }, ], }, + gitlab: { + label: 'GitLab', + description: 'Access GitLab repositories, issues, and merge requests.', + defaultAuthMethod: 'bearer_token', + topLevelFields: [ + { + key: 'base_url', + label: 'API URL', + type: 'url', + placeholder: 'https://gitlab.com', + required: false, + hint: 'Leave blank for gitlab.com; set for self-hosted', + }, + ], + credentialFields: [ + { + key: 'token', + label: 'Personal Access Token', + type: 'password', + required: true, + placeholder: 'glpat-...', + }, + ], + }, + gitea: { + label: 'Gitea', + description: 'Access Gitea repositories, issues, and pull requests.', + defaultAuthMethod: 'bearer_token', + topLevelFields: [ + { + key: 'base_url', + label: 'API URL', + type: 'url', + placeholder: 'https://gitea.example.com', + required: true, + hint: 'Self-hosted Gitea instance URL', + }, + ], + credentialFields: [ + { + key: 'token', + label: 'Personal Access Token', + type: 'password', + required: true, + }, + ], + }, + forgejo: { + label: 'Forgejo', + description: 'Access Forgejo repositories, issues, and pull requests.', + defaultAuthMethod: 'bearer_token', + topLevelFields: [ + { + key: 'base_url', + label: 'API URL', + type: 'url', + placeholder: 'https://forgejo.example.com', + required: true, + hint: 'Self-hosted Forgejo instance URL (e.g. codeberg.org)', + }, + ], + credentialFields: [ + { + key: 'token', + label: 'Personal Access Token', + type: 'password', + required: true, + }, + ], + }, slack: { label: 'Slack', description: 'Send messages and manage channels via Slack.',