Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions .claude/skills/aurelio-review-pr/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -317,11 +317,11 @@ Show the user the complete table, organized by severity (Critical first, Minor l

- Total count of items
- Count by source (each agent + each external reviewer)
- Any items you recommend skipping (with reasoning)
**Default behavior: implement ALL valid findings** — including pre-existing issues found in surrounding code, suggestions, and anything correctly identified by agents or external reviewers. Do NOT skip items just because they are "pre-existing" or "out of scope" — if a reviewer found a valid issue in code touched by (or adjacent to) this PR, fix it now.

Then ask the user using AskUserQuestion with options like:

- "Implement all" (Recommended)
- "Implement all (Recommended)" — this is the default
- "Let me review the list first"
- "Skip some items"

Expand Down Expand Up @@ -411,4 +411,4 @@ Report what was done:
- If two sources contradict each other, flag it and ask the user.
- Do NOT use `--no-verify` or `--amend` for commits.
- External feedback fetch failures are non-fatal — retry once, then proceed with local findings if still failing. Mark external coverage as partial in the triage table.
- **Fix everything in the current PR — never defer.** Every valid recommendation must be implemented in this PR regardless of size. No creating GitHub issues for "too large" items, no deferring to future PRs, no marking things as out of scope. If a reviewer flags it and it's valid, fix it now — docstrings, type hints, refactors, all of it.
- **Fix everything valid — never defer, never skip.** Every valid recommendation must be implemented — including pre-existing issues, suggestions, and findings in surrounding code. No creating GitHub issues for "too large" items, no deferring to future PRs, no marking things as "out of scope". If a reviewer flags it and it's valid, fix it now.
7 changes: 4 additions & 3 deletions .claude/skills/pre-pr-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -290,10 +290,11 @@ Sort by severity (Critical first, Minor last).
Show the consolidated table with:
- Total count of items
- Count by source agent
- Any items recommended to skip (with reasoning)

**Default behavior: implement ALL valid findings** — including pre-existing issues found in surrounding code, suggestions, and anything the agents correctly identified. Do NOT skip items just because they are "pre-existing" or "out of scope" — if an agent found a valid issue in code touched by (or adjacent to) this PR, fix it now.

Ask the user via AskUserQuestion:
- "Implement all" (Recommended)
- "Implement all (Recommended)" — this is the default
- "Let me review the list first"
- "Skip some items"

Expand Down Expand Up @@ -395,4 +396,4 @@ Report:
- If two agents contradict each other, flag it and ask the user.
- Do NOT use `--no-verify` or `--amend` for commits.
- Agent failures are non-fatal — proceed with available findings, report failed agents.
- **Fix everything that's approved — never defer.** Every valid recommendation must be implemented. No creating GitHub issues for "too large" items, no deferring to future PRs.
- **Fix everything valid — never defer, never skip.** Every valid recommendation must be implemented — including pre-existing issues, suggestions, and findings in surrounding code. No creating GitHub issues for "too large" items, no deferring to future PRs, no marking things as "out of scope".
1 change: 1 addition & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,7 @@ src/ai_company/
- For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks
- After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback
- The `/commit-push-pr` command is effectively blocked (it calls `gh pr create` internally)
- **Fix everything valid — never skip**: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.

## CI

Expand Down
5 changes: 3 additions & 2 deletions DESIGN_SPEC.md
Original file line number Diff line number Diff line change
Expand Up @@ -1682,7 +1682,7 @@ When the LLM requests multiple tool calls in a single turn, `ToolInvoker.invoke_

The `ToolPermissionChecker` resolves permissions using a priority-based system: denied list (highest) → allowed list → access-level categories → deny (default). `AgentEngine._make_tool_invoker()` creates a permission-aware invoker from the agent's `ToolPermissions` at the start of each `run()` call. Note: M3 implements category-level gating only; the granular sub-constraints described in §11.2 (workspace scope, network mode) are planned for when sandboxing is implemented.

> **M3 implementation note — Built-in git tools:** Six workspace-scoped git tools are implemented in `tools/git_tools.py` with a shared `_BaseGitTool` base class in `tools/_git_base.py`: `GitStatusTool`, `GitLogTool`, `GitDiffTool`, `GitBranchTool`, `GitCommitTool`, and `GitCloneTool`. The base class enforces workspace boundary security (path traversal prevention via `resolve()` + `relative_to()`) and provides a common `_run_git()` helper using `asyncio.create_subprocess_exec` (never `shell=True`). Security hardening includes: `GIT_TERMINAL_PROMPT=0` to prevent credential prompts, `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=os.devnull`, and `GIT_PROTOCOL_FROM_USER=0` to restrict config/protocol attack surfaces, rejection of flag-like ref/branch values (starting with `-`), URL scheme validation on clone (only `https://`, `ssh://`, `git://`, and SCP-like syntax — plain `http://` rejected for security) with `--` separator before positional URL argument, and clone URLs starting with `-` are rejected. All tools return `ToolExecutionResult` for errors rather than raising exceptions. When a `SandboxBackend` is injected, `_run_git()` delegates subprocess management to the sandbox via `_run_git_sandboxed()` — the sandbox handles environment filtering and workspace-scoped cwd enforcement, while `_validate_path` independently enforces workspace boundaries for git path arguments. Git hardening env vars are passed as `env_overrides` to the sandbox, and `SandboxResult` is converted to `ToolExecutionResult` via `_sandbox_result_to_execution_result`. Without a sandbox, the direct-subprocess path is used (backward compatible). **Future:** Consider adding host/IP allowlisting for clone URLs to prevent SSRF against internal networks (loopback, link-local, private ranges).
> **M3 implementation note — Built-in git tools:** Six workspace-scoped git tools are implemented in `tools/git_tools.py` with a shared `_BaseGitTool` base class in `tools/_git_base.py`: `GitStatusTool`, `GitLogTool`, `GitDiffTool`, `GitBranchTool`, `GitCommitTool`, and `GitCloneTool`. The base class enforces workspace boundary security (path traversal prevention via `resolve()` + `relative_to()`) and provides a common `_run_git()` helper using `asyncio.create_subprocess_exec` (never `shell=True`). Security hardening includes: `GIT_TERMINAL_PROMPT=0` to prevent credential prompts, `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=os.devnull`, and `GIT_PROTOCOL_FROM_USER=0` to restrict config/protocol attack surfaces, rejection of flag-like argument values (starting with `-`) for refs, branch names, author filters, date strings, and other git arguments, URL scheme validation on clone (only `https://`, `ssh://`, `git://`, and SCP-like syntax — plain `http://` rejected for security) with `--` separator before positional URL argument, and clone URLs starting with `-` are rejected. All tools return `ToolExecutionResult` for errors rather than raising exceptions. When a `SandboxBackend` is injected, `_run_git()` delegates subprocess management to the sandbox via `_run_git_sandboxed()` — the sandbox handles environment filtering and workspace-scoped cwd enforcement, while `_validate_path` independently enforces workspace boundaries for git path arguments. Git hardening env vars are passed as `env_overrides` to the sandbox, and `SandboxResult` is converted to `ToolExecutionResult` via `_sandbox_result_to_execution_result`. Without a sandbox, the direct-subprocess path is used (backward compatible). Both paths explicitly close the subprocess transport on Windows (via `tools/_process_cleanup.py`) to prevent `ResourceWarning` on `ProactorEventLoop`. **Future:** Consider adding host/IP allowlisting for clone URLs to prevent SSRF against internal networks (loopback, link-local, private ranges).

### 11.1.2 Tool Sandboxing

Expand All @@ -1694,7 +1694,7 @@ Tool execution requires safety boundaries proportional to the risk of each tool

| Backend | Isolation | Latency | Dependencies | Status |
|---------|-----------|---------|--------------|--------|
| `SubprocessSandbox` | Process-level: env filtering (allowlist + denylist), restricted PATH, workspace-scoped cwd, timeout + process-group kill, library injection var blocking | ~ms | None | **Implemented** |
| `SubprocessSandbox` | Process-level: env filtering (allowlist + denylist), restricted PATH (configurable via `extra_safe_path_prefixes`), workspace-scoped cwd, timeout + process-group kill, library injection var blocking, explicit transport cleanup on Windows | ~ms | None | **Implemented** |
| `DockerSandbox` | Container-level: ephemeral container, mounted workspace, no network, resource limits (CPU/memory/time) | ~1-2s cold start | Docker | Planned |
| `K8sSandbox` | Pod-level: per-agent containers, namespace isolation, resource quotas, network policies | ~2-5s | Kubernetes | Future |

Expand Down Expand Up @@ -2398,6 +2398,7 @@ ai-company/
│ │ │ ├── read_file.py # ReadFileTool
│ │ │ └── write_file.py # WriteFileTool
│ │ ├── _git_base.py # Base class for git tools (workspace, subprocess, sandbox integration)
│ │ ├── _process_cleanup.py # Subprocess transport cleanup utility (Windows ResourceWarning prevention)
│ │ ├── git_tools.py # Git operations — 6 built-in tools (sandbox-aware)
│ │ ├── code_runner.py # Code execution (M3)
│ │ ├── web_tools.py # HTTP, search (M3)
Expand Down
6 changes: 6 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,12 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Vue 3** for web dashboard (planned)
- **SQLite** → PostgreSQL for data persistence (planned)

## System Requirements

- **Python 3.14+**
- **uv** — package manager ([install](https://docs.astral.sh/uv/getting-started/installation/))
- **Git 2.x+** — required at runtime for built-in git tools (subprocess-based, not a Python binding)

## Getting Started

```bash
Expand Down
69 changes: 55 additions & 14 deletions src/ai_company/engine/shutdown.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,18 +14,17 @@
import signal
import sys
import time
import types # noqa: TC003 — used in runtime-visible annotation
from collections.abc import Callable, Coroutine, Mapping, Sequence
from typing import TYPE_CHECKING, Any, Protocol, runtime_checkable

if TYPE_CHECKING:
import types
from typing import Any, Protocol, runtime_checkable

from pydantic import BaseModel, ConfigDict, Field

from ai_company.core.types import NotBlankStr # noqa: TC001
from ai_company.observability import get_logger
from ai_company.observability.events.execution import (
EXECUTION_SHUTDOWN_CLEANUP,
EXECUTION_SHUTDOWN_CLEANUP_FAILED,
EXECUTION_SHUTDOWN_CLEANUP_TIMEOUT,
EXECUTION_SHUTDOWN_COMPLETE,
EXECUTION_SHUTDOWN_FORCE_CANCEL,
Expand Down Expand Up @@ -243,13 +242,41 @@ async def _wait_and_cancel(
pending,
timeout=self._CANCEL_PROPAGATION_TIMEOUT,
)
for task in cancel_done:
if not task.cancelled():
with contextlib.suppress(Exception):
task.exception()
self._log_post_cancel_exceptions(cancel_done)

return tasks_completed, len(pending)

def _log_post_cancel_exceptions(
self,
tasks: set[asyncio.Task[Any]],
) -> None:
"""Retrieve and log exceptions from post-cancel tasks.

Retrieving the exception prevents asyncio's "Task exception was
never retrieved" warning. Non-cancelled tasks with exceptions
are logged at DEBUG.
"""
for task in tasks:
if task.cancelled():
continue
try:
exc = task.exception()
except asyncio.InvalidStateError:
logger.debug(
EXECUTION_SHUTDOWN_TASK_ERROR,
error="Failed to inspect post-cancel task: InvalidStateError",
task_name=task.get_name(),
)
else:
if exc is not None:
logger.debug(
EXECUTION_SHUTDOWN_TASK_ERROR,
error=(
f"Post-cancel task exception: {type(exc).__name__}: {exc}"
),
task_name=task.get_name(),
)

async def _run_cleanup(
self,
callbacks: Sequence[CleanupCallback],
Expand Down Expand Up @@ -279,10 +306,9 @@ async def _run_all() -> None:
except Exception:
all_succeeded = False
logger.exception(
EXECUTION_SHUTDOWN_CLEANUP,
EXECUTION_SHUTDOWN_CLEANUP_FAILED,
callback_index=i,
callback_count=len(callbacks),
error="Cleanup callback failed",
)

try:
Expand Down Expand Up @@ -358,8 +384,12 @@ def _handle_signal(self, sig: signal.Signals) -> None:
logger.exception(
EXECUTION_SHUTDOWN_SIGNAL,
signal=sig.name,
error="request_shutdown() raised in signal handler",
error="request_shutdown() raised — falling back to loop.stop()",
)
# If request_shutdown() itself fails, stop the event loop as
# a last resort to avoid a process that ignores signals.
with contextlib.suppress(Exception):
asyncio.get_running_loop().stop()

def _handle_signal_threadsafe(
self,
Expand Down Expand Up @@ -387,17 +417,28 @@ def _on_loop() -> None:
logger.exception(
EXECUTION_SHUTDOWN_SIGNAL,
signal=sig_name,
error="request_shutdown() raised in signal handler",
error="request_shutdown() raised — falling back to loop.stop()",
)
with contextlib.suppress(Exception):
asyncio.get_running_loop().stop()

try:
loop = asyncio.get_running_loop()
loop.call_soon_threadsafe(_on_loop)
except RuntimeError:
# No running event loop — call directly (best-effort).
# Cannot log safely without a loop, so suppress all errors.
with contextlib.suppress(Exception):
# Cannot use structlog (acquires locks) so fall back to
# stderr for last-resort visibility.
try:
self._strategy.request_shutdown()
except Exception:
try:
sys.stderr.write(
f"[shutdown] request_shutdown() failed for signal {sig_name}\n"
)
sys.stderr.flush()
except Exception: # noqa: S110
pass

def register_task(
self,
Expand Down
1 change: 1 addition & 0 deletions src/ai_company/observability/events/execution.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@
EXECUTION_SHUTDOWN_GRACE_START: Final[str] = "execution.shutdown.grace_start"
EXECUTION_SHUTDOWN_FORCE_CANCEL: Final[str] = "execution.shutdown.force_cancel"
EXECUTION_SHUTDOWN_CLEANUP: Final[str] = "execution.shutdown.cleanup"
EXECUTION_SHUTDOWN_CLEANUP_FAILED: Final[str] = "execution.shutdown.cleanup.failed"
EXECUTION_SHUTDOWN_CLEANUP_TIMEOUT: Final[str] = "execution.shutdown.cleanup.timeout"
EXECUTION_SHUTDOWN_COMPLETE: Final[str] = "execution.shutdown.complete"
EXECUTION_LOOP_SHUTDOWN: Final[str] = "execution.loop.shutdown"
Expand Down
1 change: 1 addition & 0 deletions src/ai_company/observability/events/sandbox.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,4 @@
SANDBOX_PATH_FALLBACK: Final[str] = "sandbox.path.fallback"
SANDBOX_HEALTH_CHECK: Final[str] = "sandbox.health_check"
SANDBOX_KILL_FAILED: Final[str] = "sandbox.kill.failed"
SANDBOX_KILL_FALLBACK: Final[str] = "sandbox.kill.fallback"
5 changes: 5 additions & 0 deletions src/ai_company/observability/events/tool.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,8 @@
TOOL_FS_PARENT_NOT_FOUND: Final[str] = "tool.fs.parent_not_found"
TOOL_FS_GLOB_REJECTED: Final[str] = "tool.fs.glob_rejected"
TOOL_FS_NOOP: Final[str] = "tool.fs.noop"

# ── Subprocess utility events ───────────────────────────────────
TOOL_SUBPROCESS_TRANSPORT_CLOSE_FAILED: Final[str] = (
"tool.subprocess.transport_close_failed"
)
Loading