Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -65,7 +65,7 @@ curl http://localhost:3000/api/v1/health # backend (via web proxy)

```text
src/ai_company/
-api/ # Litestar REST + WebSocket API (controllers, guards, channels)
+api/ # Litestar REST + WebSocket API (controllers, guards, channels, JWT + API key auth)
budget/ # Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
cli/ # CLI interface (future — thin API wrapper if needed)
communication/ # Message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution, meeting protocol
26 changes: 18 additions & 8 deletions DESIGN_SPEC.md
@@ -81,7 +81,7 @@ The MVP validates the core hypothesis: **a single agent can complete a real task

> **Implementation snapshot (2026-03-10):**
> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed (including audit entry persistence via AuditRepository + SQLite backend). Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection, non-inferable filtering) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete. SecOps agent (rule engine, audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, ToolInvoker integration), progressive trust (4 strategies: disabled/weighted/per-category/milestone behind TrustStrategy protocol), promotion/demotion (criteria evaluation, approval strategies, model mapping). Autonomy levels (#42: AutonomyLevel enum, presets, 3-level resolver, rule-based auto-downgrade/human-only promotion change strategy) + approval timeout policies (#126: 4 timeout policies, park/resume service, risk tier classifier, timeout checker) complete.
-> - **Remaining:** JWT/OAuth auth, approval workflow gates.
+> - **Remaining:** Approval workflow gates.

### 1.5 Configuration Philosophy

@@ -2562,6 +2562,7 @@ The REST/WebSocket API is the **primary interface** for all consumers. The Web U
```text
/api/v1/
├── /health # Health check, readiness
+├── /auth # Authentication: setup, login, password change, me
├── /company # CRUD company config
├── /agents # List, hire, fire, modify agents
├── /departments # Department management
@@ -2758,6 +2759,7 @@ Circular inheritance is detected via chain tracking and raises `TemplateInherita
| **Docker API** | aiodocker | Async-native Docker API client for `DockerSandbox` backend |
| **Tool Integration** | MCP SDK (`mcp`) | Industry standard for LLM-to-tool integration |
| **Agent Comms** | A2A Protocol compatible | Future-proof inter-agent communication |
+| **Authentication** | PyJWT + argon2-cffi | JWT (HMAC HS256/384/512) for session tokens, Argon2id for password hashing, SHA-256 for API key storage |

⚠️ Potential issue | 🟠 Major

Consider impact of mandatory authentication on "Local First" design principle.

The PR introduces mandatory JWT + API key authentication. While this is excellent for security in networked deployments, it may conflict with the "Local First" design principle (§1.2):

"Local First — Runs locally with option to expose on network or host remotely later"

Mandatory authentication adds friction for local single-user scenarios:

  • Users must complete password setup on first run
  • No quick "just try it" experience for local evaluation
  • Local-only deployments don't inherently need multi-user auth

Industry precedent: Docker Desktop, PostgreSQL, Redis, and many developer tools run locally without authentication by default, only requiring it when exposed on a network.

Possible mitigations:

  1. Require auth only when binding to non-localhost addresses (bind address detection)
  2. Add a development mode flag (--no-auth or AI_COMPANY_DEV_MODE=1) with clear security warnings
  3. Auto-login for localhost-only deployments (check if Host header is localhost/127.0.0.1)

This is a design trade-off between security-by-default and developer experience. Please confirm this trade-off is intentional, or consider adding an escape hatch for local development.

Based on learnings: This implementation changes the UX significantly for local-first usage. If this deviates from the original vision, user approval may be needed.
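
A minimal sketch of how mitigations 1 and 2 might be combined, assuming the server knows its configured bind host at startup. The names `is_loopback_bind`, `auth_required`, and `dev_no_auth` are hypothetical and do not come from this PR. The check uses the bind address rather than the `Host` header, since that header is supplied by the client and can be set to any value.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """Return True only if the configured bind host is a loopback address."""
    if host.lower() == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): fail closed and treat as non-local.
        return False


def auth_required(bind_host: str, dev_no_auth: bool = False) -> bool:
    """Hypothetical policy: the opt-out is honoured only on a loopback bind."""
    return not (dev_no_auth and is_loopback_bind(bind_host))


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1", "0.0.0.0"):
        print(host, auth_required(host, dev_no_auth=True))
```

With this policy, auth stays on by default everywhere, and an explicit opt-out is ignored as soon as the server binds to a non-loopback address such as `0.0.0.0`.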

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` at line 2762: the design enforces mandatory JWT/API-key auth,
which harms local-first UX. Add a guarded bypass so local single-user runs can
skip auth: introduce a config flag (e.g., AUTH_REQUIRED, default true) and an
explicit dev override (e.g., CLI --no-auth or env AI_COMPANY_DEV_MODE=true), then
amend the central auth path (the middleware used for all requests,
`ApiAuthMiddleware` in this PR) to short-circuit authentication when
AUTH_REQUIRED is false or when the server bind address is localhost/127.0.0.1
(detect via the configured bind address or the incoming Host header), and log a
strong security warning. Keep auth enforced for any non-local bind or when the
override is not set.

| **Config Format** | YAML + Pydantic validation | Human-readable config with strict validation |
| **CLI** | TBD (future, if needed) | Thin wrapper around the REST API for terminal use. May not be needed — interactive Scalar docs at `/docs/api` and `curl`/`httpie` may suffice |

@@ -2980,7 +2982,7 @@ ai-company/
│ ├── persistence/ # Operational data persistence (§7.6)
│ │ ├── __init__.py # Package exports
│ │ ├── protocol.py # PersistenceBackend protocol (M5)
-│ │ ├── repositories.py # Repository protocols: TaskRepository, CostRecordRepository, MessageRepository, ParkedContextRepository, AuditRepository
+│ │ ├── repositories.py # Repository protocols: TaskRepository, CostRecordRepository, MessageRepository, ParkedContextRepository, AuditRepository, UserRepository, ApiKeyRepository
│ │ ├── config.py # PersistenceConfig model (M5)
│ │ ├── errors.py # Persistence error hierarchy (M5)
│ │ ├── factory.py # create_backend() factory (M5)
@@ -2991,7 +2993,8 @@ ai-company/
│ │ ├── hr_repositories.py # SQLite HR repositories (LifecycleEvent, TaskMetricRecord, CollaborationMetricRecord)
│ │ ├── parked_context_repo.py # SQLiteParkedContextRepository (park/resume serialized agent state)
│ │ ├── audit_repository.py # SQLiteAuditRepository (append-only audit entry persistence)
-│ │ └── migrations.py # Schema migrations (user_version pragma)
+│ │ ├── user_repo.py # SQLiteUserRepository + SQLiteApiKeyRepository
+│ │ └── migrations.py # Schema migrations (user_version pragma, v1–v5)
│ ├── observability/ # Structured logging & correlation
│ │ ├── __init__.py # get_logger() entry point
│ │ ├── _logger.py # Logger configuration
@@ -3183,18 +3186,25 @@ ai-company/
│ ├── api/ # REST + WebSocket API (M6)
│ │ ├── app.py # Litestar application factory, lifecycle hooks
│ │ ├── approval_store.py # In-memory approval queue storage
+│ │ ├── auth/ # JWT + API key authentication subsystem
+│ │ │ ├── config.py # AuthConfig (frozen Pydantic, HMAC algorithm, exclude paths)
+│ │ │ ├── controller.py # AuthController (setup, login, change-password, me)
+│ │ │ ├── middleware.py # ApiAuthMiddleware (JWT-first, API key fallback)
+│ │ │ ├── models.py # User, ApiKey, AuthenticatedUser, AuthMethod
+│ │ │ ├── secret.py # JWT secret resolution (env var → persistence → auto-generate)
+│ │ │ └── service.py # AuthService (Argon2id password hashing, JWT ops, API key hashing)
│ │ ├── bus_bridge.py # Message-bus → WebSocket bridge
│ │ ├── channels.py # WebSocket channel definitions
│ │ ├── config.py # API configuration models (ServerConfig, CorsConfig)
-│ │ ├── controllers/ # 14 class-based controllers + 1 WebSocket handler (15 route modules)
+│ │ ├── controllers/ # 15 class-based controllers + 1 WebSocket handler (16 route modules)
│ │ ├── dto.py # Request/response DTOs and envelopes
-│ │ ├── errors.py # API error hierarchy (ApiError, NotFoundError, etc.)
+│ │ ├── errors.py # API error hierarchy (ApiError, NotFoundError, UnauthorizedError, etc.)
│ │ ├── exception_handlers.py # Litestar exception handler registration
-│ │ ├── guards.py # Route guards — read/write access (stub auth, M7 real auth)
-│ │ ├── middleware.py # Request logging middleware
+│ │ ├── guards.py # Route guards — role-based read/write access control (HumanRole enum)
+│ │ ├── middleware.py # Request logging, CSP middleware
│ │ ├── pagination.py # Cursor-free offset/limit pagination
│ │ ├── server.py # Uvicorn server runner
-│ │ ├── state.py # Typed AppState container with service access
+│ │ ├── state.py # Typed AppState container with service access (deferred auth init)
│ │ └── ws_models.py # WebSocket event models (WsEvent, WsEventType)
│ ├── cli/ # CLI interface (future, if needed)
│ │ ├── __init__.py
9 changes: 5 additions & 4 deletions README.md
@@ -24,10 +24,11 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Memory Interface (M5)** - Pluggable `MemoryBackend` protocol with capability discovery, shared knowledge protocol, domain models, config, factory, and context injection retrieval pipeline (ranking, token-budget formatting, non-inferable filtering). Shared organizational memory via `OrgMemoryBackend` protocol with hybrid prompt+retrieval backend. Memory consolidation/archival with pluggable strategies and retention enforcement
- **Coordination Error Taxonomy (M5)** - Post-execution classification pipeline detecting logical contradictions, numerical drift, context omissions, and coordination failures
- **Budget Enforcement (M5)** - `BudgetEnforcer` service with pre-flight checks, in-flight budget checking, auto-downgrade, configurable cost tiers, and quota/subscription tracking; `CostOptimizer` CFO service with anomaly detection, efficiency analysis, downgrade recommendations, and approval decisions; `ReportGenerator` for multi-dimensional spending reports
-- **Litestar REST API (M6)** - 13 controllers + WebSocket handler covering company, agents, tasks, budget, approvals, analytics, messages, meetings, projects, departments, artifacts, providers, health, and WebSocket real-time feed
+- **Litestar REST API (M6)** - 15 controllers + WebSocket handler covering company, agents, tasks, budget, approvals, analytics, messages, meetings, projects, departments, artifacts, providers, health, auth, and WebSocket real-time feed
- **Human Approval Queue (M6)** - Approval submission, approve/reject with reason, list/filter by status, WebSocket notifications for approval events
- **WebSocket Real-Time Feed (M6)** - Channel-based subscriptions (tasks, agents, budget, messages, system, approvals), per-channel payload filters, message-bus bridge
-- **Route Guards (M6)** - Role-based read/write access control (stub auth for M6; real JWT/OAuth planned for M7)
+- **Route Guards (M6)** - Role-based read/write access control with 5 human roles (CEO, Manager, Board Member, Pair Programmer, Observer)
+- **JWT + API Key Authentication (M7)** - Mandatory auth middleware (JWT-first with API key fallback), Argon2id password hashing, first-run admin setup, password change flow, SHA-256 API key hashing, regex-based path exclusions
- **HR Engine (M7)** - Hiring pipeline (request → generate candidate → approval → instantiate), onboarding checklists, offboarding pipeline (reassign → archive → notify → terminate), agent registry
- **Performance Tracking (M7)** - Task metrics, CI-based quality scoring, behavioral collaboration scoring, Theil-Sen robust trend detection, multi-window rolling metric aggregation
- **Progressive Trust (M7)** - 4 strategies (disabled/weighted/per-category/milestone) behind pluggable `TrustStrategy` protocol, trust level tracking, action permission evaluation
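
The authentication bullet in this hunk mentions SHA-256 API key hashing and regex-based path exclusions. The following is a minimal standard-library sketch of those two ideas, not the PR's `AuthService` or `ApiAuthMiddleware`. The function names and the choice of excluded paths are assumptions made for illustration; in particular, the `/api/v1/auth/login` path is inferred from the route summary, and which paths the PR actually excludes is not shown in this diff.

```python
import hashlib
import hmac
import re
import secrets


def hash_api_key(raw_key: str) -> str:
    """Store only the SHA-256 hex digest of a key, never the key itself."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Return (raw key shown once to the user, digest to persist)."""
    raw_key = secrets.token_urlsafe(32)
    return raw_key, hash_api_key(raw_key)


def verify_api_key(raw_key: str, stored_digest: str) -> bool:
    """Compare digests in constant time to avoid timing side channels."""
    return hmac.compare_digest(hash_api_key(raw_key), stored_digest)


# Hypothetical exclusion list: paths that must stay reachable before login.
EXCLUDE_PATTERNS = (
    re.compile(r"^/api/v1/health$"),
    re.compile(r"^/api/v1/auth/(setup|login)$"),
)


def is_excluded(path: str) -> bool:
    """True if the request path should bypass the auth middleware."""
    return any(pattern.search(path) for pattern in EXCLUDE_PATTERNS)
```

A plain SHA-256 digest is reasonable here because API keys are long random strings; passwords, which are low-entropy, get Argon2id instead, as the table in DESIGN_SPEC.md notes.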
@@ -38,12 +39,12 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents

- **Memory Backend Adapter (M5)** - Memory protocols, retrieval pipeline, org memory, and consolidation are complete; initial Mem0 adapter backend ([ADR-001](docs/decisions/ADR-001-memory-layer.md)) pending; research backends (GraphRAG, Temporal KG) planned
- **CLI Surface** - `cli/` package is placeholder-only
-- **Security/Approval System (M7)** - Real authentication (JWT/OAuth) and approval workflow gates are planned
+- **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, risk classifier, and ToolInvoker integration are implemented; progressive trust (4 strategies), promotion/demotion, autonomy levels (5 tiers with presets, resolver, change strategies) and approval timeout policies (wait-forever, auto-deny, tiered, escalation-chain with task park/resume) are implemented; JWT + API key authentication is implemented; approval workflow gates remain planned
- **Advanced Product Surface** - web dashboard, external integrations

## Status

-**M7: Security & Approval** partially complete — Docker sandbox, MCP bridge, code runner, SecOps agent, HR engine + performance tracking, progressive trust, promotion/demotion done; authentication/approval workflow gates remain. See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
+**M7: Security & Approval** partially complete — Docker sandbox, MCP bridge, code runner, SecOps agent, HR engine + performance tracking, progressive trust, promotion/demotion, JWT + API key authentication done; approval workflow gates remain. See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.

## Tech Stack

8 changes: 8 additions & 0 deletions docker/.env.example
@@ -9,6 +9,14 @@
# API key for the LLM provider (required for agent execution)
LLM_API_KEY=

+# --- Authentication ----------------------------------------------------------
+# JWT signing secret (optional — auto-generated and persisted on first run).
+# Set explicitly only for multi-instance deployments sharing a common secret.
+# Must be >= 32 characters if set.
+# Generate with: python -c "import secrets; print(secrets.token_urlsafe(48))"
+# AI_COMPANY_JWT_SECRET=
+# First-run: POST /api/v1/auth/setup to create admin account

# --- Application -------------------------------------------------------------
# Log level: debug, info, warning, error, critical
AI_COMPANY_LOG_LEVEL=info
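
The comments added to this file describe a resolution order for the JWT secret: an explicit `AI_COMPANY_JWT_SECRET` if set, otherwise a previously persisted value, otherwise a freshly generated one. A rough sketch of that order follows. It uses a plain file as a stand-in for the project's persistence backend, so `resolve_jwt_secret`, `store_path`, and `MIN_SECRET_LENGTH` are assumptions for illustration rather than the contents of `secret.py`.

```python
import os
import secrets
from collections.abc import Mapping
from pathlib import Path

MIN_SECRET_LENGTH = 32  # mirrors the ">= 32 characters" rule in .env.example


def resolve_jwt_secret(store_path: Path, env: Mapping[str, str] = os.environ) -> str:
    """Resolve the signing secret: env var, then persisted value, then generate."""
    explicit = env.get("AI_COMPANY_JWT_SECRET", "").strip()
    if explicit:
        if len(explicit) < MIN_SECRET_LENGTH:
            raise ValueError(
                f"AI_COMPANY_JWT_SECRET must be at least {MIN_SECRET_LENGTH} characters"
            )
        return explicit

    # Stand-in for the persistence lookup (a file here, for the sketch only).
    if store_path.exists():
        persisted = store_path.read_text(encoding="utf-8").strip()
        if persisted:
            return persisted

    # First run: generate a 64-character URL-safe secret and persist it.
    generated = secrets.token_urlsafe(48)
    store_path.write_text(generated, encoding="utf-8")
    return generated
```

Persisting the generated value keeps issued tokens valid across restarts; setting the variable explicitly is only needed when several instances must share one secret.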
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -15,12 +15,14 @@ classifiers = [
dependencies = [
"aiodocker==0.26.0",
"aiosqlite==0.22.1",
+"argon2-cffi==25.1.0",
"jinja2==3.1.6",
"jsonschema==4.26.0",
"litellm==1.82.1",
"litestar[standard,structlog,pydantic,brotli,prometheus]==2.21.1",
"mcp==1.26.0",
"pydantic==2.12.5",
+"pyjwt[crypto]==2.11.0",
"pyyaml==6.0.3",
"structlog==25.5.0",
]