Merged
Changes from 3 commits
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -65,7 +65,7 @@ curl http://localhost:3000/api/v1/health # backend (via web proxy)

```text
src/ai_company/
api/ # Litestar REST + WebSocket API (controllers, guards, channels)
api/ # Litestar REST + WebSocket API (controllers, guards, channels, JWT + API key auth)
budget/ # Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
cli/ # CLI interface (future — thin API wrapper if needed)
communication/ # Message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution, meeting protocol
26 changes: 18 additions & 8 deletions DESIGN_SPEC.md
@@ -81,7 +81,7 @@ The MVP validates the core hypothesis: **a single agent can complete a real task

> **Implementation snapshot (2026-03-10):**
> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed (including audit entry persistence via AuditRepository + SQLite backend). Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection, non-inferable filtering) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete. SecOps agent (rule engine, audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, ToolInvoker integration), progressive trust (4 strategies: disabled/weighted/per-category/milestone behind TrustStrategy protocol), promotion/demotion (criteria evaluation, approval strategies, model mapping). Autonomy levels (#42: AutonomyLevel enum, presets, 3-level resolver, rule-based auto-downgrade/human-only promotion change strategy) + approval timeout policies (#126: 4 timeout policies, park/resume service, risk tier classifier, timeout checker) complete.
> - **Remaining:** JWT/OAuth auth, approval workflow gates.
> - **Remaining:** Approval workflow gates.

### 1.5 Configuration Philosophy

@@ -2562,6 +2562,7 @@ The REST/WebSocket API is the **primary interface** for all consumers. The Web U
```text
/api/v1/
├── /health # Health check, readiness
├── /auth # Authentication: setup, login, password change, me
├── /company # CRUD company config
├── /agents # List, hire, fire, modify agents
├── /departments # Department management
@@ -2758,6 +2759,7 @@ Circular inheritance is detected via chain tracking and raises `TemplateInherita
| **Docker API** | aiodocker | Async-native Docker API client for `DockerSandbox` backend |
| **Tool Integration** | MCP SDK (`mcp`) | Industry standard for LLM-to-tool integration |
| **Agent Comms** | A2A Protocol compatible | Future-proof inter-agent communication |
| **Authentication** | PyJWT + argon2-cffi | JWT (HMAC HS256/384/512) for session tokens, Argon2id for password hashing, SHA-256 for API key storage |

⚠️ Potential issue | 🟠 Major

Consider the impact of mandatory authentication on the "Local First" design principle.

The PR introduces mandatory JWT + API key authentication. While this is excellent for security in networked deployments, it may conflict with the "Local First" design principle (§1.2):

"Local First — Runs locally with option to expose on network or host remotely later"

Mandatory authentication adds friction for local single-user scenarios:

  • Users must complete password setup on first run
  • No quick "just try it" experience for local evaluation
  • Local-only deployments don't inherently need multi-user auth

Industry precedent: Docker Desktop, PostgreSQL, Redis, and many developer tools run locally without authentication by default, only requiring it when exposed on a network.

Possible mitigations:

  1. Require auth only when binding to non-localhost addresses (bind address detection)
  2. Add a development mode flag (--no-auth or AI_COMPANY_DEV_MODE=1) with clear security warnings
  3. Auto-login for localhost-only deployments (decide from the server bind address or the peer address; the Host header is client-supplied and should not be trusted on its own)

This is a design trade-off between security-by-default and developer experience. Please confirm this trade-off is intentional, or consider adding an escape hatch for local development.

Based on learnings: This implementation changes the UX significantly for local-first usage. If this deviates from the original vision, user approval may be needed.
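
Mitigations 1 and 2 above could be combined roughly as follows. This is a minimal sketch, not the project's code: `is_loopback_bind` and `auth_required` are hypothetical helper names, and the policy shown (auth is always enforced on non-loopback binds, and skipped on loopback only with an explicit override) is one assumed reading of the trade-off. It keys on the bind address rather than the Host header, because the header is supplied by the client.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """Return True if the bind address only accepts local connections."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Anything that is not a literal IP is treated as non-local (fail closed).
        return False


def auth_required(bind_host: str, dev_override: bool = False) -> bool:
    """Hypothetical policy: enforce auth unless loopback-only AND overridden."""
    if not is_loopback_bind(bind_host):
        # A network-reachable bind always requires authentication.
        return True
    return not dev_override


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.5"):
        print(host, auth_required(host, dev_override=True))
```

With this shape, a misconfigured override on a `0.0.0.0` bind still results in enforced authentication, which keeps the secure default for networked deployments.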

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` at line 2762, the design enforces mandatory JWT/API-key auth,
which harms local-first UX. Add a guarded bypass so local single-user runs can
skip auth: introduce a config flag (e.g., AUTH_REQUIRED, default true) and an
explicit dev override (e.g., CLI --no-auth or env AI_COMPANY_DEV_MODE=true).
Amend the central auth path (the middleware used for all requests, i.e.
ApiAuthMiddleware in api/auth/middleware.py) to short-circuit authentication
only when the override is set AND the server bind address is
localhost/127.0.0.1, and log a strong security warning when it does. Detect
locality from the bind address, not the incoming Host header, which is
client-supplied. Keep auth enforced for any non-local bind or when the override
is not set.

| **Config Format** | YAML + Pydantic validation | Human-readable config with strict validation |
| **CLI** | TBD (future, if needed) | Thin wrapper around the REST API for terminal use. May not be needed — interactive Scalar docs at `/docs/api` and `curl`/`httpie` may suffice |
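
The Authentication row above stores API keys as SHA-256 digests. A minimal standard-library sketch of that idea follows; the function names are hypothetical and the project's `AuthService` may differ. A fast unsalted hash is generally considered acceptable here only because the keys are long random strings, unlike passwords, which the row assigns to Argon2id.

```python
import hashlib
import hmac
import secrets


def hash_api_key(key: str) -> str:
    """Return the hex SHA-256 digest that would be persisted instead of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, digest); the plaintext is shown to the user once."""
    key = secrets.token_urlsafe(32)
    return key, hash_api_key(key)


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Compare digests in constant time to avoid timing side channels."""
    return hmac.compare_digest(hash_api_key(presented), stored_digest)


if __name__ == "__main__":
    plaintext, digest = generate_api_key()
    print(verify_api_key(plaintext, digest))
```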

@@ -2980,7 +2982,7 @@ ai-company/
│ ├── persistence/ # Operational data persistence (§7.6)
│ │ ├── __init__.py # Package exports
│ │ ├── protocol.py # PersistenceBackend protocol (M5)
│ │ ├── repositories.py # Repository protocols: TaskRepository, CostRecordRepository, MessageRepository, ParkedContextRepository, AuditRepository
│ │ ├── repositories.py # Repository protocols: TaskRepository, CostRecordRepository, MessageRepository, ParkedContextRepository, AuditRepository, UserRepository, ApiKeyRepository
│ │ ├── config.py # PersistenceConfig model (M5)
│ │ ├── errors.py # Persistence error hierarchy (M5)
│ │ ├── factory.py # create_backend() factory (M5)
@@ -2991,7 +2993,8 @@ ai-company/
│ │ ├── hr_repositories.py # SQLite HR repositories (LifecycleEvent, TaskMetricRecord, CollaborationMetricRecord)
│ │ ├── parked_context_repo.py # SQLiteParkedContextRepository (park/resume serialized agent state)
│ │ ├── audit_repository.py # SQLiteAuditRepository (append-only audit entry persistence)
│ │ └── migrations.py # Schema migrations (user_version pragma)
│ │ ├── user_repo.py # SQLiteUserRepository + SQLiteApiKeyRepository
│ │ └── migrations.py # Schema migrations (user_version pragma, v1–v5)
│ ├── observability/ # Structured logging & correlation
│ │ ├── __init__.py # get_logger() entry point
│ │ ├── _logger.py # Logger configuration
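
The `migrations.py` entry above tracks the schema version in SQLite's `user_version` pragma. The sketch below illustrates that pattern with the synchronous `sqlite3` module for brevity (the project depends on `aiosqlite`). The table definitions and the `MIGRATIONS` list are invented for illustration and are not the project's v1 to v5 steps.

```python
import sqlite3

# Hypothetical steps: the statement at index i upgrades version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, title TEXT NOT NULL)",
    "CREATE TABLE users (id TEXT PRIMARY KEY, username TEXT UNIQUE NOT NULL, "
    "password_hash TEXT NOT NULL)",
    "CREATE TABLE api_keys (id TEXT PRIMARY KEY, user_id TEXT NOT NULL, "
    "key_hash TEXT UNIQUE NOT NULL)",
]


def get_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order and return the resulting version."""
    current = get_version(conn)
    for target, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; target is a trusted int.
        conn.execute(f"PRAGMA user_version = {target}")
        conn.commit()
    return get_version(conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(migrate(connection))
```

Running `migrate` a second time is a no-op, because the slice `MIGRATIONS[current:]` is empty once the stored version matches the list length.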
@@ -3183,18 +3186,25 @@ ai-company/
│ ├── api/ # REST + WebSocket API (M6)
│ │ ├── app.py # Litestar application factory, lifecycle hooks
│ │ ├── approval_store.py # In-memory approval queue storage
│ │ ├── auth/ # JWT + API key authentication subsystem
│ │ │ ├── config.py # AuthConfig (frozen Pydantic, HMAC algorithm, exclude paths)
│ │ │ ├── controller.py # AuthController (setup, login, change-password, me)
│ │ │ ├── middleware.py # ApiAuthMiddleware (JWT-first, API key fallback)
│ │ │ ├── models.py # User, ApiKey, AuthenticatedUser, AuthMethod
│ │ │ ├── secret.py # JWT secret resolution (env var → persistence → auto-generate)
│ │ │ └── service.py # AuthService (Argon2id password hashing, JWT ops, API key hashing)
│ │ ├── bus_bridge.py # Message-bus → WebSocket bridge
│ │ ├── channels.py # WebSocket channel definitions
│ │ ├── config.py # API configuration models (ServerConfig, CorsConfig)
│ │ ├── controllers/ # 14 class-based controllers + 1 WebSocket handler (15 route modules)
│ │ ├── controllers/ # 15 class-based controllers + 1 WebSocket handler (16 route modules)
│ │ ├── dto.py # Request/response DTOs and envelopes
│ │ ├── errors.py # API error hierarchy (ApiError, NotFoundError, etc.)
│ │ ├── errors.py # API error hierarchy (ApiError, NotFoundError, UnauthorizedError, etc.)
│ │ ├── exception_handlers.py # Litestar exception handler registration
│ │ ├── guards.py # Route guards — read/write access (stub auth, M7 real auth)
│ │ ├── middleware.py # Request logging middleware
│ │ ├── guards.py # Route guards — role-based read/write access control (HumanRole enum)
│ │ ├── middleware.py # Request logging, CSP middleware
│ │ ├── pagination.py # Cursor-free offset/limit pagination
│ │ ├── server.py # Uvicorn server runner
│ │ ├── state.py # Typed AppState container with service access
│ │ ├── state.py # Typed AppState container with service access (deferred auth init)
│ │ └── ws_models.py # WebSocket event models (WsEvent, WsEventType)
│ ├── cli/ # CLI interface (future, if needed)
│ │ ├── __init__.py
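
The tree above describes `ApiAuthMiddleware` as "JWT-first, API key fallback". The ordering could be sketched as below, independent of Litestar. `AuthMethod` and `AuthenticatedUser` are names taken from the tree, but their fields, the `authenticate` function, and the assumption that both credentials arrive as a `Bearer` token are illustrative guesses, not the project's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping, Optional


class AuthMethod(Enum):
    JWT = "jwt"
    API_KEY = "api_key"


@dataclass(frozen=True)
class AuthenticatedUser:
    user_id: str
    method: AuthMethod


# A validator returns a user id on success or None on failure (assumed contract).
Validator = Callable[[str], Optional[str]]


def authenticate(
    headers: Mapping[str, str],
    validate_jwt: Validator,
    validate_api_key: Validator,
) -> Optional[AuthenticatedUser]:
    """Try the token as a JWT first, then fall back to an API key lookup."""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    user_id = validate_jwt(token)
    if user_id is not None:
        return AuthenticatedUser(user_id, AuthMethod.JWT)
    user_id = validate_api_key(token)
    if user_id is not None:
        return AuthenticatedUser(user_id, AuthMethod.API_KEY)
    # Neither check accepted the token; the caller would respond with 401.
    return None
```

Passing the validators in as callables keeps the ordering logic testable without a signing secret or a database.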
9 changes: 5 additions & 4 deletions README.md
@@ -24,10 +24,11 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Memory Interface (M5)** - Pluggable `MemoryBackend` protocol with capability discovery, shared knowledge protocol, domain models, config, factory, and context injection retrieval pipeline (ranking, token-budget formatting, non-inferable filtering). Shared organizational memory via `OrgMemoryBackend` protocol with hybrid prompt+retrieval backend. Memory consolidation/archival with pluggable strategies and retention enforcement
- **Coordination Error Taxonomy (M5)** - Post-execution classification pipeline detecting logical contradictions, numerical drift, context omissions, and coordination failures
- **Budget Enforcement (M5)** - `BudgetEnforcer` service with pre-flight checks, in-flight budget checking, auto-downgrade, configurable cost tiers, and quota/subscription tracking; `CostOptimizer` CFO service with anomaly detection, efficiency analysis, downgrade recommendations, and approval decisions; `ReportGenerator` for multi-dimensional spending reports
- **Litestar REST API (M6)** - 13 controllers + WebSocket handler covering company, agents, tasks, budget, approvals, analytics, messages, meetings, projects, departments, artifacts, providers, health, and WebSocket real-time feed
- **Litestar REST API (M6)** - 15 controllers + WebSocket handler covering company, agents, tasks, budget, approvals, analytics, messages, meetings, projects, departments, artifacts, providers, health, auth, and WebSocket real-time feed
- **Human Approval Queue (M6)** - Approval submission, approve/reject with reason, list/filter by status, WebSocket notifications for approval events
- **WebSocket Real-Time Feed (M6)** - Channel-based subscriptions (tasks, agents, budget, messages, system, approvals), per-channel payload filters, message-bus bridge
- **Route Guards (M6)** - Role-based read/write access control (stub auth for M6; real JWT/OAuth planned for M7)
- **Route Guards (M6)** - Role-based read/write access control with 5 human roles (CEO, Manager, Board Member, Pair Programmer, Observer)
- **JWT + API Key Authentication (M7)** - Mandatory auth middleware (JWT-first with API key fallback), Argon2id password hashing, first-run admin setup, password change flow, SHA-256 API key hashing, regex-based path exclusions
- **HR Engine (M7)** - Hiring pipeline (request → generate candidate → approval → instantiate), onboarding checklists, offboarding pipeline (reassign → archive → notify → terminate), agent registry
- **Performance Tracking (M7)** - Task metrics, CI-based quality scoring, behavioral collaboration scoring, Theil-Sen robust trend detection, multi-window rolling metric aggregation
- **Progressive Trust (M7)** - 4 strategies (disabled/weighted/per-category/milestone) behind pluggable `TrustStrategy` protocol, trust level tracking, action permission evaluation
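
The authentication bullet above mentions regex-based path exclusions. A minimal sketch of such a check is shown here. The patterns are assumptions assembled from paths that appear elsewhere in this diff (`/api/v1/health`, `/api/v1/auth/setup`, `/docs/api`) plus a guessed login path; the project's actual defaults are not shown.

```python
import re

# Hypothetical exclusion patterns; anchored so near-miss paths stay protected.
EXCLUDE_PATTERNS = (
    r"^/api/v1/health$",
    r"^/api/v1/auth/(setup|login)$",
    r"^/docs/api(/.*)?$",
)
_COMPILED = tuple(re.compile(pattern) for pattern in EXCLUDE_PATTERNS)


def is_excluded(path: str) -> bool:
    """Return True if the request path should bypass the auth middleware."""
    return any(pattern.search(path) for pattern in _COMPILED)


if __name__ == "__main__":
    for candidate in ("/api/v1/health", "/api/v1/auth/me", "/api/v1/agents"):
        print(candidate, is_excluded(candidate))
```

Anchoring each pattern with `^` and `$` matters: an unanchored `/health` pattern would also exempt a path such as `/api/v1/agents/health-report`.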
@@ -38,12 +39,12 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents

- **Memory Backend Adapter (M5)** - Memory protocols, retrieval pipeline, org memory, and consolidation are complete; initial Mem0 adapter backend ([ADR-001](docs/decisions/ADR-001-memory-layer.md)) pending; research backends (GraphRAG, Temporal KG) planned
- **CLI Surface** - `cli/` package is placeholder-only
- **Security/Approval System (M7)** - Real authentication (JWT/OAuth) and approval workflow gates are planned
- **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, risk classifier, and ToolInvoker integration are implemented; progressive trust (4 strategies), promotion/demotion, autonomy levels (5 tiers with presets, resolver, change strategies) and approval timeout policies (wait-forever, auto-deny, tiered, escalation-chain with task park/resume) are implemented; JWT + API key authentication is implemented; approval workflow gates remain planned
- **Advanced Product Surface** - web dashboard, external integrations

## Status

**M7: Security & Approval** partially complete — Docker sandbox, MCP bridge, code runner, SecOps agent, HR engine + performance tracking, progressive trust, promotion/demotion done; authentication/approval workflow gates remain. See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
**M7: Security & Approval** partially complete — Docker sandbox, MCP bridge, code runner, SecOps agent, HR engine + performance tracking, progressive trust, promotion/demotion, JWT + API key authentication done; approval workflow gates remain. See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.

## Tech Stack

8 changes: 8 additions & 0 deletions docker/.env.example
@@ -9,6 +9,14 @@
# API key for the LLM provider (required for agent execution)
LLM_API_KEY=

# --- Authentication ----------------------------------------------------------
# JWT signing secret (optional — auto-generated and persisted on first run).
# Set explicitly only for multi-instance deployments sharing a common secret.
# Must be >= 32 characters if set.
# Generate with: python -c "import secrets; print(secrets.token_urlsafe(48))"
# AI_COMPANY_JWT_SECRET=
# First-run: POST /api/v1/auth/setup to create admin account

# --- Application -------------------------------------------------------------
# Log level: debug, info, warning, error, critical
AI_COMPANY_LOG_LEVEL=info
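
The comments in the Authentication block above imply a resolution order for the JWT secret: an explicit `AI_COMPANY_JWT_SECRET`, otherwise a previously persisted value, otherwise a freshly generated one. A minimal sketch of that order follows. The `resolve_jwt_secret` function, the `store` mapping standing in for the persistence layer, and the `"jwt_secret"` key are all hypothetical.

```python
import secrets
from typing import Mapping, MutableMapping

ENV_VAR = "AI_COMPANY_JWT_SECRET"
MIN_SECRET_LENGTH = 32  # from the comment in .env.example


def resolve_jwt_secret(env: Mapping[str, str], store: MutableMapping[str, str]) -> str:
    """Resolve the signing secret: env var, then persisted value, then generate."""
    explicit = env.get(ENV_VAR, "").strip()
    if explicit:
        if len(explicit) < MIN_SECRET_LENGTH:
            raise ValueError(f"{ENV_VAR} must be at least {MIN_SECRET_LENGTH} characters")
        return explicit
    persisted = store.get("jwt_secret")
    if persisted:
        return persisted
    # Same generator as the one-liner suggested in .env.example.
    generated = secrets.token_urlsafe(48)
    store["jwt_secret"] = generated
    return generated


if __name__ == "__main__":
    import os

    in_memory_store: dict[str, str] = {}
    print(len(resolve_jwt_secret(os.environ, in_memory_store)))
```

Persisting the generated value is what keeps issued tokens valid across restarts; a multi-instance deployment would set the variable explicitly so that every instance signs with the same secret.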
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -15,12 +15,14 @@ classifiers = [
dependencies = [
"aiodocker==0.26.0",
"aiosqlite==0.22.1",
"argon2-cffi==25.1.0",
"jinja2==3.1.6",
"jsonschema==4.26.0",
"litellm==1.82.1",
"litestar[standard,structlog,pydantic,brotli,prometheus]==2.21.1",
"mcp==1.26.0",
"pydantic==2.12.5",
"pyjwt[crypto]==2.11.0",
"pyyaml==6.0.3",
"structlog==25.5.0",
]