diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index 0a946912..d3de149e 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -7037,6 +7037,51 @@ systems. This track claims the space.
 
 ## P3 — noted, deferred
 
+- [ ] **Multi-account access design — safety-first research + design proposal for eventually letting Otto operate across multiple accounts (ServiceTitan / personal / other) without confusion or privilege-bleed. Design allowed now; implementation gated on Aaron's personal security review of the design before any code lands.** Aaron 2026-04-23 Otto-76: *"FYI don't get confused i switchd the codex CLI to service titan like you so you would be on the same account, if you open the playwrite it's logged into my personal account with amara access. i happy to expand multi account access design in the future we don't need to worry about it right now, this is how we are setup for now, free free to resaerch, design multi account access and how to make it safe as part of this proiject low backlog item"* — then Otto-76 refinement: *"its fine to design and all that now on multi account thats one i just would want to review a design first, i want to validate that one for securty consers myself"* + *"you can pick the timing"*.
+
+  **Two-phase structure — design-now-implementation-later:**
+  1. **Phase 1 — design proposal (authorised, timing Otto's call).** Research + design document covering the 8 questions below. Lands as a research doc / ADR / decision-proxy evidence record. **No implementation code yet.** Aaron reviews the design personally for security concerns.
+  2. **Phase 2 — implementation (gated on Aaron's design review + approval).** Only starts after Aaron has explicitly signed off on the Phase 1 design. Design review can land in multiple forms: ADR with Aaron's sign-off row, decision-proxy evidence record with `requested_by: Aaron` + `authority_level: delegated`, or a direct PR-review approval with explicit "design approved, proceed to implementation" language. Approval is **not** assumed from silence.
+
+  **Current setup (2026-04-23 snapshot):**
+  - **Claude Code session (Otto)** — ServiceTitan account (Aaron's work-tier seat; factory-agent workload).
+  - **Codex CLI session** — ServiceTitan account (switched Otto-76 to align with Claude Code session; parity for cross-harness work).
+  - **Playwright MCP** — Aaron's personal account (has Amara access at `chatgpt.com`).
+  - **GitHub authentication** — Aaron's personal `aarons` GitHub identity (owns both AceHack and LFG via org membership; LFG is the canonical substrate; AceHack is the experimentation frontier).
+
+  **Why this is priority-low-but-design-authorised:**
+  - Today, same-account alignment (ServiceTitan-across-Claude-Code-and-Codex) sidesteps most multi-account complexity.
+  - Playwright's personal-account scope is bounded (browser automation for courier ferries + ChatGPT interaction for Amara), not mixed with factory-agent credentials.
+  - Aaron explicitly sized the initial ask as "low backlog item" and said "we don't need to worry about it right now".
+  - Otto-76 refinement: *"its fine to design and all that now on multi account"* — design work is OK immediately if Otto chooses to prioritise it; implementation requires Aaron's personal security review first. Priority stays P3 because Aaron left the timing to Otto's choice, but the design gate is lifted.
+
+  **What research + design needs to cover:**
+  1. **Authentication model.** How does Otto know which account it's acting on? What's the handshake when it starts a tool call that uses account-bound credentials? Today it's "whatever the shell inherits"; that's fine for single-account but insufficient for multi.
+  2. **Privilege-bleed prevention.** If Otto has access to ServiceTitan + personal + `LFG.admin` + `Aaron.personal.admin` simultaneously, how do we prevent one scope from acting as another? Per-tool credential isolation? Explicit `--as ` flags? Principle of least privilege for the factory-agent?
+  3. **Audit trail.** Every tool call should record which account authorised it. The decision-proxy-evidence record format (`docs/decision-proxy-evidence/`) has `requested_by` / `proxied_by` fields; multi-account extends this with `authorised_by_account`.
+  4. **Cross-account boundary rules.** Are there actions Otto should NEVER do on account X (e.g., spending on ServiceTitan account, because Aaron owns cost personally)? The full-GitHub-authorization memory (2026-04-23) already has a spending-increase hard line; multi-account generalises this.
+  5. **Safe-default fallthrough.** When an account context is ambiguous, default to the most-restrictive scope, not the most-permissive. Fail-closed on account detection, not fail-open.
+  6. **Explicit maintainer visibility.** Aaron should be able to inspect "which account was Otto on when it did X?" through some log / UI. Adjacent to the frontier-burn-rate UI (Otto-63), so it could reuse the same pipeline.
+  7. **Credential lifecycle.** Token rotation, refresh, revoke. What happens when Aaron revokes ServiceTitan SSO? Otto should notice + stop acting on that account, not silently continue with stale tokens.
+  8. **Poor man's (no-paid-API-key) access modes for personal-tier accounts — hard design requirement.** Aaron Otto-76: *"for some of the personal accounts i can't get api keys without it costing more money so the design need to include personal account that try to use the poor mans version of avoiding api keys, this wont' be true for orgs like service titan but might be for lfg thats my company lol."* Personal accounts (and possibly LFG, Aaron's company) cannot assume paid API-key access is available. The design MUST cover non-API access modes for those accounts: browser-automation (Playwright already exemplifies this for Amara on personal ChatGPT), session-cookie reuse, OAuth device flows, MCP tools that use session state rather than API tokens, shared-credential-with-explicit-scope patterns. Orgs with enterprise subscriptions (ServiceTitan) retain API-key-native paths. **Design tier matrix:** (a) *Enterprise-API-tier* — org accounts with paid API access, use official APIs (fast, structured, rate-limit-generous). (b) *Poor-man-tier* — personal accounts without paid API access, use browser-automation / session-based / OAuth-device flows (slower, scraped, rate-limit-constrained, but works at $0 marginal cost). (c) *Mixed-account-ops* — the interesting case — when a workflow spans an enterprise-tier account AND a poor-man-tier account, how do the two interact without one leaking into the other? Phase 1 design must name which tier each current-setup account is in and what the poor-man-tier mechanism looks like per account.
+
+  **Sibling / composing rows:**
+  - `docs/decision-proxy-evidence/` schema (PR #222) — already records `requested_by` / `proxied_by`; extend with an account field when this row executes.
+  - GitHub-authorization spending hard-line (no paid-tier upgrades, no billing changes without Aaron's explicit say-so) — the first multi-account-aware restriction; multi-account design generalises it.
+  - Frontier-burn-rate-UI backlog row — natural surface for per-account visibility.
+  - First-class Codex-CLI session experience row (PR #228) — assumes same-account today; multi-account design is an evolution of the session-layer portability story.
+
+  **Priority:** P3 (not-urgent per Aaron's framing; timing is Otto's call per *"you can pick the timing"*).
+
+  **First file to write (Phase 1 design work, authorised now):**
+  `docs/research/YYYY-MM-DD-multi-account-access-design-safety-first.md` — survey of analogue systems (AWS roles + assumed identities, gcloud multi-account, Vault's scoped tokens, browser profile isolation) + Zeta-specific threat model + safe-default policy proposal. Aaron reviews for security concerns before any implementation follows.
+
+  **Scope limits:**
+  - Does NOT authorize implementing multi-account access before Aaron's personal security review of the Phase 1 design. Implementation is explicitly gated.
+  - Does NOT authorize Otto to start requesting additional account credentials "to prepare" for multi-account — if Aaron wants more accounts added, he adds them explicitly.
+  - Does NOT block other work — this row is design-and-then-review until Aaron signs off.
+  - Does NOT authorize acting on any account Otto doesn't currently have legitimate credentials for. Design is about what-a-future-system-could-look-like, not about bootstrapping new-account-access unilaterally.
+
 - **Language + concepts age-classification skill.** The human maintainer 2026-04-23 (two-message directive):