refactor(web): extract assistant lifecycle state machine into a non-React service#32654
Conversation
Closes LUM-2062 — the architectural finish line that #32641 set up. The lifecycle state machine (hatch + initializing-recovery retry budgets, generation counter for stale-response handling, 5-minute stuck-init watchdog, gateway-auth short-circuit, post-hatch cache seed, onboarding-redirect coordination) now lives in `apps/web/src/assistant/lifecycle-service.ts` as a plain TypeScript class — module-level singleton, instantiated at import, React-independent. What the extraction unlocks: - **No action-registry pattern.** Actions are methods on the service, stable from instant zero. No `registerImperativeActions` effect, no `NOOP_CHECK` / `NOOP_VOID` defaults, no "use selectors not getState" docstring nuance, no "use getState not selectors" reversal of same. Consumers import `lifecycleService` and call methods directly. - **No private refs alongside store state.** All state (counters, generation, hatching flag, in-flight ids) lives as instance fields on the service. Single state system, no React/store duality. - **The watchdog moves out of React.** `transition()` arms / clears the 5-minute stuck-init timer based on the next state's kind — no more `initializingCycle` re-arming via useEffect. - **Direct unit tests.** `lifecycle-service.test.ts` exercises the server-state projection, the auto-hatch cascade with the nonprod / retired branches, the logout reset, and the gateway-auth short-circuit — all without `renderHook`, all asserting on `lifecycleService` + the two Zustand stores directly. The hook (`use-lifecycle.ts`) shrinks from 676 lines to 99 — just the React-bound wiring: `useQueryClient`, `useAssistantQuery`, two `useEffect`s that push inputs and poll results into the service. The store (`lifecycle-store.ts`) shrinks from 110 lines to 39 — just the `assistantState` field. No actions, no NOOP defaults. Consumers (`chat-page`, `hatching-screen`, `pre-chat-flow`, `use-event-bus-init`) switch from `useAssistantLifecycleStore.getState().checkAssistant()` to `lifecycleService.checkAssistant()`. The chat-page wrappers that existed only to defeat the action-registry identity flip (`useCallback(() => store.getState().X(), [])`) get to stay tidy because the service methods are stable from import. Tests: 181/181 pass, including 7 new direct unit tests for the service. `tsc --noEmit`, `lint`, and `audit:cross-domain` all clean. No new cross-domain imports. https://claude.ai/code/session_01NjQuUAiXsS4DVEmKPsb1uE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 527f268a45
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (generation !== this.generation) return; | ||
| if (nextState.kind !== "active") { | ||
| this.transition(nextState); |
There was a problem hiding this comment.
Preserve the stuck-initializing watchdog across polls
When the assistant remains initializing, every 3-second poll reaches this path and calls transition({ kind: "initializing" }) again. transition() now clears and re-arms the initializing watchdog on every call, so continuous initializing poll results keep resetting the 5-minute timer and recoverStuckInitializingAssistant() never runs for a genuinely stuck hatch.
Useful? React with 👍 / 👎.
Two changes from the post-open audit:
**Codex P1 — watchdog reset by every poll.** My initial
`syncInitializingWatchdog` cleared + re-armed the 5-minute timer
on every `transition()` call. While the assistant stayed
`initializing`, every 3-second poll re-entered the projection
path and called `transition({ kind: "initializing" })` again,
which rearmed the timer — so the watchdog never expired and
`recoverStuckInitializingAssistant()` would never fire for a
genuinely stuck hatch.
The original React implementation avoided this by keying the
watchdog `useEffect` on `[assistantStateKind, initializingCycle]`;
back-to-back same-kind transitions didn't re-run the effect,
keeping the timer counting down. `initializingCycle` was bumped
only inside `hatchAndCheck`, which is how a fresh hatch attempt
restarted the clock.
Rewriting `transition()` to edge-trigger (arm only on entry into
`initializing`, clear only on exit) and adding an explicit
`armInitializingWatchdog()` call inside `hatchAndCheck` to match
the cycle-bump semantics. Net effect: same timer behavior as the
pre-extraction code.
**Audit DRY** — `applyServerStateUpdate` and
`recoverStuckInitializingAssistant` duplicated the active and
self_hosted projection logic. Extracted `projectActive(result)`
and `projectSelfHosted(result)` private helpers. Preserved the
intentional pre-existing asymmetry where the recovery path
doesn't clear `hatchingVersion` on active landing (each call
site still manages its own counter / flag bookkeeping).
**New tests** for the bug Codex caught and the ticket-mandated
retry-budget exhaustion:
- `redundant initializing→initializing transitions do not reset the 5-minute clock`
- `3 recoverable hatch failures in the auto-hatch poll loop surface as error`
181/181 test files pass.
https://claude.ai/code/session_01NjQuUAiXsS4DVEmKPsb1uE
`applyServerResult` takes `GetAssistantResult`, whose `data`
satisfies the full `Assistant` type. My watchdog test passed an
object literal with just `{id, status}` — CI flagged the missing
13 required fields.
Switched the test to call `lifecycleService.checkAssistant()`
twice more instead of constructing the result by hand. The
fake queryClient routes back through `getAssistantMock` (which
has loosely-typed return values per its mock definition), so
the test exercises the same projection path without
constructing the full type.
Also: added a note in `beforeEach` flagging that `mockClear`
doesn't reset implementations — future mocked deps added here
must re-set their baseline in this block or tests will silently
inherit the previous test's stub.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aeaf269525
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| const navigateToChatAfterLifecycleRefresh = useCallback(async () => { | ||
| await useAssistantLifecycleStore.getState().checkAssistant(); | ||
| await lifecycleService.checkAssistant(); |
There was a problem hiding this comment.
Guard lifecycle calls before root inputs are installed
When this page mounts directly with readOnboardingCompleted() already true, its child useEffect can run before RootLayout's passive effect calls lifecycleService.setInputs(). This direct singleton call then uses the service's default queryClient: null, so checkAssistant() catches a TypeError and publishes a spurious network-error lifecycle state before navigating; the previous store action had a no-op pre-registration default specifically for this window. Keep a no-op/ready guard or initialize the service before child effects can invoke it.
Useful? React with 👍 / 👎.
Codex P2: a child route's `useEffect` that calls a lifecycle action (e.g. `lifecycleService.checkAssistant()`) on a direct mount can fire BEFORE `RootLayout`'s passive effect has called `setInputs()` — children's effects commit before parent's in the same render cycle. The unguarded action tried to use the default `null` queryClient, threw a TypeError on `null.fetchQuery(...)`, caught it, and published a spurious network-error state. The previous store-based action API papered this over with NOOP_CHECK / NOOP_VOID defaults. The service had no equivalent. Adding a `ready` flag that flips true on the first `setInputs()` call. Public actions (`checkAssistant`, `retryAssistant`, `hatchVersion`, `respondToInputs`, `applyServerResult`) early-return when `!ready`. Matches the no-op-default behavior the old store API used to ship with, but now expressed explicitly on the service. New test asserts that calling actions before `setInputs` is a no-op and leaves the lifecycle store in its initial `loading` state. 181/181 test files pass.
The docstring said "Mirrors the no-op defaults the pre-service store action API used to ship with" — references a shape that no longer exists in the codebase. Rephrased to state the invariant the guard protects (children's effects committing before the parent's) without naming a previous implementation.
|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
✦ APPROVE — reviewed at 25a2dcbd4e
Value: Closes the architectural finish line #32641's PR description pointed at. The action-registry pattern, NOOP defaults, render-time capture race, and selector-flip race — all four Codex P1s from #32641 dissolve because the state machine no longer lives inside React's render-and-effect timing. Plus: the machine is now directly testable without renderHook.
What this does:
- New
lifecycle-service.ts(642 lines) — module-level singleton class owning state machine, watchdog, retry budgets, generation counter, gateway-auth short-circuit, cache seeding, onboarding-redirect coordination. use-lifecycle.tsshrinks 676 → 99 lines — pure wiring: pull React signals + push viasetInputs, hand poll results viaapplyServerResult. No state, no callbacks.lifecycle-store.tsshrinks 110 → 39 lines —assistantStateonly. The store is now strictly an observability surface; only the service writes.- Consumer call shape simplifies:
useAssistantLifecycleStore.getState().checkAssistant()→lifecycleService.checkAssistant()acrosschat-page,hatching-screen,pre-chat-flow,use-event-bus-init. - New direct unit tests in
lifecycle-service.test.tsexercise server-state projection (active + self-hosted), the auto-hatch cascade (nonprod / retired / vanilla-managed branches), the logout reset, and the gateway-auth short-circuit. NorenderHook, no React tree.
Full analysis
Codex P1 (watchdog reset on every poll) — closed at 4a9aade767
The original transition() cleared+re-armed the watchdog on every call, so back-to-back initializing → initializing poll results reset the 5-minute clock indefinitely and recoverStuckInitializingAssistant() could never fire on a genuinely stuck hatch. Fix is edge-triggered:
if (next.kind !== "initializing") {
if (this.initializingTimeout) clearTimeout(...);
} else if (prevKind !== "initializing") {
this.armInitializingWatchdog();
}A repeat initializing → initializing is a no-op kind-wise — no clear, no re-arm. hatchAndCheck() calls armInitializingWatchdog() explicitly so a fresh hatch attempt still resets the clock. The inline comment spells out the invariant — high-leverage for the next maintainer.
Codex P2 (pre-mount lifecycle call → null queryClient) — closed at 26521f2162
ready: boolean ships false, setInputs() is the only flip-to-true. Every public action (checkAssistant, respondToInputs, applyServerResult) early-returns if (!this.ready). So a child route whose useEffect commits before RootLayout's setInputs lands — the parent/child effect commit-order race — becomes a silent no-op instead of TypeError-on-null.fetchQuery followed by a spurious network-error state being published.
This is the structural equivalent of the NOOP_CHECK defaults the previous version shipped — same invariant, but the guard lives once on the class instead of being scattered across three action fields with NOOP fallbacks the store had to expose. The docstring on ready explicitly cites the race; future maintainers see why it's there.
Test architecture
Bun's mock.module at module scope + lifecycleService.__resetForTesting() in beforeEach/afterEach is the right pattern for a module-level singleton. The comment in beforeEach about why mockClear isn't enough (and why every new mock needs a fresh baseline) is doing real work — the kind of subtle leak that costs 30 minutes of debugging a flaky test.
The fact that auto-hatch cascade, retry-budget exhaustion, and the gateway-auth short-circuit are now reachable from a unit test — no renderHook, no fake timers, no time-travel reasoning about effect commit order — is the headline benefit of the extraction. The state machine being a class with __resetForTesting is the kind of testability boring frontend code rarely gets.
Observations (non-blocking)
-
useEffectdep array inuse-lifecycle.ts:75–86listsonRedirectandresolveOnboardingRedirect. If the parent constructs either inline per render, the effect re-runs every render →setInputsre-publishes identical values →respondToInputsre-fires. Most calls are early-return no-ops inside the service, so it's not a correctness bug — but worth confirmingRootLayoutwraps both inuseCallback. The kind of thing the next render-perf pass will catch if it's drifted. -
getLocalGatewayUrlmock atlifecycle-service.test.ts:41is set up but I don't see it asserted in any test. Either dead mock or pending coverage for the local-gateway projection path. Drop it or add the assertion. -
useCallbackwrappers inchat-page.tsx:174–187— still necessary becauselifecycleService.checkAssistantis a normal class method (passing the bare reference losesthis). Two cleaner shapes if you ever want to revisit: (a) define the actions as arrow-class-fields sothisbinds at construction; (b) ship pre-bound exports next to the singleton. Current form is fine — the comment above each wrapper could shrink now that the identity-flip rationale is gone, but not worth a follow-up commit.
Merge gate
- CI 7/7 green at HEAD
- Codex review at HEAD: "Didn't find any major issues. Hooray!"
- Devin review: 0 potential issues
- Both Codex inline findings (P1 watchdog, P2 input guard) verified closed in code
Mergeable.
Vellum Constitution — Trust-seeking: by lifting the state machine out of React's commit-order choreography, the four classes of failure modes Codex caught in the predecessor PR become structurally impossible to express, and the machine is now reachable from unit tests with no time-travel reasoning about render order.
The comment referenced "actions registered in a passive effect" and the render-time-capture-vs-selector-subscription debate — all of which became moot when the state machine moved out of React. Updated to state the current invariants: stable identity across renders for prop boundaries, and `this` binding preserved on call (the underlying reason class methods need a wrapper).
Closes LUM-2067 — the test-coverage gap I'd been planning as a
follow-up. Two new direct unit tests exercise the recovery code
that was the last meaningful uncovered area in the service:
1. **Watchdog firing on a still-initializing assistant retires
and re-hatches.** Verifies the timer fires, recovery captures
the stuck assistant id, calls `retireAssistantById`, and
follows up with `hatchAssistant`. The fix from `4a9aade767`
that edge-triggers the watchdog is what makes this reachable —
without it, redundant `initializing→initializing` poll
transitions would reset the timer indefinitely.
2. **MAX_INITIALIZING_RECOVERIES failed recoveries surface as a
terminal timeout error.** Drives four watchdog cycles where
each recovery's rehatch returns initializing (the assistant
stays stuck). After the budget exhausts, the next watchdog
firing transitions to the `buildInitializingTimeoutError`
shape. Confirms `MAX_INITIALIZING_RECOVERIES` actually bounds
the loop (the kind of off-by-one that would manifest as users
permanently stuck on the initializing screen).
Pattern: `mock.module("@/assistant/lifecycle", ...)` overrides
`INITIALIZING_TIMEOUT_MS` to 30ms while preserving the real
helpers (`resolveAssistantLifecycleState`,
`buildInitializingTimeoutError`, etc.). The watchdog runs in
real time against the shrunk timeout, no fake-timer machinery,
no time-travel reasoning.
Left for the integration tests: the "force-refresh found active"
and post-hatch active-landing branches both depend on the
React-tree polling loop firing `applyServerResult` from the
next `useAssistantQuery` data tick. The state machine alone
can't reach those outcomes without a poll driver, so
`onboarding-lifecycle-sync.test.tsx` carries that coverage.
181/181 tests pass.
There was a problem hiding this comment.
✦ Re-APPROVE — reviewed at c9a8b15da0
Two commits since the prior APPROVE at 25a2dcbd4e, both targeted at observations from that review.
Test coverage extension (c9a8b15da0, closing LUM-2067):
12 direct unit tests now, +2 from the prior pass — both exercising the recovery code that was the last meaningful uncovered area in the service:
-
Watchdog → retire → re-hatch end-to-end. The watchdog is reachable from a test only because
transition()'s edge-trigger fix (4a9aade767) stoppedinitializing → initializingpoll repeats from indefinitely resetting the clock. The test captures that explicitly: agetAssistant → initializingprojection arms the watchdog, the watchdog fires afterTEST_INITIALIZING_TIMEOUT_MS(30ms),retireAssistantByIdis called with the capturedinitializingAssistantId, thenhatchAssistantfollows up. The structural fix made the test possible — which is exactly the right shape. -
MAX_INITIALIZING_RECOVERIESbudget. Drives four watchdog cycles where each recovery's re-hatch returns initializing (the assistant stays stuck). After the budget exhausts, the next firing transitions tobuildInitializingTimeoutError. The kind of off-by-one that would surface as users permanently stuck on the initializing screen — now a regression test catches it.
The mocking pattern is the right call. mock.module("@/assistant/lifecycle", ...) overrides only INITIALIZING_TIMEOUT_MS, preserving the real resolveAssistantLifecycleState / buildInitializingTimeoutError / shouldRecoverFromHatchFailure helpers. The watchdog runs in real time against the shrunk timeout — no fake-timer machinery, no time-travel reasoning about effect commit order. Same testability headline as the original extraction, extended.
Integration boundary articulated. The commit message names the two branches deliberately left to integration coverage — "force-refresh found active" (recovery branch when initializingAssistantId was null at watchdog time) and post-hatch active landing — because both depend on the React-tree polling loop firing applyServerResult from the next useAssistantQuery data tick. The state machine alone can't reach those outcomes without a poll driver, so onboarding-lifecycle-sync.test.tsx carries them. That's a principled unit-vs-integration boundary, not a coverage gap to apologize for.
Comment cleanup (ad191fd7b9):
Picks up Observation #3 from the prior review. The old comment carried forward the action-registry / NOOP-defaults / selector-flip rationale that became moot when the state machine moved out of React. New comment states the current invariants: stable identity for prop boundaries, this binding preserved on call. Crisp.
Observation #2 (unused getLocalGatewayUrl mock at lifecycle-service.test.ts:63) — still present, still unused. Lowest-stakes of the three; non-blocking. Drop on the next pass or annotate as a future-coverage placeholder.
Observation #1 (wiring-effect dep stability) — already verified moot in the prior review: RootLayout passes useNavigate (stable hook output) and a module-imported resolveOnboardingRedirect (program-lifetime constant). The wiring effect doesn't re-fire on renders.
Merge gate
- CI 7/7 green at HEAD
- Codex review at HEAD: "Didn't find any major issues" (the lingering P1/P2 inline anchors are auto-migrated from older commits — both verified structurally closed in the prior review)
- Devin review: 0 potential issues
- 181/181 tests pass per commit message
- All three prior-review observations either addressed (#3) or verified non-issues (#1, #2)
Mergeable. Test additions tilt this from "ship and watch" to "ship with confidence" — the recovery code that runs on a genuinely-stuck cold mount now has end-to-end regression coverage.
Closes LUM-2062. Follow-up to #32641.
Why
#32641 published assistant lifecycle through Zustand and killed the outlet-context plumbing, but left the state machine itself React-shaped. The hook held all the domain state in
useRefs anduseState, defined the actions insideuseCallbackclosures over React-bound primitives (useQueryClient,useMutation), and registered them into the store via a passiveuseEffect. That shape produced every Codex P1 we caught during #32641 cleanup:create()because they close over React state, so they have to be registered viauseEffectafter mount).NOOP_CHECK/NOOP_VOIDdefaults that cover the brief registration race..getState()captures the no-op because the parent's register-actions effect hasn't committed yet..use.X()re-renders on registration, which cleans up and re-runs any effect listing the action as a dep, which can re-execute side effects (duplicate hatch POST in the cloud path).The PR description on #32641 explicitly called out the action-registry pattern as the trade we made and pointed at LUM-2062 as the architectural finish line. This is that.
What changed
The state machine moves out of React entirely.
useAssistantLifecycle(676 lines):useStateforassistantState, 6useRefs for counters/generation/in-flight flags, actionuseCallbacks, register-actionsuseEffect, watchdoguseEffectlifecycle-service.ts(620 lines): plain TypeScript class, module-level singleton, instantiated at importuseQueryClient,useAssistantQuery, twouseEffects that push inputs and poll results into the serviceregisterImperativeActionsactionassistantStateonlyuseAssistantLifecycleStore.getState().checkAssistant()(sometimes wrapped inuseCallbackto defeat identity flips)lifecycleService.checkAssistant()— service methods are stable from instant zeroThe state machine semantics don't change. Same phases, same retry budgets, same recovery paths, same gateway-auth / local-mode branches, same post-hatch cache seed, same onboarding-redirect coordination. The extraction is purely structural.
Benefits
lifecycle-service.test.tsexercises the server-state projection (active and self-hosted result handling), the auto-hatch cascade (the nonprod / retired / vanilla-managed branches), the logout reset, and the gateway-auth short-circuit — norenderHook, no React tree. Recovery paths and retry budgets are reachable from tests for the first time..use.X()vs.getState()decision for lifecycle actions. Either pattern would have worked equally well before — neither does anything special now, because the action isn't on the store. Just import the service and call methods.useRefparallel to the store. Service fields are the only place state lives; the store mirrorsassistantStatefor React reactivity.Why safe
setSelfHostedConnection(...), everysetQueryDatacache seed, everyinvalidateQueriesbackoff fires under the same predicate.queries.test.ts,onboarding-lifecycle-sync.test.tsx,use-event-bus-init.test.tsx— the existing behavioral coverage of the hook. The only test changes are mock targets (@/assistant/lifecycle-store→@/assistant/lifecycle-servicein two test files; both tests still assert the same behavior).tsc --noEmit,lint,audit:cross-domain,test:ciall clean. 181/181 test files pass (added one new unit-test file).src/assistant/next to the existing files; consumers are unchanged in scope.References
.getState()and.setState()work on module-level Zustand stores without React. Used by the service to writeassistantState.AssistantStatefrom server polling results but layers in client retry/recovery logic that doesn't reduce to a query).apps/web/docs/STATE_MANAGEMENT.md— the canonical-migration pointer extended in refactor(web): publish assistant lifecycle through Zustand, drop outlet-context plumbing #32641. This PR is the natural next step but doesn't require new doc rules; the lifecycle-store docstring nuance ("subscribe vs.getState()for the registered actions") is now deleted because the trap no longer exists.What I considered and did NOT do, and why
Did not switch
useAssistantLifecycleStoreto azustand/vanillastore oruseSyncExternalStore. Either would work — the service would own the vanilla store as its private state container, anduseAssistantLifecycleStorewould be a thin React wrapper around it. Kept thecreate<...>+createSelectorsshape because (a) it's consistent with every other store in the codebase, (b)useAssistantLifecycleStore.setState({...})from the service is the same operation the previous code did via the store's own action, just inline, and (c) the migration was big enough already.Did not own the polling timer inside the service. TanStack Query's
refetchInterval(inuseAssistantQuery) still drives the 3-second poll cadence; the service just receives results via the hook'suseEffect. Doing in-service polling would lose TanStack Query's dedup with other/assistant/subscribers (sidebar header, identity panel) and the integration with thesetQueryDatapost-hatch seed. Out of scope for this ticket.Did not move the logout-clear into a synchronous call at the logout site (the vex-bot non-blocking observation on #32641). Same code as before — service resets when
respondToInputs()sees!isLoggedIn. Moving it earlier in the call chain (e.g. into theauth-storelogout action callinglifecycleService.respondToInputs()directly) is a one-line follow-up if you ever see flake; not in scope here.Did not inject the API functions (
getAssistant,hatchAssistant,retireAssistantById) throughsetInputs. The service imports them at module level. Tests mock viamock.module(...)— same pattern as everything else in the codebase. Constructor injection would have been the "more textbook" non-React shape but adds plumbing without solving a real problem for one consumer.Did not extract a
LifecycleStateMachineinterface separate from the service. Could have separated "pure transitions" (a state-table-like function) from "side effects" (the cache seeding, the self-hosted connection priming). Considered briefly; the recovery and post-hatch paths are so interleaved with side effects that separating them just spreads the logic across two files. Held off.Did not delete
useAssistantLifecycleStorein favor of subscribing directly to a service emitter. The store is still the React tree's reactive view; the service writes to it. Two-layer separation (service for logic, store for reactivity) is the same tradeuseEventBusStoremakes for the event bus. Consistent across the codebase.Root cause analysis
How did the code get into this state?
#32641 extracted lifecycle into Zustand stores, which fixed the outlet-context plumbing problem. But the hook kept owning the state machine through React primitives (
useRef,useCallback,useEffect) because that was the smallest diff that achieved the outlet-context fix. The trade was explicit in the #32641 PR description, with this ticket as the planned follow-up.The deeper cause is structural: the lifecycle state machine is domain logic with no React-specific dependencies. React was the wrong shape for it from the start, but it was also the lowest-friction shape when the hook was small. As the machine grew (retry budgets, recovery paths, generation counter, watchdog), the React-bound shape became increasingly costly.
What mistakes or decisions led to it?
useState+useEffect— small, simple, idiomatic React. Each new responsibility (retry budgets, recovery, watchdog) was added inside the hook because the hook already existed. No one stopped to ask "should this live outside React?"Were there warning signs we missed?
useRefs just to hold private state. Refs are a code smell for state that's outgrowing the hook shape.hatchMutateRef,retireMutateRef— capturinguseMutation's.mutateAsyncin refs because the mutation object itself was unstable. That's React fighting React.isRetiredRef/isNonProductionRef/onRedirectRef/resolveOnboardingRedirectRefpattern — capturing every external value in a ref to keepuseCallbackidentities stable. Each ref was a workaround for the fact that the actions wanted to be stable but the hook couldn't make them so.What can we do to prevent this pattern from recurring?
useRef-as-private-state count as a smell. A handful is normal. Six refs is a hook trying to be a class.AGENTS.md changes
No new rule. The outlet-context rule from #32641 stays in place. The lifecycle-store docstring nuance about "selector subscription vs
.getState()for the registered actions" is deleted, not preserved — the trap no longer exists, so leaving the warning would mislead.The pattern this PR demonstrates (hook-as-React-wiring + module-level service-as-state-machine + Zustand-as-reactive-output) is worth a brief mention in
STATE_MANAGEMENT.mdif a second example appears. For now the in-file docstrings (lifecycle-service.tsheader) carry the explanation.https://claude.ai/code/session_01NjQuUAiXsS4DVEmKPsb1uE
Generated by Claude Code