Merged
94 changes: 57 additions & 37 deletions ARCHITECTURE.md
@@ -434,7 +434,7 @@ graph TB
- `/channels/inbound` (Telegram/SMS/WhatsApp path) before run orchestration.
- Inbound Twilio voice setup (`RelayConnection.handleSetup`) to seed call-time actor context.
- Runtime channel runs pass this as `guardianContext`, and session runtime assembly injects `<guardian_context>` into provider-facing prompts.
- Voice call orchestration mirrors the same prompt contract: `CallController` receives guardian context on setup and refreshes it immediately after successful voice challenge verification, so the first post-verification turn is grounded as `actor_role: guardian`.
- Voice calls mirror the same prompt contract: `CallController` receives guardian context on setup and refreshes it immediately after successful voice challenge verification, so the first post-verification turn is grounded as `actor_role: guardian`.
- Voice-specific behavior (DTMF/speech verification flow, relay state machine) remains voice-local; only actor-role resolution is shared.

### SMS Channel (Twilio)
@@ -4091,14 +4091,16 @@ The Calls subsystem supports both **outbound** and **inbound** voice calls via T
```mermaid
sequenceDiagram
participant User as User (Chat UI)
participant Session as Session / Tool Executor
participant CallStore as CallStore (SQLite)
participant TwilioProvider as TwilioProvider
participant TwilioAPI as Twilio REST API
participant Gateway as Gateway (public)
participant Routes as twilio-routes.ts (runtime)
participant WS as RelayConnection (WebSocket)
participant Orch as CallController
participant Ctrl as CallController
participant Bridge as voice-session-bridge
participant RunOrch as RunOrchestrator
participant Session as Session / AgentLoop
participant LLM as Anthropic Claude
participant State as CallState (Notifiers)
participant GuardianDispatch as GuardianDispatch
@@ -4123,35 +4125,40 @@ sequenceDiagram
TwilioAPI->>Gateway: WebSocket /webhooks/twilio/relay
Gateway->>WS: proxy WS to runtime /v1/calls/relay
WS->>WS: setup message (callSid)
WS->>Orch: new CallController()
Orch->>State: registerCallController()
WS->>Ctrl: new CallController()
Ctrl->>State: registerCallController()

loop Conversation turns
TwilioAPI->>WS: prompt (caller utterance)
WS->>WS: extract speaker metadata + map speaker identity
WS->>Orch: handleCallerUtterance(transcript, speakerContext)
Orch->>LLM: messages.stream()
LLM-->>Orch: text tokens (streaming)
Orch->>WS: sendTextToken() (for TTS)
Orch->>CallStore: recordCallEvent()
WS->>Ctrl: handleCallerUtterance(transcript, speakerContext)
Ctrl->>Bridge: startVoiceTurn()
Bridge->>RunOrch: startRun(conversationId, content, {sourceChannel: 'voice', eventSink})
RunOrch->>Session: route to session pipeline
Session->>LLM: agent loop (tools, memory, skills)
LLM-->>Session: text tokens (streaming)
Session-->>Bridge: eventSink.onTextDelta()
Bridge-->>Ctrl: onTextDelta callback
Ctrl->>WS: sendTextToken() (for TTS)
Ctrl->>CallStore: recordCallEvent()
end

alt ASK_GUARDIAN pattern detected
Orch->>CallStore: createPendingQuestion()
Orch->>GuardianDispatch: dispatchGuardianQuestion()
Ctrl->>CallStore: createPendingQuestion()
Ctrl->>GuardianDispatch: dispatchGuardianQuestion()
GuardianDispatch->>Mac: guardian_request_thread_created IPC
GuardianDispatch->>TG/SMS: POST /deliver/{channel}
Note over Mac,TG/SMS: First channel to respond wins
Mac/TG/SMS->>Routes: guardian answer
Routes->>CallDomain: answerCall()
CallDomain->>Orch: handleUserAnswer()
Orch->>LLM: continue with [USER_ANSWERED: ...]
CallDomain->>Ctrl: handleUserAnswer()
Ctrl->>Bridge: startVoiceTurn([USER_ANSWERED: ...])
end

alt END_CALL pattern detected
Orch->>WS: endSession()
Orch->>CallStore: updateCallSession(completed)
Orch->>State: fireCallCompletionNotifier()
Ctrl->>WS: endSession()
Ctrl->>CallStore: updateCallSession(completed)
Ctrl->>State: fireCallCompletionNotifier()
end

TwilioAPI->>Gateway: POST /webhooks/twilio/status
@@ -4162,7 +4169,7 @@ sequenceDiagram

### Inbound Call Flow

Inbound calls are triggered when someone dials the assistant's Twilio phone number. The gateway resolves which assistant owns the number, the runtime bootstraps a session keyed by CallSid, and the relay connection optionally gates the call behind guardian voice verification before handing off to the LLM orchestrator.
Inbound calls are triggered when someone dials the assistant's Twilio phone number. The gateway resolves which assistant owns the number, the runtime bootstraps a session keyed by CallSid, and the relay connection optionally gates the call behind guardian voice verification before handing off to the CallController.

```mermaid
sequenceDiagram
@@ -4174,7 +4181,10 @@ sequenceDiagram
participant CallStore as CallStore (SQLite)
participant WS as RelayConnection (WebSocket)
participant GuardianSvc as ChannelGuardianService
participant Orch as CallController
participant Ctrl as CallController
participant Bridge as voice-session-bridge
participant RunOrch as RunOrchestrator
participant Session as Session / AgentLoop
participant LLM as Anthropic Claude

Caller->>TwilioAPI: Dials assistant phone number
@@ -4213,7 +4223,7 @@ sequenceDiagram
WS->>GuardianSvc: validateAndConsumeChallenge(code)
alt Code matches
GuardianSvc-->>WS: success + guardian binding created
WS->>Orch: startNormalCallFlow(isInbound=true)
WS->>Ctrl: startNormalCallFlow(isInbound=true)
else Code incorrect + attempts remaining
WS->>Caller: TTS "That code was incorrect. Please try again."
else Max attempts exceeded
@@ -4223,22 +4233,31 @@ sequenceDiagram
end
end
else No pending guardian challenge
WS->>Orch: startNormalCallFlow(isInbound=true)
WS->>Ctrl: startNormalCallFlow(isInbound=true)
end

Orch->>Orch: buildInboundSystemPrompt()
Note over Orch: "You are answering an incoming call<br/>on behalf of [user]. Greet warmly,<br/>find out what they need."
Orch->>LLM: initial greeting turn
LLM-->>Orch: receptionist-style greeting
Orch->>WS: sendTextToken() (TTS to caller)
Ctrl->>Bridge: startVoiceTurn([CALL_OPENING])
Bridge->>RunOrch: startRun(conversationId, [CALL_OPENING], {sourceChannel: 'voice', eventSink})
RunOrch->>Session: route to session pipeline
Note over Session: Session runtime assembly injects<br/>voice channel context + system prompt
Session->>LLM: agent loop (initial greeting turn)
LLM-->>Session: receptionist-style greeting
Session-->>Bridge: eventSink.onTextDelta()
Bridge-->>Ctrl: onTextDelta callback
Ctrl->>WS: sendTextToken() (TTS to caller)

loop Conversation turns
Caller->>WS: prompt (caller utterance)
WS->>Orch: handleCallerUtterance(transcript, speakerContext)
Orch->>LLM: messages.stream()
LLM-->>Orch: text tokens (streaming)
Orch->>WS: sendTextToken() (for TTS)
Orch->>CallStore: recordCallEvent()
WS->>Ctrl: handleCallerUtterance(transcript, speakerContext)
Ctrl->>Bridge: startVoiceTurn()
Bridge->>RunOrch: startRun(conversationId, content, {sourceChannel: 'voice', eventSink})
RunOrch->>Session: route to session pipeline
Session->>LLM: agent loop (tools, memory, skills)
LLM-->>Session: text tokens (streaming)
Session-->>Bridge: eventSink.onTextDelta()
Bridge-->>Ctrl: onTextDelta callback
Ctrl->>WS: sendTextToken() (for TTS)
Ctrl->>CallStore: recordCallEvent()
end
```
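
The per-turn path in the diagrams (CallController to voice-session-bridge to RunOrchestrator, with text deltas flowing back through `eventSink.onTextDelta()`) can be sketched roughly as follows. Only the names `startVoiceTurn`, `startRun`, `sourceChannel`, `eventSink`, and `onTextDelta` come from this PR. The interface shapes, the synchronous signatures, and the stand-in orchestrator are assumptions for illustration, and the real bridge is presumably asynchronous.

```typescript
// Assumed shapes: the PR names eventSink.onTextDelta() but not its exact type.
interface EventSink {
  onTextDelta(text: string): void;
}

interface RunOptions {
  sourceChannel: 'voice';
  eventSink: EventSink;
}

// Hypothetical minimal view of RunOrchestrator (synchronous for brevity).
interface RunOrchestratorLike {
  startRun(conversationId: string, content: string, options: RunOptions): void;
}

// Wraps startRun() with voice defaults and forwards agent-loop text deltas
// to the callback supplied by the controller.
function startVoiceTurn(
  orchestrator: RunOrchestratorLike,
  conversationId: string,
  content: string,
  onTextDelta: (text: string) => void,
): void {
  orchestrator.startRun(conversationId, content, {
    sourceChannel: 'voice',
    eventSink: { onTextDelta: (text: string) => onTextDelta(text) },
  });
}

// Stand-in orchestrator that records the channel and streams a fixed reply.
let lastSourceChannel = '';
const fakeOrchestrator: RunOrchestratorLike = {
  startRun(_conversationId: string, _content: string, options: RunOptions): void {
    lastSourceChannel = options.sourceChannel;
    for (const token of ['Hello', ', ', 'how can I help?']) {
      options.eventSink.onTextDelta(token);
    }
  },
};

// Collects tokens the way a controller might before calling sendTextToken().
function runDemoTurn(content: string): string[] {
  const spoken: string[] = [];
  startVoiceTurn(fakeOrchestrator, 'conv-1', content, (text) => {
    spoken.push(text);
  });
  return spoken;
}
```

In this sketch the controller never talks to the LLM directly. It only sees text deltas, which matches the diagram's shift from `Orch->>LLM: messages.stream()` to a session-backed turn.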

@@ -4266,7 +4285,8 @@ sequenceDiagram
| `assistant/src/calls/relay-server.ts` | WebSocket handler for the Twilio ConversationRelay protocol; manages RelayConnection instances per call |
| `assistant/src/calls/speaker-identification.ts` | Reusable speaker recognition primitive for voice prompts: extracts provider speaker metadata (top-level and nested fields), resolves stable per-call speaker identities, and emits speaker context for personalization |
| `assistant/src/calls/call-controller.ts` | Session-backed voice controller: routes voice turns through the daemon session pipeline via voice-session-bridge, detects ASK_GUARDIAN and END_CALL control markers |
| `assistant/src/calls/call-state.ts` | Notifier pattern (Maps with register/unregister/fire helpers) for cross-component communication: question notifiers, completion notifiers, and orchestrator registry |
| `assistant/src/calls/voice-session-bridge.ts` | Bridge between voice relay and the daemon session/run pipeline: wraps RunOrchestrator.startRun() with voice-specific defaults, translating agent-loop events into callbacks for real-time TTS streaming |
| `assistant/src/calls/call-state.ts` | Notifier pattern (Maps with register/unregister/fire helpers) for cross-component communication: question notifiers, completion notifiers, and controller registry |
| `assistant/src/calls/call-constants.ts` | Config-backed constants: max call duration, user consultation timeout, silence timeout, denied emergency numbers |
| `assistant/src/calls/voice-provider.ts` | Abstract VoiceProvider interface for provider-agnostic call initiation |
| `assistant/src/calls/voice-quality.ts` | Voice quality profile resolution: `resolveVoiceQualityProfile()` reads `calls.voice` config and returns effective TTS provider, voice spec, and fallback settings for the active mode |
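
The notifier pattern described for `call-state.ts` (Maps with register/unregister/fire helpers) might look something like the sketch below. `fireCallCompletionNotifier` is named in this PR. The register and unregister helper names, the conversation-id key, and the notifier signature are assumptions.

```typescript
// Assumed notifier signature; the real one may carry more data.
type CompletionNotifier = (callSessionId: string) => void;

// One Map per notifier kind, keyed here by conversation id (an assumption).
const completionNotifiers = new Map<string, CompletionNotifier>();

// Hypothetical helper name: registers a listener for one conversation.
function registerCallCompletionNotifier(
  conversationId: string,
  notifier: CompletionNotifier,
): void {
  completionNotifiers.set(conversationId, notifier);
}

// Hypothetical helper name: removes the listener, if any.
function unregisterCallCompletionNotifier(conversationId: string): void {
  completionNotifiers.delete(conversationId);
}

// Returns false when nobody is listening, so callers can decide what to do.
function fireCallCompletionNotifier(
  conversationId: string,
  callSessionId: string,
): boolean {
  const notifier = completionNotifiers.get(conversationId);
  if (!notifier) return false;
  notifier(callSessionId);
  return true;
}
```

Keeping the registry as plain module-level Maps lets the controller and the session signal each other without holding direct references to one another.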
@@ -4303,7 +4323,7 @@ The `validateTransition(current, next)` function is called by `updateCallSession

### Cross-Channel Guardian Consultation

When the LLM emits `[ASK_GUARDIAN: question]` during a voice call, the orchestrator creates a pending question and calls `dispatchGuardianQuestion()` on the guardian dispatch engine. The dispatch engine handles the full cross-channel fan-out:
When the LLM emits `[ASK_GUARDIAN: question]` during a voice call, the controller creates a pending question and calls `dispatchGuardianQuestion()` on the guardian dispatch engine. The dispatch engine handles the full cross-channel fan-out:

1. **Request creation**: A `guardian_action_request` row is created with a unique 6-character hex request code, the question text, a `pending` status, and an expiry timestamp.

@@ -4375,7 +4395,7 @@ This makes ingress URL updates smoother in local tunnel workflows because Twilio
| GET | `/v1/calls/:callSessionId` | Get call status, including any pending question |
| POST | `/v1/calls/:callSessionId/cancel` | Cancel an active call |
| POST | `/v1/calls/:callSessionId/answer` | Answer a pending question via HTTP (alternative to in-thread bridge) |
| POST | `/v1/calls/:callSessionId/instruction` | Relay a steering instruction to an active call's orchestrator (alternative to in-thread bridge) |
| POST | `/v1/calls/:callSessionId/instruction` | Relay a steering instruction to an active call's controller (alternative to in-thread bridge) |
| POST | `/v1/internal/twilio/status` | Internal status callback used by gateway; accepts JSON `{ params }` |
| POST | `/v1/internal/twilio/connect-action` | Internal connect action callback used by gateway; accepts JSON `{ params }` |
| WS | `/v1/calls/relay` | ConversationRelay WebSocket (bidirectional: prompt/interrupt/dtmf from Twilio, text tokens/end to Twilio) |
@@ -4394,8 +4414,8 @@ Both tools and HTTP routes delegate to the same domain functions in `call-domain

The CallController detects two special markers in the LLM's response text:

- **`[ASK_GUARDIAN: question]`** — The AI needs to consult the guardian. The orchestrator creates a pending question, notifies the session via `fireCallQuestionNotifier`, puts the caller on hold, and waits for a guardian answer (timeout configured via `calls.userConsultTimeoutSeconds`).
- **`[END_CALL]`** — The AI has determined the call's purpose is fulfilled. The orchestrator sends a goodbye, closes the ConversationRelay session, and marks the call as completed.
- **`[ASK_GUARDIAN: question]`** — The AI needs to consult the guardian. The controller creates a pending question, notifies the session via `fireCallQuestionNotifier`, puts the caller on hold, and waits for a guardian answer (timeout configured via `calls.userConsultTimeoutSeconds`).
- **`[END_CALL]`** — The AI has determined the call's purpose is fulfilled. The controller sends a goodbye, closes the ConversationRelay session, and marks the call as completed.

Both markers are stripped from the TTS output so the callee never hears the raw control text.
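
As a rough illustration of the marker handling described above, the following sketch detects both markers in a complete response string and returns the text that would be safe to send to TTS. The function name, the result types, and the regular expressions are hypothetical. The actual CallController works on streamed tokens and may handle markers split across token boundaries differently.

```typescript
// Hypothetical result types; not taken from call-controller.ts.
type ControlMarker =
  | { kind: 'ask_guardian'; question: string }
  | { kind: 'end_call' };

interface ParsedResponse {
  spokenText: string; // text with control markers removed
  markers: ControlMarker[];
}

function parseControlMarkers(responseText: string): ParsedResponse {
  const markers: ControlMarker[] = [];

  // Capture each [ASK_GUARDIAN: question] and remove it from the text.
  const withoutAsk = responseText.replace(
    /\[ASK_GUARDIAN:\s*([^\]]*)\]/g,
    (_match: string, question: string) => {
      markers.push({ kind: 'ask_guardian', question: question.trim() });
      return '';
    },
  );

  // Record [END_CALL] and remove it from the text.
  const withoutEnd = withoutAsk.replace(/\[END_CALL\]/g, () => {
    markers.push({ kind: 'end_call' });
    return '';
  });

  // Collapse whitespace left behind by the removed markers.
  const spokenText = withoutEnd.replace(/\s+/g, ' ').trim();
  return { spokenText, markers };
}
```

For example, `parseControlMarkers('Thanks, goodbye! [END_CALL]')` yields the spoken text `Thanks, goodbye!` plus a single `end_call` marker, so the raw control text never reaches the TTS stream.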

@@ -4423,7 +4443,7 @@ Call behavior is controlled via the `calls` config block in the assistant config
| `calls.disclosure.enabled` | boolean | `true` | Whether the AI should disclose it is an AI at the start of the call. |
| `calls.disclosure.text` | string | *(default disclosure prompt)* | The disclosure instruction included in the system prompt. |
| `calls.safety.denyCategories` | string[] | `[]` | Categories of calls to deny (e.g., emergency numbers are always denied regardless of this setting). |
| `calls.model` | string | *(unset — uses default model)* | Optional override for the LLM model used in call orchestration. |
| `calls.model` | string | *(unset — uses default model)* | Optional override for the LLM model used in voice call conversations. |
| `calls.voice.mode` | enum | `'twilio_standard'` | Voice quality mode. Options: `twilio_standard` (standard Twilio TTS with Google voices — fully supported), `twilio_elevenlabs_tts` (ElevenLabs voices through Twilio ConversationRelay — fully supported), `elevenlabs_agent` (full ElevenLabs conversational agent — experimental/restricted, blocked by runtime guard). |
| `calls.voice.language` | string | `'en-US'` | Language code for TTS and transcription. |
| `calls.voice.transcriptionProvider` | enum | `'Deepgram'` | Speech-to-text provider (`Deepgram` or `Google`). |
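
To make the `calls.voice` defaults in the rows above concrete, here is a small sketch that merges overrides onto those defaults and rejects the restricted mode. The type and function names are hypothetical and this is not `resolveVoiceQualityProfile()`. Throwing on `elevenlabs_agent` is an assumption, since the table only says the mode is blocked by a runtime guard.

```typescript
type VoiceMode = 'twilio_standard' | 'twilio_elevenlabs_tts' | 'elevenlabs_agent';
type TranscriptionProvider = 'Deepgram' | 'Google';

// Hypothetical config shape covering only the three keys listed above.
interface CallsVoiceConfig {
  mode: VoiceMode;
  language: string;
  transcriptionProvider: TranscriptionProvider;
}

// Defaults taken from the configuration table.
const DEFAULT_VOICE_CONFIG: CallsVoiceConfig = {
  mode: 'twilio_standard',
  language: 'en-US',
  transcriptionProvider: 'Deepgram',
};

// Merges overrides onto the defaults; the throw models the runtime guard
// (whether the real guard throws or falls back is not stated in the source).
function resolveVoiceConfig(overrides: Partial<CallsVoiceConfig> = {}): CallsVoiceConfig {
  const resolved: CallsVoiceConfig = { ...DEFAULT_VOICE_CONFIG, ...overrides };
  if (resolved.mode === 'elevenlabs_agent') {
    throw new Error('calls.voice.mode "elevenlabs_agent" is experimental and blocked');
  }
  return resolved;
}
```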
6 changes: 3 additions & 3 deletions assistant/src/calls/call-domain.ts
@@ -402,7 +402,7 @@ export function getCallStatus(
}

/**
* Cancel an active call. Cleans up relay connections and orchestrators.
* Cancel an active call. Cleans up relay connections and controllers.
*/
export async function cancelCall(input: CancelCallInput): Promise<{ ok: true; session: CallSession } | CallError> {
const { callSessionId, reason } = input;
@@ -501,9 +501,9 @@ export async function answerCall(input: AnswerCallInput): Promise<{ ok: true; qu
}

/**
* Relay a user instruction to an active call's orchestrator.
* Relay a user instruction to an active call's controller.
* Validates that the call is active and the instruction is non-empty
* before injecting it into the orchestrator's conversation history.
* before injecting it into the controller's conversation.
*/
export async function relayInstruction(input: RelayInstructionInput): Promise<{ ok: true } | CallError> {
const { callSessionId, instructionText } = input;
6 changes: 3 additions & 3 deletions assistant/src/calls/twilio-routes.ts
@@ -73,9 +73,9 @@ export function buildWelcomeGreeting(task: string | null, configuredGreeting?: s
void task;
const override = configuredGreeting?.trim();
if (override) return override;
// The contextual first opener now comes from the call orchestrator's
// initial LLM turn. Keep Twilio's relay-level greeting empty by default
// so we don't speak a deterministic static line first.
// The contextual first opener now comes from the call controller's
// initial LLM turn via the session pipeline. Keep Twilio's relay-level
// greeting empty by default so we don't speak a deterministic static line first.
return '';
}
