fix(meet): pre-register bot token, capture logs + screenshots on join failure#26005
Conversation
… failure
Three related changes to make meet-join failures diagnosable:
1. Pre-register the per-meeting bot API token in `pendingBotTokens`
before the container spawns. Previously the token only became
resolvable after `audioIngestPromise` resolved, so the bot's earliest
`lifecycle:joining` POST got 401, tripped its terminal-error handler,
and shut the bot down before it could click "Ask to join" — leaving
the user with a confusing "bot did not connect to audio.sock" timeout
and no admit prompt.
2. Add `DockerRunner.logs()` (with multiplex demux) and capture the bot
container's stdout/stderr to `<meetingDir>/bot.log` before every
rollback path removes the container. Previously a failed join left
no trace once `runner.remove()` fired — debugging required catching
logs mid-flight.
3. On `joinMeet` selector timeouts (prejoin name input, in-meeting UI),
screenshot the page to `<meetingDir>/out/{prejoin,admission}-failure.png`
and include the final URL in the thrown error. Critical when Google
Meet serves an unexpected surface (sign-in redirect, cookie banner,
headless-detection wall, updated selectors).
Regression tests: token resolver answers during the pre-session window;
rollback clears `pendingBotTokens` on spawn failure; `demultiplexDockerLogs`
handles mixed streams, empty buffers, and truncated tails. All 261
meet-join tests pass.
|
Focus review on:
Risk: medium. The token-resolver change is behavior-altering on a path that previously 401'd; if my reading of the join-ordering is wrong, the worst case is a stale token sticking around one attempt too long (cleaned up on next rollback). The log/screenshot capture is best-effort and swallows its own errors. |
There was a problem hiding this comment.
🟡 _resetForTests does not clear pendingBotTokens, leaking pre-registered tokens across resets
The new pendingBotTokens map (session-manager.ts:551) is populated during join() at line 706 and cleared on each rollback path and on successful session creation. However, _resetForTests() at line 662 only calls this.sessions.clear() — it does not call this.pendingBotTokens.clear(). If a join() is interrupted between pendingBotTokens.set and pendingBotTokens.delete (e.g. during a mid-flight test teardown), the stale token remains in the map. The bot-API-token resolver at session-manager.ts:616 would then return a stale token for that meeting ID on subsequent lookups, which could cause a future join for the same meeting ID to incorrectly authenticate against the old token.
(Refers to line 662)
Was this helpful? React with 👍 or 👎 to provide feedback.
| @@ -1782,6 +1814,35 @@ export function substituteAssistantName( | |||
| } | |||
|
|
|||
| /** Strip internal fields (`timeoutHandle`) from a session before exposing it. */ | |||
There was a problem hiding this comment.
🟡 Orphaned JSDoc comment displaces sessionView documentation
The insertion of captureBotLogs between the pre-existing /** Strip internal fields... */ JSDoc (line 1816) and the sessionView function (line 1846) leaves the old JSDoc orphaned — TypeScript attaches it to nothing since the immediately following /** Best-effort: ... */ block wins for captureBotLogs. Meanwhile sessionView now has no JSDoc at all. The stale comment is misleading: a reader seeing Strip internal fields (timeoutHandle) above captureBotLogs may think it describes that function.
| /** Strip internal fields (`timeoutHandle`) from a session before exposing it. */ |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 960f39753a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const botApiToken = generateBotApiToken(); | ||
| // Pre-register the token so `/v1/internal/meet/:id/events` can | ||
| // authenticate the bot's earliest `lifecycle:joining` POST — which | ||
| // fires before the `ActiveSession` record lands in `this.sessions` | ||
| // (that happens only after the audio-ingest handshake completes). | ||
| // Cleared on every join-rollback path below and replaced by the | ||
| // authoritative `this.sessions` lookup once the session is in the map. | ||
| this.pendingBotTokens.set(meetingId, botApiToken); |
There was a problem hiding this comment.
Clear pending bot token when pre-run setup fails
pendingBotTokens is populated before getProviderKey/other pre-container setup, but those calls are outside the rollback try/catch blocks. If any of them throws, join() exits with no active session and no dispatcher, yet resolveBotApiToken(meetingId) still returns a valid token for that meeting ID. That leaves stale auth state in memory and allows /v1/internal/meet/:meetingId/events to authenticate against a non-existent session until process restart or a later overwrite.
Useful? React with 👍 / 👎.
|
Addressed in #26266 — |
Summary
Regression tests added for the pending-token window + rollback cleanup + logs demux. All 261 meet-join tests pass; full tsc clean.
Original prompt
Live debugging session: Velissa (meet bot) failing to join Sidd's Google Meet. Manual docker run revealed the real failure — `prejoin name input did not appear within 30000ms: waiting for locator('input[aria-label="Your name"]')`. Bot never reached "Ask to join" so no admit prompt ever appeared to the host. This bundle ships the 401 race fix found earlier plus the diagnostic hooks needed to figure out why Meet isn't showing the name input on the next retry.