Skip to content

fix(meet): pre-register bot token, capture logs + screenshots on join failure#26005

Merged
siddseethepalli merged 1 commit into
mainfrom
sidd/meet-join-debug-captures
Apr 16, 2026
Merged

fix(meet): pre-register bot token, capture logs + screenshots on join failure#26005
siddseethepalli merged 1 commit into
mainfrom
sidd/meet-join-debug-captures

Conversation

@siddseethepalli
Copy link
Copy Markdown
Contributor

@siddseethepalli siddseethepalli commented Apr 16, 2026

Summary

  • Pre-register bot API token: the meet-internal /events resolver only checked `this.sessions`, which isn't populated until after `audioIngestPromise` resolves. Bots POSTing `lifecycle:joining` in that window got 401 → terminal-error shutdown → never reached "Ask to join". Added `pendingBotTokens` populated right after `generateBotApiToken()` and cleared on every rollback path.
  • Capture bot container logs on rollback: new `DockerRunner.logs()` (with Docker's 8-byte framing demuxed) writes `/bot.log` before `runner.remove()` in the no-host-port and audio-ingest-failure rollback paths. Previously a failed join erased all bot-side evidence.
  • Screenshot + final URL on `joinMeet` timeouts: prejoin-name and in-meeting-UI selector timeouts now drop `out/{prejoin,admission}-failure.png` and append the page URL to the error, so we can see exactly what surface Google Meet rendered.

Regression tests added for the pending-token window + rollback cleanup + logs demux. All 261 meet-join tests pass; full tsc clean.

Original prompt

Live debugging session: Velissa (meet bot) failing to join Sidd's Google Meet. Manual docker run revealed the real failure — `prejoin name input did not appear within 30000ms: waiting for locator('input[aria-label="Your name"]')`. Bot never reached "Ask to join" so no admit prompt ever appeared to the host. This bundle ships the 401 race fix found earlier plus the diagnostic hooks needed to figure out why Meet isn't showing the name input on the next retry.


Open with Devin

… failure

Three related changes to make meet-join failures diagnosable:

1. Pre-register the per-meeting bot API token in `pendingBotTokens`
   before the container spawns. Previously the token only became
   resolvable after `audioIngestPromise` resolved, so the bot's earliest
   `lifecycle:joining` POST got 401, tripped its terminal-error handler,
   and shut the bot down before it could click "Ask to join" — leaving
   the user with a confusing "bot did not connect to audio.sock" timeout
   and no admit prompt.

2. Add `DockerRunner.logs()` (with multiplex demux) and capture the bot
   container's stdout/stderr to `<meetingDir>/bot.log` before every
   rollback path removes the container. Previously a failed join left
   no trace once `runner.remove()` fired — debugging required catching
   logs mid-flight.

3. On `joinMeet` selector timeouts (prejoin name input, in-meeting UI),
   screenshot the page to `<meetingDir>/out/{prejoin,admission}-failure.png`
   and include the final URL in the thrown error. Critical when Google
   Meet serves an unexpected surface (sign-in redirect, cookie banner,
   headless-detection wall, updated selectors).

Regression tests: token resolver answers during the pre-session window;
rollback clears `pendingBotTokens` on spawn failure; `demultiplexDockerLogs`
handles mixed streams, empty buffers, and truncated tails. All 261
meet-join tests pass.
@siddseethepalli siddseethepalli merged commit 7d4c4fe into main Apr 16, 2026
@siddseethepalli siddseethepalli deleted the sidd/meet-join-debug-captures branch April 16, 2026 05:06
@siddseethepalli
Copy link
Copy Markdown
Contributor Author

Focus review on:

  • Token resolver semantics (session-manager.ts:613-617, 699-707, and 3 rollback sites). The resolver now consults pendingBotTokens as a fallback — confirm the cleanup is airtight across every exit path (spawn throw, no-host-port, audio-ingest rejection, success). Regression test at session-manager.test.ts under "token resolver is populated during the container-spawn / audio-ingest window".

  • demultiplexDockerLogs in docker-runner.ts — custom parser for Docker's multiplexed logs stream (8-byte frame header). Small but fiddly; tests cover happy path, empty, and truncated-trailing-frame.

  • /out as a writable diagnostics surface in the bot. Screenshots written there land on the host via the existing workspace bind. Consider whether prejoin-failure.png/admission-failure.png should be namespaced (per-attempt timestamp?) if we start keeping a history.

Risk: medium. The token-resolver change is behavior-altering on a path that previously 401'd; if my reading of the join-ordering is wrong, the worst case is a stale token sticking around one attempt too long (cleaned up on next rollback). The log/screenshot capture is best-effort and swallows its own errors.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 _resetForTests does not clear pendingBotTokens, leaking pre-registered tokens across resets

The new pendingBotTokens map (session-manager.ts:551) is populated during join() at line 706 and cleared on each rollback path and on successful session creation. However, _resetForTests() at line 662 only calls this.sessions.clear() — it does not call this.pendingBotTokens.clear(). If a join() is interrupted between pendingBotTokens.set and pendingBotTokens.delete (e.g. during a mid-flight test teardown), the stale token remains in the map. The bot-API-token resolver at session-manager.ts:616 would then return a stale token for that meeting ID on subsequent lookups, which could cause a future join for the same meeting ID to incorrectly authenticate against the old token.

(Refers to line 662)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@@ -1782,6 +1814,35 @@ export function substituteAssistantName(
}

/** Strip internal fields (`timeoutHandle`) from a session before exposing it. */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Orphaned JSDoc comment displaces sessionView documentation

The insertion of captureBotLogs between the pre-existing /** Strip internal fields... */ JSDoc (line 1816) and the sessionView function (line 1846) leaves the old JSDoc orphaned — TypeScript attaches it to nothing since the immediately following /** Best-effort: ... */ block wins for captureBotLogs. Meanwhile sessionView now has no JSDoc at all. The stale comment is misleading: a reader seeing Strip internal fields (timeoutHandle) above captureBotLogs may think it describes that function.

Suggested change
/** Strip internal fields (`timeoutHandle`) from a session before exposing it. */
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 960f39753a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 699 to +706
const botApiToken = generateBotApiToken();
// Pre-register the token so `/v1/internal/meet/:id/events` can
// authenticate the bot's earliest `lifecycle:joining` POST — which
// fires before the `ActiveSession` record lands in `this.sessions`
// (that happens only after the audio-ingest handshake completes).
// Cleared on every join-rollback path below and replaced by the
// authoritative `this.sessions` lookup once the session is in the map.
this.pendingBotTokens.set(meetingId, botApiToken);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear pending bot token when pre-run setup fails

pendingBotTokens is populated before getProviderKey/other pre-container setup, but those calls are outside the rollback try/catch blocks. If any of them throws, join() exits with no active session and no dispatcher, yet resolveBotApiToken(meetingId) still returns a valid token for that meeting ID. That leaves stale auth state in memory and allows /v1/internal/meet/:meetingId/events to authenticate against a non-existent session until process restart or a later overwrite.

Useful? React with 👍 / 👎.

@siddseethepalli
Copy link
Copy Markdown
Contributor Author

Addressed in #26266_resetForTests now clears pendingBotTokens; pre-container setup wrapped in try/catch to clean up pending tokens on failure; orphaned JSDoc removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant