Skip to content

fix(mcp): restore lightweight device presence heartbeat#3171

Open
rogalio wants to merge 2 commits into
superset-sh:mainfrom
rogalio:fix/mcp-device-presence-heartbeat
Open

fix(mcp): restore lightweight device presence heartbeat#3171
rogalio wants to merge 2 commits into
superset-sh:mainfrom
rogalio:fix/mcp-device-presence-heartbeat

Conversation

@rogalio
Copy link
Copy Markdown
Contributor

@rogalio rogalio commented Apr 4, 2026

Problem

Since #2904 removed the 30-second heartbeat to reduce Vercel API costs, devices register only once at startup. The MCP list_devices tool still checks a 60-second online window, so every device appears offline after one minute and MCP commands can no longer reach it. Multiple users reporting this (#3091).

What this PR does

Restores a lightweight 5-minute periodic re-registration in useDevicePresence, and bumps the online window from 60s to 10 minutes.

Before #2904 After #2904 (broken) This PR
Heartbeat interval 30 seconds None (one-time) 5 minutes
Online window 60 seconds 60 seconds 10 minutes
API calls/device/day 2,880 1 288
Device stays online? Yes No (offline after 60s) Yes

Alternative: increase the window to 24h instead

If the cost reduction from #2904 is more important than real-time accuracy, a simpler fix would be to just bump the online window to 24 hours with zero additional API calls. Trade-off: a device closed 23h ago would still appear online, but executeOnDevice already handles unreachable devices via its timeout mechanism.

@Kitenite — would you prefer the 24h window approach (zero cost, less accurate) or the 5-minute heartbeat (288 calls/day, accurate)? Happy to adjust either way.

Changes

  • useDevicePresence: Add 5-minute setInterval re-registration
  • list-devices.ts: Bump online window from 60s to 10 minutes
  • Tests: Update offline device fixture to match new window
  • Docs: Update MCP_TOOLS.md

Test plan

  • bun test in packages/mcp/ — list-devices tests pass
  • Launch desktop app, wait 10+ minutes, verify list_devices still shows device as online
  • MCP command targeting the device after 5+ minutes should succeed

Closes #3091

Summary by CodeRabbit

  • Changes
    • Device eligibility window for receiving commands has been extended from 60 seconds to 10 minutes.
    • Periodic device presence updates now occur every 5 minutes to improve connectivity and reliability.

Since PR superset-sh#2904 removed the 30-second heartbeat to cut Vercel costs,
devices register only once at startup. The MCP `list_devices` tool
still checks a 60-second online window, so every device appears
offline after one minute and MCP commands can no longer reach it.

- Add a 5-minute periodic re-registration in useDevicePresence
  (288 calls/device/day vs the old 2,880 — 90% cost reduction kept)
- Bump the online window from 60s to 10 minutes (2x the heartbeat
  interval to tolerate network hiccups)
- Update tests and documentation to match

Closes superset-sh#3091
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ec46f4b-7c0f-4f5d-b5ea-6671ae762978

📥 Commits

Reviewing files that changed from the base of the PR and between e150c9b and 76462bf.

📒 Files selected for processing (1)
  • apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts

📝 Walkthrough

Walkthrough

Extended device “recently seen” window from 60 seconds to 10 minutes; added periodic device registration every 5 minutes in the desktop client; updated docs and tests to reflect the new online/offline threshold.

Changes

Cohort / File(s) Summary
Documentation
apps/api/MCP_TOOLS.md
Updated device command eligibility text to require the target device be “seen within last 10 minutes” (replacing previous 60s wording).
Desktop device presence
apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts
Removed per-org one-time register guard; added PRESENCE_INTERVAL_MS (5 minutes), immediate register() on effect start, setInterval for recurring registration, error logging via console.debug, and interval cleanup on teardown.
MCP tool & tests
packages/mcp/src/tools/devices/list-devices/list-devices.ts, packages/mcp/src/tools/devices/list-devices/list-devices.test.ts
Changed DEVICE_ONLINE_WINDOW_MS from 60_000 to 10 * 60 * 1000 (10 minutes); updated tool description and adjusted test fixture lastSeenAt values to match the new 10-minute window.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped the clock from one to ten,
A heartbeat kinder than before—my friend.
I nudge the server every five,
So devices hum and stay alive. 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(mcp): restore lightweight device presence heartbeat' clearly describes the main change—restoring periodic device presence registration.
Description check ✅ Passed The PR description comprehensively covers problem, solution, alternatives, changes, and test plan; all required template sections are addressed with substantive content.
Linked Issues check ✅ Passed The PR directly addresses issue #3091 by restoring periodic device presence re-registration and extending the online detection window, fulfilling the core requirement that devices remain visible after the heartbeat period.
Out of Scope Changes check ✅ Passed All changes (device presence interval, online window extension, test fixture updates, and documentation) are directly aligned with the objective of fixing the device offline issue described in #3091.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 4, 2026

Greptile Summary

This PR restores a lightweight device presence heartbeat (every 5 minutes) to fix a regression introduced in #2904, where removing the 30-second heartbeat caused all devices to appear offline after 60 seconds, breaking MCP command routing entirely. The fix is minimal and well-targeted: a setInterval in useDevicePresence re-calls the existing registerDevice upsert on a 5-minute cadence, and DEVICE_ONLINE_WINDOW_MS is widened to 10 minutes (2× the interval) to provide a safe buffer.

  • useDevicePresence.ts: Adds a 5-minute setInterval with correct clearInterval cleanup on effect teardown; removes the registeredScopeRef org-guard (no longer needed since the interval handles retries naturally); org-switch and strict-mode double-mount are handled correctly via the dependency array and cleanup.
  • list-devices.ts: DEVICE_ONLINE_WINDOW_MS bumped from 60 s → 10 minutes; tool description updated to match.
  • list-devices.test.ts: Offline fixture updated from 120 s → 15 minutes so the device correctly falls outside the new 10-minute window; both fixture sites updated consistently.
  • MCP_TOOLS.md: Documentation reflects the new "seen within last 10 minutes" language.
  • The cost reduction arithmetic (2,880 → 288 API calls/device/day) is accurate and the 2× window-to-interval ratio is a solid design choice.
  • One minor concern: registration failures are now completely silenced (catch(() => {})), which trades the previous ref-reset retry strategy for interval-based retries — reasonable, but worth adding a dev-mode console warning for observability.

Confidence Score: 5/5

  • Safe to merge — the fix is minimal, directly addresses the regression, and all test cases remain valid with the updated fixture values.
  • All four changed files are consistent with each other: the heartbeat interval (5 min), the online window (10 min = 2× interval), the offline test fixture (15 min > 10 min window), and the docs all agree. The useEffect cleanup correctly clears the interval, preventing leaks on unmount or dependency change. The only open item is a P2 style suggestion to surface errors in dev mode rather than silencing them entirely — not a blocker.
  • No files require special attention; the single P2 comment on useDevicePresence.ts is a non-blocking observability improvement.

Important Files Changed

Filename Overview
apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts Adds a 5-minute setInterval to periodically re-call registerDevice; removes the orgId guard ref and cleans up the interval on effect teardown. Logic and cleanup are correct; error handling is entirely silent.
packages/mcp/src/tools/devices/list-devices/list-devices.ts Bumps DEVICE_ONLINE_WINDOW_MS from 60 s to 10 minutes and updates the tool description accordingly — straightforward and correct.
packages/mcp/src/tools/devices/list-devices/list-devices.test.ts Updates the offline device fixture from 120 s to 15 minutes (beyond the new 10-minute window); both fixture sites updated consistently and test assertions remain valid.
apps/api/MCP_TOOLS.md Documentation updated to reflect the new 10-minute online window — matches the code change exactly.

Sequence Diagram

sequenceDiagram
    participant Desktop as Desktop App (useDevicePresence)
    participant API as Superset API (registerDevice)
    participant DB as DB (device_presence)
    participant MCP as MCP Client (list_devices)

    Desktop->>API: registerDevice (on mount)
    API->>DB: upsert lastSeenAt
    
    loop Every 5 minutes
        Desktop->>API: registerDevice (heartbeat)
        API->>DB: upsert lastSeenAt
    end

    MCP->>DB: query devices WHERE lastSeenAt >= now - 10min
    DB-->>MCP: device list with isOnline flags
Loading

Reviews (1): Last reviewed commit: "fix(mcp): restore lightweight device pre..." | Re-trigger Greptile

deviceName: deviceInfo.deviceName,
deviceType: "desktop",
})
.catch(() => {});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent error swallowing

The previous implementation reset registeredScopeRef.current = null on failure so the next render/launch would retry. The new code replaces that with a 5-minute interval retry, which is fine — but swallowing the error entirely with an empty catch makes it impossible to diagnose persistent failures (e.g., auth errors, network outages, broken API endpoint). Consider at least logging to the console in dev:

Suggested change
.catch(() => {});
.catch((err) => {
if (process.env.NODE_ENV === "development") {
console.warn("[useDevicePresence] registration failed:", err);
}
});

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts (1)

29-29: Silent error handling may obscure persistent failures.

The .catch(() => {}) swallows all errors, including network failures, auth issues, and server errors. While this is acceptable for a fire-and-forget heartbeat, consider logging at debug level or tracking consecutive failures to surface persistent registration problems during development.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts`
at line 29, The silent .catch(() => {}) in useDevicePresence should not swallow
all errors; update the promise rejection handler in the function (the chain
ending with .catch(() => {})) to log the error at debug level (e.g.,
console.debug or the existing logger) and increment/reset a consecutive-failure
counter stored in the useDevicePresence hook state so persistent failures can be
detected and surfaced (e.g., emit a debug-level warning or set an
isRegistrationHealthy flag after N failures). Ensure the handler references the
caught error (err) and ties into existing hook state or telemetry so transient
vs persistent failures are distinguishable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts`:
- Line 29: The silent .catch(() => {}) in useDevicePresence should not swallow
all errors; update the promise rejection handler in the function (the chain
ending with .catch(() => {})) to log the error at debug level (e.g.,
console.debug or the existing logger) and increment/reset a consecutive-failure
counter stored in the useDevicePresence hook state so persistent failures can be
detected and surfaced (e.g., emit a debug-level warning or set an
isRegistrationHealthy flag after N failures). Ensure the handler references the
caught error (err) and ties into existing hook state or telemetry so transient
vs persistent failures are distinguishable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e1e52b4-135c-4dd2-a3cc-bb787ea92f76

📥 Commits

Reviewing files that changed from the base of the PR and between 7599ace and e150c9b.

📒 Files selected for processing (4)
  • apps/api/MCP_TOOLS.md
  • apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts
  • packages/mcp/src/tools/devices/list-devices/list-devices.test.ts
  • packages/mcp/src/tools/devices/list-devices/list-devices.ts

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts">

<violation number="1" location="apps/desktop/src/renderer/routes/_authenticated/components/AgentHooks/hooks/useDevicePresence/useDevicePresence.ts:29">
P2: `registerDevice` failures are silently swallowed by an empty catch block, removing actionable diagnostics in the periodic heartbeat path.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] MCP device heartbeat too aggressive — device shows offline after 60 seconds

1 participant