fix: prevent stale presence showing users as online after server restart#21
fix: prevent stale presence showing users as online after server restart#21BuckyMcYolo merged 1 commit intomainfrom
Conversation
Adds heartbeat TTL to per-user socket sets and periodic reconciliation to clean up stale presence entries when servers crash without running disconnect handlers. Also wires up onboarding invite code/link joining and updates roadmap.
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughPresence heartbeat management system introduced with periodic TTL refresh and reconciliation logic. Invite code parsing improved to accept both URLs and raw codes. Onboarding dialog gated to exclude invite routes. ROADMAP tasks marked complete. Changes
Sequence DiagramsequenceDiagram
participant Client as Client Socket
participant Server as Realtime Server
participant Redis as Redis
Client->>Server: Socket Connection
Server->>Redis: markUserConnected(userId, socketId, TTL)
Redis->>Redis: ADD socket to user set<br/>EXPIRE user set (60s)
Server->>Server: Start 30s heartbeat interval
loop Every 30s (while socket connected)
Server->>Redis: refreshPresenceHeartbeat(userId)
Redis->>Redis: EXPIRE user socket set (60s)
end
Client->>Server: Disconnect Event
Server->>Server: Clear heartbeat interval
Note over Server,Redis: Separate reconciliation loop
Server->>Server: Every 30s: reconcilePresence()
Server->>Redis: Get all online user IDs
Redis-->>Server: User ID list
loop For each user ID
Server->>Redis: Check if socket set exists
Redis-->>Server: Key exists? true/false
alt Key missing (stale)
Server->>Redis: Remove user from online set
end
end
Server->>Server: Log reconciled count
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip Migrating from UI to YAML configuration.Use the |
Adds heartbeat TTL to per-user socket sets and periodic reconciliation to clean up stale presence entries when servers crash without running disconnect handlers. Also wires up onboarding invite code/link joining and updates roadmap.
Pull Request Summary
This PR addresses stale presence entries that remain when realtime servers crash without running disconnect handlers. The solution implements a heartbeat TTL mechanism combined with periodic reconciliation.
Key Changes
Presence Service (
apps/realtime/src/services/presence.ts)PRESENCE_HEARTBEAT_TTLconstant (60 seconds)markUserConnectedto set TTL on per-user socket setsrefreshPresenceHeartbeat()to refresh TTL via RedisEXPIREcommandreconcilePresence()to periodically detect and remove stale presence entries by checking if socket-set keys existRealtime Server (
apps/realtime/src/index.ts)refreshPresenceHeartbeat()socket.once("disconnect")reconcilePresence()and log resultsOnboarding Integration (
apps/web/src/components/onboarding/onboarding-dialog.tsx&apps/web/src/routes/_authenticated.tsx)parseInviteCode()function to extract codes from both raw codes (alphanumeric) and full invite URLs with optional path/query/fragmenthandleJointo parse input and navigate to/invite/$codewith the extracted code/invite/*routes even when onboarding is incompleteProject Roadmap
Implementation Quality
Strengths:
isCurrentSocketAlivecheck during initializationConcerns:
refreshPresenceHeartbeat,reconcilePresence) lack any unit or integration testsexists()per online user) on every reconciliation cycle could be problematic with thousands of concurrent users.catch(() => {}), potentially masking Redis connectivity issuesRisk Assessment
The implementation is functionally correct and won't break existing code. The approach is sound, though it trades some operational overhead (additional Redis calls every 30s) for operational safety (preventing stale presence). At scale (thousands of concurrent users), the reconciliation loop could become a bottleneck. The lack of tests is a notable gap but doesn't block basic functionality.
Confidence Score: 3/5
Mostly solid with some minor issues. The core logic is sound and solves the intended problem. However, the absence of tests, potential scalability concerns with the reconciliation loop, and silent error handling in the heartbeat mechanism prevent a higher score. The implementation follows existing patterns and includes appropriate error logging, but should have unit/integration tests and monitoring instrumentation before production deployment at scale.