-
Notifications
You must be signed in to change notification settings - Fork 13k
fix(presence): batch presence updates to prevent broadcast storm #21182 #38238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
fix(presence): batch presence updates to prevent broadcast storm #21182 #38238
Conversation
…etChat#21182 - Implemented buffering in Presence service to batch updates every 500ms - Added 'presence.status.batch' event to Events definition - Updated ListenersModule to handle batched updates - Updated AppsEngineService to handle batched updates - Updated OmnichannelService to handle batched updates - Added unit tests for batching logic
|
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again If you have any trouble, please check the PR guidelines |
🦋 Changeset detectedLatest commit: 71a77ff The changes in this PR will be included in the next version bump. This PR includes changesets to release 40 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds debounced presence batching: presence updates are accumulated for 500ms and emitted as a single Changes
Sequence Diagram(s)sequenceDiagram
participant Presence as Presence (collector)
participant CoreAPI as Core API (api.broadcast)
participant Listeners as Listeners Module
participant AppsEngine as Apps Engine
participant Omnichannel as Omnichannel Service
Note over Presence: collect status updates\n(per-user) into presenceBatch
Presence->>Presence: schedule 500ms debounce
alt debounce timeout fires
Presence->>CoreAPI: broadcast 'presence.status.batch' [users[]]
CoreAPI->>Listeners: deliver presence.status.batch
CoreAPI->>AppsEngine: deliver presence.status.batch
CoreAPI->>Omnichannel: deliver presence.status.batch
Listeners->>Listeners: iterate users[]\npush diffs, notify logged instance, send presence update
AppsEngine->>AppsEngine: iterate users[]\ntrigger IPostUserStatusChanged per user
Omnichannel->>Omnichannel: iterate users[]\nif role in [livechat-agent, livechat-manager, livechat-monitor]\nnotifyAgentStatusChanged
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@ee/packages/presence/src/Presence.ts`:
- Around line 311-328: In stopped(), ensure any pending batch timeout is
cancelled to avoid firing after teardown: if this.batchTimeout exists call
clearTimeout(this.batchTimeout) and set this.batchTimeout = undefined; also
consider clearing this.presenceBatch (this.presenceBatch.clear()) or flushing it
safely before returning so no pending broadcast via
this.api?.broadcast('presence.status.batch', ...) runs after stop; update the
stopped() method to perform these actions.
- Around line 25-31: The presenceBatch map's user projection is missing the
`name` field which causes batched payloads to lack `name`; update the type for
presenceBatch to include `name` in the Pick (e.g., Pick<IUser, '_id' |
'username' | 'status' | 'statusText' | 'name' | 'roles'>), then update any DB
query projections used by setStatus, updateUserPresence and wherever users are
loaded so they include `name`, and change the broadcast method signature to
accept and forward `name` to match Events.ts presence.status.batch and the
listeners.module.ts handler that destructures `name`. Ensure all usages and
types referencing presenceBatch, setStatus, updateUserPresence, and broadcast
are updated accordingly.
🧹 Nitpick comments (3)
apps/meteor/server/services/apps-engine/service.ts (1)
33-41: Consider adding error handling for individual batch items.If
Apps.self?.triggerEventthrows for one user in the batch, subsequent users won't be processed. Consider wrapping each iteration in a try-catch or usingPromise.allSettledif order independence is acceptable.Proposed fix with error resilience
this.onEvent('presence.status.batch', async (batch): Promise<void> => { for (const { user, previousStatus } of batch) { + try { await Apps.self?.triggerEvent(AppEvents.IPostUserStatusChanged, { user, currentStatus: user.status, previousStatus, }); + } catch (error) { + // Log error but continue processing remaining batch items + SystemLogger.error({ msg: 'Failed to trigger IPostUserStatusChanged for user', userId: user._id, error }); + } } });apps/meteor/server/services/omnichannel/service.ts (1)
34-45: LGTM with a suggestion to reduce duplication.The batch handler correctly mirrors the single-event logic. Consider extracting the common validation and notification logic into a helper to avoid duplication:
Optional: Extract helper to reduce duplication
+ private async handlePresenceUser(user: { _id?: string; status?: UserStatus; roles?: string[] }): Promise<void> { + if (!user?._id) { + return; + } + const hasRole = user.roles?.some((role) => ['livechat-manager', 'livechat-monitor', 'livechat-agent'].includes(role)); + if (hasRole) { + await notifyAgentStatusChanged(user._id, user.status); + } + } + override async created() { this.onEvent('presence.status', async ({ user }): Promise<void> => { - if (!user?._id) { - return; - } - const hasRole = user.roles.some((role) => ['livechat-manager', 'livechat-monitor', 'livechat-agent'].includes(role)); - if (hasRole) { - // TODO change `Livechat.notifyAgentStatusChanged` to a service call - await notifyAgentStatusChanged(user._id, user.status); - } + await this.handlePresenceUser(user); }); this.onEvent('presence.status.batch', async (batch): Promise<void> => { for (const { user } of batch) { - if (!user?._id) { - continue; - } - const hasRole = user.roles.some((role) => ['livechat-manager', 'livechat-monitor', 'livechat-agent'].includes(role)); - if (hasRole) { - // TODO change `Livechat.notifyAgentStatusChanged` to a service call - await notifyAgentStatusChanged(user._id, user.status); - } + await this.handlePresenceUser(user); } }); }ee/packages/presence/src/Presence.spec.ts (1)
29-101: Good test coverage for core batching scenarios.The tests effectively validate buffering, batching multiple users, and debouncing. Consider adding a test to verify that no broadcast occurs when
broadcastEnabledisfalse:Optional: Additional test for disabled broadcast
it('should not broadcast when broadcastEnabled is false', () => { (presence as any).broadcastEnabled = false; const user = { _id: 'u1', username: 'user1', status: 'online' } as any; (presence as any).broadcast(user, 'offline'); expect((presence as any).presenceBatch.size).toBe(0); jest.advanceTimersByTime(500); expect((presence as any).api.broadcast).not.toHaveBeenCalled(); });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 issues found across 6 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="ee/packages/presence/src/Presence.ts">
<violation number="1" location="ee/packages/presence/src/Presence.ts:318">
P2: New batch timeout isn’t cleared on service stop, allowing pending broadcasts after shutdown</violation>
<violation number="2" location="ee/packages/presence/src/Presence.ts:327">
P2: Pending presence batch is broadcast even if broadcasting is disabled before flush</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
|
I have addressed the feedback and the PR is ready for review. Could a maintainer please add the stat: QA assured label and set the milestone to 8.2.0? |
Description
This PR addresses issue #21182 by implementing a batching strategy for presence status updates.
Problem
Mass user reconnections caused a storm of presence broadcasts, overwhelming the internal event bus and causing O(N^2) work, leading to CPU spikes and instability.
Solution
Presence.ts. Updates are now collected in aMapand flushed every 500ms.presence.status.batchinEvents.ts.ListenersModule,AppsEngineService,OmnichannelService) to handle the new batched event format.Verification
ee/packages/presence/src/Presence.spec.tscovering buffering, batching, and debouncing logic.should buffer broadcast eventsshould batch multiple updatesshould debounce updates for same userCloses #21182
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.