

@shash-hq shash-hq commented Jan 17, 2026

Description

This PR addresses issue #21182 by implementing a batching strategy for presence status updates.

Problem

Mass user reconnections triggered a storm of presence broadcasts that overwhelmed the internal event bus and caused O(N^2) work, leading to CPU spikes and instability.

Solution

  • Implemented a buffering mechanism in Presence.ts. Updates are now collected in a Map and flushed every 500ms.
  • Introduced a new event signature presence.status.batch in Events.ts.
  • Updated consumers (ListenersModule, AppsEngineService, OmnichannelService) to handle the new batched event format.
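A minimal sketch of the buffering described above, assuming a Map keyed by user ID and a single 500ms timer. The names PresenceBatcher, BatchEntry and emit are hypothetical and not taken from Presence.ts. This version schedules the flush on the first buffered update and does not reset it on later ones, which bounds delivery latency to roughly one window even during a sustained storm; whether the PR's implementation resets the timer is not shown here.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the PR's Presence.ts.
type UserStatus = 'online' | 'away' | 'busy' | 'offline';

interface BatchedUser {
	_id: string;
	username?: string;
	name?: string;
	status: UserStatus;
	statusText?: string;
	roles?: string[];
}

interface BatchEntry {
	user: BatchedUser;
	previousStatus?: UserStatus;
}

class PresenceBatcher {
	private readonly batch = new Map<string, BatchEntry>();
	private timer: ReturnType<typeof setTimeout> | undefined;
	private readonly emit: (entries: BatchEntry[]) => void;
	private readonly windowMs: number;

	constructor(emit: (entries: BatchEntry[]) => void, windowMs = 500) {
		this.emit = emit;
		this.windowMs = windowMs;
	}

	// Buffer one status change; repeated updates for the same user overwrite the pending entry.
	add(user: BatchedUser, previousStatus?: UserStatus): void {
		const pending = this.batch.get(user._id);
		// Keep the previousStatus from the first update in this window so the net transition is preserved.
		this.batch.set(user._id, { user, previousStatus: pending ? pending.previousStatus : previousStatus });
		// Only the first update in a window schedules a flush; later ones join the pending batch.
		if (this.timer === undefined) {
			this.timer = setTimeout(() => this.flush(), this.windowMs);
		}
	}

	// Emit everything buffered as one array and reset for the next window.
	flush(): void {
		if (this.timer !== undefined) {
			clearTimeout(this.timer);
			this.timer = undefined;
		}
		if (this.batch.size === 0) {
			return;
		}
		const entries = Array.from(this.batch.values());
		this.batch.clear();
		this.emit(entries);
	}
}
```

Keeping the first previousStatus while overwriting the user with the latest snapshot is one design choice: consumers see the net transition across the window. A stricter debounce that resets the timer on every update would coalesce more aggressively, but during a continuous reconnect storm it could postpone the flush indefinitely.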

Verification

  • Added new unit tests in ee/packages/presence/src/Presence.spec.ts covering buffering, batching, and debouncing logic.
  • Tests passed:
    • should buffer broadcast events
    • should batch multiple updates
    • should debounce updates for same user

Closes #21182

Summary by CodeRabbit

  • New Features
    • Presence updates are now batched and broadcast together (debounced) to reduce load during mass status changes; services now accept batched presence events.
  • Refactor
    • Event payload shapes for presence and related watch events were adjusted to support batched updates.
  • Tests
    • Added tests covering presence batching, debouncing, and payload emission.
  • Chores
    • Changeset and package version bumps included.


…etChat#21182

- Implemented buffering in Presence service to batch updates every 500ms
- Added 'presence.status.batch' event to Events definition
- Updated ListenersModule to handle batched updates
- Updated AppsEngineService to handle batched updates
- Updated OmnichannelService to handle batched updates
- Added unit tests for batching logic
@shash-hq shash-hq requested a review from a team as a code owner January 17, 2026 22:03
@dionisio-bot
Contributor

dionisio-bot bot commented Jan 17, 2026

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

@changeset-bot

changeset-bot bot commented Jan 17, 2026

🦋 Changeset detected

Latest commit: 71a77ff

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 40 packages
Name Type
@rocket.chat/core-services Patch
@rocket.chat/meteor Patch
@rocket.chat/account-service Patch
@rocket.chat/authorization-service Patch
@rocket.chat/ddp-streamer Patch
@rocket.chat/omnichannel-transcript Patch
@rocket.chat/presence-service Patch
@rocket.chat/queue-worker Patch
@rocket.chat/abac Patch
@rocket.chat/federation-matrix Patch
@rocket.chat/network-broker Patch
@rocket.chat/omni-core-ee Patch
@rocket.chat/omnichannel-services Patch
@rocket.chat/presence Patch
rocketchat-services Patch
@rocket.chat/core-typings Patch
@rocket.chat/rest-typings Patch
@rocket.chat/uikit-playground Patch
@rocket.chat/api-client Patch
@rocket.chat/apps Patch
@rocket.chat/cron Patch
@rocket.chat/ddp-client Patch
@rocket.chat/fuselage-ui-kit Patch
@rocket.chat/gazzodown Patch
@rocket.chat/http-router Patch
@rocket.chat/livechat Patch
@rocket.chat/model-typings Patch
@rocket.chat/ui-avatar Patch
@rocket.chat/ui-client Patch
@rocket.chat/ui-contexts Patch
@rocket.chat/ui-voip Patch
@rocket.chat/web-ui-registration Patch
@rocket.chat/license Patch
@rocket.chat/media-calls Patch
@rocket.chat/pdf-worker Patch
@rocket.chat/models Patch
@rocket.chat/mock-providers Patch
@rocket.chat/ui-video-conf Patch
@rocket.chat/instance-status Patch
@rocket.chat/omni-core Patch


@coderabbitai
Contributor

coderabbitai bot commented Jan 17, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds debounced presence batching: presence updates are accumulated for 500ms and emitted as a single presence.status.batch event. Existing listeners and services now consume the batch and iterate per-user to apply existing presence, notification, and agent-status logic.
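The per-user iteration described above can be sketched as a small adapter that reuses an existing single-user handler for the batched event. The PresenceEntry shape and the toBatchHandler name are assumptions for illustration; the real payload types are defined in packages/core-services/src/events/Events.ts and may differ.

```typescript
// Hypothetical payload shapes; see Events.ts for the real definitions.
type UserStatus = 'online' | 'away' | 'busy' | 'offline';

interface PresenceEntry {
	user: { _id: string; username?: string; name?: string; status?: UserStatus; roles?: string[] };
	previousStatus?: UserStatus;
}

type SingleHandler = (entry: PresenceEntry) => Promise<void>;
type BatchHandler = (batch: PresenceEntry[]) => Promise<void>;

// Wrap a single-user handler (the logic already used for presence.status)
// so the same code path serves presence.status.batch.
function toBatchHandler(handle: SingleHandler): BatchHandler {
	return async (batch) => {
		// Sequential on purpose: preserves batch order and avoids a burst of concurrent work.
		for (const entry of batch) {
			await handle(entry);
		}
	};
}
```

Assuming the single event delivers the same { user, previousStatus } shape as each batch item, a consumer could register one function for both events, so the single and batched paths cannot drift apart.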

Changes

  • Presence batching implementation (ee/packages/presence/src/Presence.ts): Adds a presenceBatch map and a batchTimeout; accumulates per-user updates and emits a single presence.status.batch event after 500ms instead of immediate single-user broadcasts.
  • Event signature changes (packages/core-services/src/events/Events.ts): Adds the 'presence.status.batch' event signature (an array of user-status entries) and refactors several union payload shapes (watch.subscriptions, watch.users, LoginServiceConfigurationEvent) to separate inserted/updated/removed branches.
  • Listener & service handlers (apps/meteor/server/modules/listeners/listeners.module.ts, apps/meteor/server/services/apps-engine/service.ts, apps/meteor/server/services/omnichannel/service.ts): Registers presence.status.batch handlers that iterate batch items and apply the same logic as the single presence.status handler (push user diffs, notify the logged-in instance, send presence updates, trigger agent status changes for relevant roles).
  • Tests for batching (ee/packages/presence/src/Presence.spec.ts): Adds tests verifying single-batch emission, debounced aggregation of multiple users, and deduplication of repeated updates for the same user, using fake timers and mocks.
  • Changeset (.changeset/breezy-dolphins-sing.md): Adds a changeset documenting presence batching and the package patch bumps.
  • Minor formatting (apps/meteor/server/modules/listeners/listeners.module.ts): Small whitespace/formatting tweaks in the custom domain parsing chain (no functional change).

Sequence Diagram(s)

sequenceDiagram
  participant Presence as Presence (collector)
  participant CoreAPI as Core API (api.broadcast)
  participant Listeners as Listeners Module
  participant AppsEngine as Apps Engine
  participant Omnichannel as Omnichannel Service

  Note over Presence: collect status updates\n(per-user) into presenceBatch
  Presence->>Presence: schedule 500ms debounce
  alt debounce timeout fires
    Presence->>CoreAPI: broadcast 'presence.status.batch' [users[]]
    CoreAPI->>Listeners: deliver presence.status.batch
    CoreAPI->>AppsEngine: deliver presence.status.batch
    CoreAPI->>Omnichannel: deliver presence.status.batch

    Listeners->>Listeners: iterate users[]\npush diffs, notify logged instance, send presence update
    AppsEngine->>AppsEngine: iterate users[]\ntrigger IPostUserStatusChanged per user
    Omnichannel->>Omnichannel: iterate users[]\nif role in [livechat-agent, livechat-manager, livechat-monitor]\nnotifyAgentStatusChanged
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested labels

stat: ready to merge, stat: QA assured

Suggested reviewers

  • cardoso
  • tassoevan

Poem

🐰 I hopped through pings and buffered the race,
Collected the scurry, then slowed down the pace.
Five hundred millis, one tidy array —
A calm little batch to save the day. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR implements batching of presence updates with 500ms debounce to prevent broadcast storms, reducing O(N²) work during mass reconnections, directly addressing all coding objectives from issue #21182.
Out of Scope Changes check ✅ Passed All changes focus on implementing presence update batching and supporting event handling; no unrelated modifications detected outside the stated objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title directly addresses the main change: implementing batching for presence updates to prevent broadcast storms, which is the core focus of the PR.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@ee/packages/presence/src/Presence.ts`:
- Around line 311-328: In stopped(), ensure any pending batch timeout is
cancelled to avoid firing after teardown: if this.batchTimeout exists call
clearTimeout(this.batchTimeout) and set this.batchTimeout = undefined; also
consider clearing this.presenceBatch (this.presenceBatch.clear()) or flushing it
safely before returning so no pending broadcast via
this.api?.broadcast('presence.status.batch', ...) runs after stop; update the
stopped() method to perform these actions.
- Around line 25-31: The presenceBatch map's user projection is missing the
`name` field which causes batched payloads to lack `name`; update the type for
presenceBatch to include `name` in the Pick (e.g., Pick<IUser, '_id' |
'username' | 'status' | 'statusText' | 'name' | 'roles'>), then update any DB
query projections used by setStatus, updateUserPresence and wherever users are
loaded so they include `name`, and change the broadcast method signature to
accept and forward `name` to match Events.ts presence.status.batch and the
listeners.module.ts handler that destructures `name`. Ensure all usages and
types referencing presenceBatch, setStatus, updateUserPresence, and broadcast
are updated accordingly.
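A sketch of the teardown the first point asks for, assuming the field names from the comment (presenceBatch, batchTimeout, stopped) on an otherwise illustrative class; the `name` field from the second point is included in the hypothetical BatchedUser type. This version discards buffered updates on stop. Flushing them instead is the alternative the comment mentions, but it only makes sense if the broker can still deliver at that point.

```typescript
// Hypothetical sketch: presenceBatch/batchTimeout/stopped() follow the review comment;
// the class itself is illustrative, not the PR's Presence service.
type UserStatus = 'online' | 'away' | 'busy' | 'offline';

// `name` included per the second point, so batched payloads carry it to consumers.
interface BatchedUser {
	_id: string;
	username?: string;
	name?: string;
	status: UserStatus;
	statusText?: string;
	roles?: string[];
}

interface BatchEntry {
	user: BatchedUser;
	previousStatus?: UserStatus;
}

class PresenceBatchSketch {
	private presenceBatch = new Map<string, BatchEntry>();
	private batchTimeout: ReturnType<typeof setTimeout> | undefined;
	private readonly broadcast: (entries: BatchEntry[]) => void;

	constructor(broadcast: (entries: BatchEntry[]) => void) {
		this.broadcast = broadcast;
	}

	enqueue(entry: BatchEntry): void {
		this.presenceBatch.set(entry.user._id, entry);
		if (this.batchTimeout === undefined) {
			this.batchTimeout = setTimeout(() => {
				this.batchTimeout = undefined;
				const entries = Array.from(this.presenceBatch.values());
				this.presenceBatch.clear();
				if (entries.length > 0) {
					this.broadcast(entries);
				}
			}, 500);
		}
	}

	hasPending(): boolean {
		return this.batchTimeout !== undefined || this.presenceBatch.size > 0;
	}

	// Cancel the pending flush and drop buffered updates so nothing is broadcast after teardown.
	async stopped(): Promise<void> {
		if (this.batchTimeout !== undefined) {
			clearTimeout(this.batchTimeout);
			this.batchTimeout = undefined;
		}
		this.presenceBatch.clear();
	}
}
```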
🧹 Nitpick comments (3)
apps/meteor/server/services/apps-engine/service.ts (1)

33-41: Consider adding error handling for individual batch items.

If Apps.self?.triggerEvent throws for one user in the batch, subsequent users won't be processed. Consider wrapping each iteration in a try-catch or using Promise.allSettled if order independence is acceptable.

Proposed fix with error resilience
 		this.onEvent('presence.status.batch', async (batch): Promise<void> => {
 			for (const { user, previousStatus } of batch) {
+				try {
 				await Apps.self?.triggerEvent(AppEvents.IPostUserStatusChanged, {
 					user,
 					currentStatus: user.status,
 					previousStatus,
 				});
+				} catch (error) {
+					// Log error but continue processing remaining batch items
+					SystemLogger.error({ msg: 'Failed to trigger IPostUserStatusChanged for user', userId: user._id, error });
+				}
 			}
 		});
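If the triggers for different users are independent, the Promise.allSettled alternative mentioned above could look like the sketch below. triggerForUser and logError are hypothetical stand-ins for Apps.self?.triggerEvent(AppEvents.IPostUserStatusChanged, ...) and SystemLogger.error; they are passed in as parameters only so the example runs on its own.

```typescript
// Hypothetical sketch: triggerForUser and logError stand in for the real Apps/SystemLogger calls.
type UserStatus = 'online' | 'away' | 'busy' | 'offline';

interface BatchEntry {
	user: { _id: string; status?: UserStatus };
	previousStatus?: UserStatus;
}

async function handleBatchSettled(
	batch: BatchEntry[],
	triggerForUser: (entry: BatchEntry) => Promise<void>,
	logError: (userId: string, error: unknown) => void,
): Promise<number> {
	// The async wrapper turns a synchronous throw into a rejection;
	// allSettled waits for every trigger and never rejects itself.
	const results = await Promise.allSettled(batch.map(async (entry) => triggerForUser(entry)));
	let failures = 0;
	results.forEach((result, i) => {
		if (result.status === 'rejected') {
			failures++;
			logError(batch[i].user._id, result.reason);
		}
	});
	return failures;
}
```

Unlike the sequential try/catch, this starts every trigger at once, so ordering across users is not guaranteed and a very large batch produces a matching burst of concurrent calls. The sequential version is the safer default when either of those matters.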
apps/meteor/server/services/omnichannel/service.ts (1)

34-45: LGTM with a suggestion to reduce duplication.

The batch handler correctly mirrors the single-event logic. Consider extracting the common validation and notification logic into a helper to avoid duplication:

Optional: Extract helper to reduce duplication
+	private async handlePresenceUser(user: { _id?: string; status?: UserStatus; roles?: string[] }): Promise<void> {
+		if (!user?._id) {
+			return;
+		}
+		const hasRole = user.roles?.some((role) => ['livechat-manager', 'livechat-monitor', 'livechat-agent'].includes(role));
+		if (hasRole) {
+			await notifyAgentStatusChanged(user._id, user.status);
+		}
+	}
+
 	override async created() {
 		this.onEvent('presence.status', async ({ user }): Promise<void> => {
-			if (!user?._id) {
-				return;
-			}
-			const hasRole = user.roles.some((role) => ['livechat-manager', 'livechat-monitor', 'livechat-agent'].includes(role));
-			if (hasRole) {
-				// TODO change `Livechat.notifyAgentStatusChanged` to a service call
-				await notifyAgentStatusChanged(user._id, user.status);
-			}
+			await this.handlePresenceUser(user);
 		});

 		this.onEvent('presence.status.batch', async (batch): Promise<void> => {
 			for (const { user } of batch) {
-				if (!user?._id) {
-					continue;
-				}
-				const hasRole = user.roles.some((role) => ['livechat-manager', 'livechat-monitor', 'livechat-agent'].includes(role));
-				if (hasRole) {
-					// TODO change `Livechat.notifyAgentStatusChanged` to a service call
-					await notifyAgentStatusChanged(user._id, user.status);
-				}
+				await this.handlePresenceUser(user);
 			}
 		});
 	}
ee/packages/presence/src/Presence.spec.ts (1)

29-101: Good test coverage for core batching scenarios.

The tests effectively validate buffering, batching multiple users, and debouncing. Consider adding a test to verify that no broadcast occurs when broadcastEnabled is false:

Optional: Additional test for disabled broadcast
it('should not broadcast when broadcastEnabled is false', () => {
	(presence as any).broadcastEnabled = false;
	const user = { _id: 'u1', username: 'user1', status: 'online' } as any;
	(presence as any).broadcast(user, 'offline');

	expect((presence as any).presenceBatch.size).toBe(0);

	jest.advanceTimersByTime(500);

	expect((presence as any).api.broadcast).not.toHaveBeenCalled();
});

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 6 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="ee/packages/presence/src/Presence.ts">

<violation number="1" location="ee/packages/presence/src/Presence.ts:318">
P2: New batch timeout isn’t cleared on service stop, allowing pending broadcasts after shutdown</violation>

<violation number="2" location="ee/packages/presence/src/Presence.ts:327">
P2: Pending presence batch is broadcast even if broadcasting is disabled before flush</violation>
</file>
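The second violation can be addressed by checking broadcastEnabled when the timer fires rather than only when an update is queued. The BatchState shape and the flushPresenceBatch name below are hypothetical. The sketch also empties the buffer when broadcasting is disabled, so stale statuses are not replayed if it is turned back on later; that is a design choice, not something the review prescribes.

```typescript
// Hypothetical sketch: BatchState and flushPresenceBatch are illustrative names.
type UserStatus = 'online' | 'away' | 'busy' | 'offline';
type BatchEntry = { user: { _id: string; status: UserStatus }; previousStatus?: UserStatus };

interface BatchState {
	broadcastEnabled: boolean;
	presenceBatch: Map<string, BatchEntry>;
}

// Re-read broadcastEnabled at flush time instead of trusting the value seen at enqueue time.
// Returns true only if a broadcast was actually sent.
function flushPresenceBatch(state: BatchState, broadcast: (entries: BatchEntry[]) => void): boolean {
	const entries = Array.from(state.presenceBatch.values());
	// Always empty the buffer so stale entries are not replayed if broadcasting is re-enabled later.
	state.presenceBatch.clear();
	if (!state.broadcastEnabled || entries.length === 0) {
		return false;
	}
	broadcast(entries);
	return true;
}
```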


shash-hq and others added 2 commits January 18, 2026 04:21
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
@shash-hq shash-hq changed the title from "fix(presence): batch presence updates to prevent broadcast storm #21182" to "fix(presence): batch presence updates to prevent broadcast storm #21182 stat: ready to merge" Jan 17, 2026
@shash-hq shash-hq changed the title from "fix(presence): batch presence updates to prevent broadcast storm #21182 stat: ready to merge" back to "fix(presence): batch presence updates to prevent broadcast storm #21182" Jan 17, 2026
@shash-hq
Author

I have addressed the feedback and the PR is ready for review. Could a maintainer please add the stat: QA assured label and set the milestone to 8.2.0?
Thank you!

@geekgonecrazy


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rocket.Chat stability issue when many users disconnected and connected again

1 participant