feat: collect federation outgoing metrics#38459
feat: collect federation outgoing metrics#38459dhulke wants to merge 13 commits intofeat/federation-tracing2from
Conversation
|
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again If you have any trouble, please check the PR guidelines |
|
WalkthroughThis PR adds comprehensive Prometheus metrics instrumentation to the federation-matrix package. It introduces a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## feat/federation-tracing2 #38459 +/- ##
============================================================
- Coverage 70.41% 70.41% -0.01%
============================================================
Files 3161 3161
Lines 110151 110151
Branches 19862 19891 +29
============================================================
- Hits 77565 77562 -3
- Misses 30557 30560 +3
Partials 2029 2029
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
…, messages, and events
…y in yarn.lock; modify direct message room creation method in FederationMatrix
…s for room creation, message sending, and user invitations
…ic tracking in FederationMatrix callbacks
… to streamline code
…ic tracking in FederationMatrix
…e metrics collection
…reamline metrics collection
…g type signatures and improving error handling
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ee/packages/federation-matrix/src/FederationMatrix.ts (1)
515-548:⚠️ Potential issue | 🟡 MinorMetric counter increments for skipped invites.
federatedInvitesSentis incremented by total usernames even when invites are skipped—specifically when the inviter is native federated (line 533-535), no invite is sent but the counter still increments at line 547. Move the counter increment into the callback to only count actual invites sent.Suggested fix
await Promise.all( matrixUsersUsername.map(async (username) => { if (validateFederatedUsername(username)) { - return federationSDK.inviteUserToRoom( + await federationSDK.inviteUserToRoom( userIdSchema.parse(username), roomIdSchema.parse(room.federation.mrid), userIdSchema.parse(inviterUserId), ); + federationMetrics.federatedInvitesSent.inc({ room_type: room.t }); + return; } // if inviter is an external user it means we receive the invite from the endpoint // since we accept from there we can skip accepting here if (isUserNativeFederated(inviter)) { this.logger.debug('Inviter is native federated, skip accept invite'); return; } - return federationSDK.inviteUserToRoom( + await federationSDK.inviteUserToRoom( userIdSchema.parse(`@${username}:${this.serverName}`), roomIdSchema.parse(room.federation.mrid), userIdSchema.parse(inviterUserId), ); + federationMetrics.federatedInvitesSent.inc({ room_type: room.t }); }), ); -// Increment invite counter for each user invited -federationMetrics.federatedInvitesSent.inc({ room_type: room.t }, matrixUsersUsername.length); federationMetrics.federationEventsProcessed.inc({ event_type: 'membership', direction: 'outgoing' });
🤖 Fix all issues with AI agents
In `@ee/packages/federation-matrix/src/FederationMatrix.ts`:
- Around line 1049-1104: In handleInvite, remove the explicit cleanup call to
Room.performUserRemoval in the catch branch for federationSDK.rejectInvite
(inside the action === 'reject' block) and instead log the 403 error and return
so the leave-event flow (handleLeave) performs the removal; keep the existing
this.logger.error line and ensure the function returns early on 403 without
invoking Room.performUserRemoval.
🧹 Nitpick comments (3)
ee/packages/federation-matrix/src/FederationMatrix.ts (1)
122-165: Consider removing the inline “Increment success metrics” comment.The instrumentation is clear without the extra comment, and it repeats a pattern seen elsewhere.
As per coding guidelines: Avoid code comments in the implementation.
ee/packages/federation-matrix/src/events/member.ts (1)
94-99: Drop the inline spec comment.The code is self-explanatory here; consider removing the comment per the no-comments guideline.
As per coding guidelines: Avoid code comments in the implementation.
ee/packages/federation-matrix/src/events/message.ts (1)
147-156: Remove inline SPEC comments from implementation.
Consider relocating these notes to external docs/spec references to keep runtime code clean.As per coding guidelines: Avoid code comments in the implementation.Proposed diff
@@ - // SPEC: For example, an m.thread relationship type denotes that the event is part of a "thread" of messages and should be rendered as such. const hasRelation = relation && 'rel_type' in relation; @@ - // SPEC: Though rich replies form a relationship to another event, they do not use rel_type to create this relationship. - // Instead, a subkey named m.in_reply_to is used to describe the reply's relationship, const isRichReply = relation && !('rel_type' in relation) && 'm.in_reply_to' in relation; @@ - // SPEC: For example, an m.thread relationship type denotes that the event is part of a "thread" of messages and should be rendered as such. const hasRelation = relation && 'rel_type' in relation; @@ - // SPEC: Though rich replies form a relationship to another event, they do not use rel_type to create this relationship. - // Instead, a subkey named m.in_reply_to is used to describe the reply's relationship, const isRichReply = relation && !('rel_type' in relation) && 'm.in_reply_to' in relation;Also applies to: 297-306
| async handleInvite(roomId: IRoom['_id'], userId: IUser['_id'], action: 'accept' | 'reject'): Promise<void> { | ||
| const subscription = await Subscriptions.findInvitedSubscription(roomId, userId); | ||
| if (!subscription) { | ||
| throw new Error('No subscription found or user does not have permission to accept or reject this invite'); | ||
| } | ||
| try { | ||
| const subscription = await Subscriptions.findInvitedSubscription(roomId, userId); | ||
| if (!subscription) { | ||
| throw new Error('No subscription found or user does not have permission to accept or reject this invite'); | ||
| } | ||
|
|
||
| const room = await Rooms.findOneById(roomId); | ||
| if (!room || !isRoomNativeFederated(room)) { | ||
| throw new Error('Room not found or not federated'); | ||
| } | ||
| const room = await Rooms.findOneById(roomId); | ||
| if (!room || !isRoomNativeFederated(room)) { | ||
| throw new Error('Room not found or not federated'); | ||
| } | ||
|
|
||
| const user = await Users.findOneById(userId); | ||
| if (!user) { | ||
| throw new Error('User not found'); | ||
| } | ||
| const user = await Users.findOneById(userId); | ||
| if (!user) { | ||
| throw new Error('User not found'); | ||
| } | ||
|
|
||
| if (!user.username) { | ||
| throw new Error('User username not found'); | ||
| } | ||
| if (!user.username) { | ||
| throw new Error('User username not found'); | ||
| } | ||
|
|
||
| // TODO: should use common function to get matrix user ID | ||
| const matrixUserId = isUserNativeFederated(user) ? user.federation.mui : `@${user.username}:${this.serverName}`; | ||
| // TODO: should use common function to get matrix user ID | ||
| const matrixUserId = isUserNativeFederated(user) ? user.federation.mui : `@${user.username}:${this.serverName}`; | ||
|
|
||
| // Add runtime attributes after querying room and user | ||
| addSpanAttributes({ | ||
| matrixRoomId: room.federation.mrid, | ||
| matrixUserId, | ||
| username: user.username, | ||
| isNativeFederatedUser: isUserNativeFederated(user), | ||
| }); | ||
| // Add runtime attributes after querying room and user | ||
| addSpanAttributes({ | ||
| matrixRoomId: room.federation.mrid, | ||
| matrixUserId, | ||
| username: user.username, | ||
| isNativeFederatedUser: isUserNativeFederated(user), | ||
| }); | ||
|
|
||
| if (action === 'accept') { | ||
| await federationSDK.acceptInvite(room.federation.mrid, matrixUserId); | ||
| } | ||
| if (action === 'accept') { | ||
| await federationSDK.acceptInvite(room.federation.mrid, matrixUserId); | ||
| } | ||
|
|
||
| if (action === 'reject') { | ||
| try { | ||
| await federationSDK.rejectInvite(room.federation.mrid, matrixUserId); | ||
| } catch (err) { | ||
| if (err instanceof FederationRequestError && err.response.status === 403) { | ||
| return Room.performUserRemoval(room, user); | ||
| if (action === 'reject') { | ||
| try { | ||
| await federationSDK.rejectInvite(room.federation.mrid, matrixUserId); | ||
| } catch (err) { | ||
| if (err instanceof FederationRequestError && err.response.status === 403) { | ||
| return Room.performUserRemoval(room, user); | ||
| } | ||
| this.logger.error({ msg: 'Failed to reject invite in Matrix', err }); | ||
| throw err; | ||
| } | ||
| this.logger.error({ msg: 'Failed to reject invite in Matrix', err }); | ||
| throw err; | ||
| } | ||
|
|
||
| federationMetrics.federationEventsProcessed.inc({ event_type: 'membership', direction: 'outgoing' }); | ||
| } catch (err) { | ||
| federationMetrics.federationEventsFailed.inc({ | ||
| event_type: 'membership', | ||
| direction: 'outgoing', | ||
| error_type: err instanceof Error ? err.constructor.name : 'Unknown', | ||
| }); | ||
| throw err; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the actual current state of handleInvite method
rg -A 60 "async handleInvite" ee/packages/federation-matrix/src/FederationMatrix.tsRepository: RocketChat/Rocket.Chat
Length of output: 1986
🏁 Script executed:
# Verify handleLeave method exists and performs cleanup
rg -A 20 "handleLeave" ee/packages/federation-matrix/src/events/member.tsRepository: RocketChat/Rocket.Chat
Length of output: 965
🏁 Script executed:
# Check other usages of rejectInvite and how they handle errors
rg -B 5 -A 10 "rejectInvite" ee/packages/federation-matrix/src/Repository: RocketChat/Rocket.Chat
Length of output: 1490
🏁 Script executed:
# Verify Room.performUserRemoval is called in handleLeave
rg -n "performUserRemoval" ee/packages/federation-matrix/Repository: RocketChat/Rocket.Chat
Length of output: 280
🏁 Script executed:
# Look for context around the 403 handling and what it means
rg -B 10 -A 5 "403" ee/packages/federation-matrix/src/FederationMatrix.ts | head -50Repository: RocketChat/Rocket.Chat
Length of output: 553
🏁 Script executed:
# Check Matrix federation SDK documentation or comments about rejectInvite behavior
rg -B 5 -A 5 "rejectInvite" --type=ts | grep -A 5 -B 5 "403\|reject\|leave"Repository: RocketChat/Rocket.Chat
Length of output: 7003
🏁 Script executed:
# Look for event subscriptions in federation-matrix to understand what events trigger cleanup
rg "on.*leave|on.*membership|subscribe.*leave" ee/packages/federation-matrix/src/ --type=tsRepository: RocketChat/Rocket.Chat
Length of output: 1721
🏁 Script executed:
# Check if there are tests that show the 403 behavior
fd "test|spec" ee/packages/federation-matrix/src/ --type=f -name "*.ts" | head -5Repository: RocketChat/Rocket.Chat
Length of output: 296
🏁 Script executed:
# Look for more context on 403 handling and Matrix federation behavior
rg -B 15 "403" ee/packages/federation-matrix/src/FederationMatrix.tsRepository: RocketChat/Rocket.Chat
Length of output: 565
🏁 Script executed:
# Check if there are comments or documentation about the 403 case in rejectInvite
rg -B 10 -A 10 "status === 403" ee/packages/federation-matrix/src/Repository: RocketChat/Rocket.Chat
Length of output: 1866
🏁 Script executed:
# Look at the subscription model to understand when subscriptions are cleaned up
rg -A 5 "findInvitedSubscription" ee/packages/federation-matrix/src/ --type=tsRepository: RocketChat/Rocket.Chat
Length of output: 664
🏁 Script executed:
# Check if there are event listeners or handlers that might be triggered by 403
rg "on.*event|once|subscribe" ee/packages/federation-matrix/src/events/member.ts --type=ts -A 3Repository: RocketChat/Rocket.Chat
Length of output: 651
🏁 Script executed:
# Look for any documentation or comments about the 403 case in the federation SDK or Matrix files
rg -B 5 -A 5 "403\|Forbidden" ee/packages/federation-matrix/ --type=tsRepository: RocketChat/Rocket.Chat
Length of output: 48
🏁 Script executed:
# Check if there's any documentation on what the 403 status means in the context of rejectInvite
rg "rejectInvite" --type=ts -B 10 -A 10 | grep -A 10 -B 10 "Forbidden\|403\|already"Repository: RocketChat/Rocket.Chat
Length of output: 1739
🏁 Script executed:
# Look for tests or mocks of the handleInvite or rejectInvite scenario
fd "handleInvite\|reject" ee/packages/federation-matrix/src/ -type f -name "*.spec.ts"Repository: RocketChat/Rocket.Chat
Length of output: 236
🏁 Script executed:
# Check git history or comments for context on why 403 handling was added
rg "performUserRemoval.*403\|403.*performUserRemoval" ee/packages/federation-matrix/ -B 10 -A 5Repository: RocketChat/Rocket.Chat
Length of output: 48
Avoid explicit cleanup on invite reject; rely on leave-event flow.
Calling Room.performUserRemoval in the 403 branch violates the established event-driven cleanup pattern. The leave-event handler in handleLeave() is the single cleanup point and should be trusted to handle all user removals. Log the error and return instead.
Suggested fix
if (action === 'reject') {
try {
await federationSDK.rejectInvite(room.federation.mrid, matrixUserId);
} catch (err) {
if (err instanceof FederationRequestError && err.response.status === 403) {
- return Room.performUserRemoval(room, user);
+ this.logger.debug({ msg: 'Invite already rejected remotely; awaiting leave-event cleanup', roomId, userId });
+ return;
}
this.logger.error({ msg: 'Failed to reject invite in Matrix', err });
throw err;
}
}🤖 Prompt for AI Agents
In `@ee/packages/federation-matrix/src/FederationMatrix.ts` around lines 1049 -
1104, In handleInvite, remove the explicit cleanup call to
Room.performUserRemoval in the catch branch for federationSDK.rejectInvite
(inside the action === 'reject' block) and instead log the 403 error and return
so the leave-event flow (handleLeave) performs the removal; keep the existing
this.logger.error line and ensure the function returns early on 403 without
invoking Room.performUserRemoval.
This PR depends on the new homeserver sdk version in this PR: RocketChat/homeserver#331. This new homeserver sdk version implements error handlers in the event subscription.
Proposed changes (including videos or screenshots)
Issue(s)
Steps to test or reproduce
Further comments
Summary by CodeRabbit
New Features
Chores