-
Notifications
You must be signed in to change notification settings - Fork 13k
regression: call ending on non-critical errors #37070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regression: call ending on non-critical errors #37070
Conversation
|
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again If you have any trouble, please check the PR guidelines |
|
WalkthroughConsolidates media-call error handling around a richer ClientMediaSignalError payload (adds critical and errorDetails), updates transport and emitters to propagate these fields, introduces refined hangup reasoning (including input-error), and records new call timestamps (acceptedAt, activatedAt). Removes legacy per-type error handlers and adds a serializeError utility. Minor schema and typing updates. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Transport as TransportWrapper
participant Server as CallSignalProcessor
participant Director as MediaCallDirector
rect rgb(245,248,255)
note over Client: Error occurs (init/offer/answer/conn/ICE)
Client->>Transport: sendError({ type, errorCode, negotiationId, critical, errorDetails })
Transport->>Server: ClientMediaSignalError (critical default false if not provided)
end
rect rgb(245,255,245)
Server->>Server: processSignal(signal)
Server->>Server: processError(signal)<br/>(log, store)
alt errorType == "service" and past negotiation
note right of Server: Do not hang up
else alt critical == false
note right of Server: Do not hang up
else
Server->>Director: hangup(reason derived from errorType)
end
end
sequenceDiagram
autonumber
participant WebRTC as WebRTC Processor
participant Call as Call
participant Transport as TransportWrapper
participant Server as CallSignalProcessor
WebRTC-->>Call: internalError(event, details?)
Call->>Transport: sendError({ type:"service", errorCode, negotiationId?, critical:true, errorDetails })
Transport->>Server: ClientMediaSignalError
Server->>Server: processError(...) -> optional hangup
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## release-7.11.0 #37070 +/- ##
==================================================
- Coverage 67.43% 67.41% -0.02%
==================================================
Files 3328 3328
Lines 113342 113341 -1
Branches 20571 20559 -12
==================================================
- Hits 76428 76406 -22
- Misses 34315 34333 +18
- Partials 2599 2602 +3
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
packages/media-signaling/src/lib/services/webrtc/Processor.ts (1)
328-329: UseserializeErrorutility instead ofJSON.stringifyfor better error diagnostics.DOM event objects like
RTCPeerConnectionIceErrorEventhave non-enumerable properties (errorCode,errorText,url, etc.) thatJSON.stringifycannot capture, likely resulting inerrorDetails: "{}". The codebase provides aserializeErrorutility atpackages/media-signaling/src/lib/utils/serializeError.tsthat handles this by spreading the object and explicitly extracting key properties.+ import { serializeError } from '../../utils/serializeError'; - this.emitter.emit('internalError', { critical: false, error: 'ice-candidate-error', errorDetails: JSON.stringify(event) }); + this.emitter.emit('internalError', { critical: false, error: 'ice-candidate-error', errorDetails: serializeError(event) });This will ensure diagnostic properties like
errorCode(STUN/TURN error code) anderrorText(reason phrase) are captured for debugging ICE connection issues.packages/models/src/models/MediaCallChannels.ts (1)
73-82: Consider error array management strategy.The implementation uses
$addToSetwhich deduplicates errors. A few considerations:
- Deduplication behavior: Two errors with identical fields won't both be stored. Is this intentional? Repeated identical errors might be meaningful for debugging.
- Array growth: No bounds on the
errorsarray size. In high-error scenarios (e.g., the ICE candidate issue mentioned in PR objectives), this could grow unbounded.Consider whether
$push(allows duplicates) is more appropriate, and whether a TTL or size limit should be enforced.packages/media-signaling/src/lib/TransportWrapper.ts (1)
29-37: Simplify the critical field default.The changes correctly extend error reporting with critical and errorDetails fields. However, line 34 can be simplified:
- ...(critical ? { critical } : { critical: false }), + critical: critical ?? false,This is clearer and achieves the same result: critical defaults to false when undefined.
packages/media-signaling/src/lib/utils/serializeError.ts (1)
12-18: Consider including thestackproperty for Error instances.The spread operator
{...error}only copies enumerable properties, and standard Error properties (name,message,stack) are non-enumerable. While you explicitly addnameandmessage, thestacktrace is valuable for debugging and should also be included.Apply this diff:
if (error instanceof Error) { return JSON.stringify({ ...error, name: error.name, message: error.message, + stack: error.stack, }); }ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts (1)
163-183: Consider reordering logic for clarity.The hangup decision flow is correct but could be more readable. Currently, the
criticalcheck (lines 174-176) comes after thesignalingerror type is potentially encountered, which is confusing since non-critical errors should never reach the signaling-error assignment.Consider this reordering for improved clarity:
let hangupReason: CallHangupReason = 'error'; + + // Non-critical errors never trigger hangup + if (!critical) { + return; + } + if (errorType === 'service') { hangupReason = 'service-error'; // Do not hangup on service errors after the call is already active; // if the error happened on a renegotiation, then the service may still be able to rollback to a valid state if (this.isPastNegotiation()) { return; } } - if (!critical) { - return; - } - if (errorType === 'signaling') { hangupReason = 'signaling-error'; } await mediaCallDirector.hangup(this.call, this.agent, hangupReason);This makes it immediately clear that only critical errors can trigger hangups, and service errors during renegotiation are always exempt.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts(2 hunks)packages/core-typings/src/mediaCalls/IMediaCall.ts(1 hunks)packages/core-typings/src/mediaCalls/IMediaCallChannel.ts(2 hunks)packages/media-signaling/src/definition/call/IClientMediaCall.ts(1 hunks)packages/media-signaling/src/definition/services/IServiceProcessor.ts(1 hunks)packages/media-signaling/src/definition/signals/client/error.ts(2 hunks)packages/media-signaling/src/lib/Call.ts(9 hunks)packages/media-signaling/src/lib/Session.ts(1 hunks)packages/media-signaling/src/lib/TransportWrapper.ts(1 hunks)packages/media-signaling/src/lib/services/webrtc/Processor.ts(1 hunks)packages/media-signaling/src/lib/utils/serializeError.ts(1 hunks)packages/model-typings/src/models/IMediaCallChannelsModel.ts(2 hunks)packages/models/src/models/MediaCallChannels.ts(2 hunks)packages/models/src/models/MediaCalls.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
packages/model-typings/src/models/IMediaCallChannelsModel.ts (1)
packages/core-typings/src/mediaCalls/IMediaCallChannel.ts (1)
MediaCallChannelError(4-10)
packages/models/src/models/MediaCallChannels.ts (1)
packages/core-typings/src/mediaCalls/IMediaCallChannel.ts (1)
MediaCallChannelError(4-10)
ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts (4)
packages/media-signaling/src/definition/signals/client/error.ts (1)
ClientMediaSignalError(4-14)packages/models/src/index.ts (1)
MediaCallChannels(185-185)packages/media-signaling/src/definition/call/IClientMediaCall.ts (1)
CallHangupReason(29-41)ee/packages/media-calls/src/server/CallDirector.ts (1)
mediaCallDirector(416-416)
packages/media-signaling/src/lib/TransportWrapper.ts (1)
packages/media-signaling/src/definition/signals/client/error.ts (1)
ClientMediaSignalError(4-14)
packages/media-signaling/src/lib/Call.ts (1)
packages/media-signaling/src/lib/utils/serializeError.ts (1)
serializeError(1-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: 🚢 Build Docker Images for Testing (alpine)
🔇 Additional comments (25)
packages/media-signaling/src/definition/services/IServiceProcessor.ts (1)
16-16: LGTM! Enhanced error payload supports richer diagnostics.The addition of the optional
errorDetailsfield enables processors to attach serialized error context (e.g., ICE candidate error events) to internal error emissions, supporting the PR's goal of capturing detailed error information for non-critical errors without breaking existing consumers.packages/media-signaling/src/lib/Session.ts (1)
429-429: LGTM! More specific hangup reason improves error categorization.Changing the reason from
"service-error"to"input-error"provides clearer semantics for audio input failures, making it easier to distinguish user media permission/device issues from other service-level errors in logs and analytics.packages/models/src/models/MediaCalls.ts (2)
85-85: LGTM! Timestamp captures call acceptance time.Recording
acceptedAtwhen a call transitions to theacceptedstate provides valuable audit information for debugging call flow issues and measuring acceptance latency.
101-101: LGTM! Timestamp captures call activation time.Recording
activatedAtwhen a call transitions to theactivestate enables tracking of the time between acceptance and activation, useful for diagnosing setup delays and connection issues.packages/model-typings/src/models/IMediaCallChannelsModel.ts (2)
1-1: LGTM! Import supports new error tracking capability.The
MediaCallChannelErrortype import enables the model interface to support the newaddErrorByIdmethod, aligning with the PR's goal to store detailed error information on call channels.
14-14: LGTM! New method enables error persistence on channels.The
addErrorByIdmethod provides a clean API for appending error records (with timestamp, error type, code, details, and negotiation context) to media call channels, supporting richer error diagnostics and post-mortem analysis.packages/media-signaling/src/definition/call/IClientMediaCall.ts (1)
39-39: LGTM! Well-documented addition.The new 'input-error' hangup reason is clearly documented and aligns with the PR objective to distinguish audio input issues from general service errors.
packages/models/src/models/MediaCallChannels.ts (1)
1-1: LGTM!Import correctly adds the new MediaCallChannelError type needed by the addErrorById method.
packages/core-typings/src/mediaCalls/IMediaCall.ts (1)
54-57: LGTM! Clear timestamp additions.The new timestamp fields are well-documented and align with the PR objective to record state transition times. The optional nature ensures backward compatibility.
packages/core-typings/src/mediaCalls/IMediaCallChannel.ts (2)
4-10: LGTM! Well-structured error type.The MediaCallChannelError type provides a clear structure for error tracking with required timestamp and type fields, plus optional contextual information.
32-32: LGTM! Consistent error storage.The optional errors array integrates cleanly with the existing interface and supports the PR's goal of storing detailed client-reported errors.
packages/media-signaling/src/lib/utils/serializeError.ts (1)
20-31: LGTM - Plain object handling is sound.The logic correctly handles plain objects by copying enumerable properties and ensuring
nameandmessagefields are included when present. The outer try/catch appropriately handles potentialJSON.stringifyfailures (e.g., circular references).packages/media-signaling/src/definition/signals/client/error.ts (1)
12-13: LGTM - Schema changes are consistent and correct.The new optional fields
criticalanderrorDetailsare properly typed and the JSON schema validation correctly mirrors the TypeScript type definitions with nullable properties.Also applies to: 46-53
ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts (3)
99-99: LGTM - Improved error signal handling.Passing the entire signal object to
processErroris cleaner and more maintainable than passing individual fields, especially with the new optional fields added.
155-161: Verify silent failure handling for error storage.Error storage failures are caught and silently ignored with
.catch(() => null). While this prevents the error processing flow from breaking, consider whether error storage failures should be logged for observability.Is silent failure of error storage acceptable, or should these failures be logged for monitoring?
287-289: LGTM - Helper method is clear and correct.The
isPastNegotiationhelper appropriately identifies calls that have completed negotiation, which is used to determine whether service errors should trigger hangup.packages/media-signaling/src/lib/Call.ts (9)
19-19: LGTM!The import of
serializeErroris appropriate for enriching error reporting throughout this file.
279-287: LGTM!Correctly marks initialization failures as critical and captures error details with
serializeError. The flow properly rejects the call as unavailable before re-throwing.
735-735: LGTM!Correctly marks invalid-service error as critical with negotiationId context.
752-760: LGTM!Proper critical error handling with serialized error details and negotiationId. The exception is correctly re-thrown after reporting.
823-826: LGTM!Correctly handles the missing answer case with a critical error and proper control flow termination.
952-954: LGTM!Correctly marks signaling errors as critical when a call is activated without local acceptance.
1055-1074: LGTM!The enhanced method signature properly supports optional error details and critical flag propagation. The implementation correctly includes errorDetails only when provided and properly spreads the negotiationId conditionally.
1100-1123: LGTM!Connection state failures ('failed' and 'closed') correctly emit critical service errors with negotiation context before triggering hangup. The error handling properly prevents duplicate hangups when the call is already over.
1186-1193: LGTM!Proper critical error handling when WebRTC is not implemented, with serialized error details and correct exception propagation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/media-signaling/src/lib/Call.ts (1)
953-953: Consider using the sendError wrapper for consistency.Line 953 calls
this.config.transporter.sendErrordirectly instead of using thethis.sendErrorwrapper method (defined at lines 833-841). The wrapper provides consistent logging and hidden-state checking.Apply this diff for consistency:
- this.config.transporter.sendError(this.callId, { errorType: 'signaling', errorCode: 'not-accepted', critical: true }); + this.sendError({ errorType: 'signaling', errorCode: 'not-accepted', critical: true });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/media-signaling/src/lib/Call.ts(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/media-signaling/src/lib/Call.ts (1)
packages/media-signaling/src/lib/utils/serializeError.ts (1)
serializeError(1-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: 📦 Build Packages
- GitHub Check: CodeQL-Build
- GitHub Check: CodeQL-Build
🔇 Additional comments (10)
packages/media-signaling/src/lib/Call.ts (10)
19-19: LGTM!The import of
serializeErroris correctly placed and will be used consistently throughout the file for structured error serialization.
279-284: LGTM!The error handling for service initialization failure is appropriate. The critical flag is correctly set, error details are properly serialized, and the control flow (reject and re-throw) ensures the call cannot proceed with a broken state.
735-735: LGTM!The error handling for invalid service is appropriate, with the critical flag set and control flow returning immediately.
752-758: LGTM!The error handling for offer creation failure is appropriate. The critical flag is set, error details are properly serialized using
serializeError, and the exception is re-thrown to halt execution.
814-820: LGTM!The error handling for answer creation failure is correct. The use of
serializeError(e)at line 819 properly addresses the previous review comment about consistent error serialization.
825-826: LGTM!The error handling when no answer is produced correctly sends a critical error and returns to prevent further execution.
1056-1074: LGTM!The enhanced signature for
onWebRTCInternalErrorproperly supports optionalerrorDetails, and the error sending logic correctly includes all fields conditionally. This enables richer error reporting from WebRTC internals.
1103-1120: LGTM!The error handling for WebRTC connection state changes (
failedandclosed) correctly sends critical errors with negotiation context before hanging up. The critical flag is appropriate as these states represent unrecoverable connection failures.
1191-1191: LGTM!The error handling for WebRTC requirement failure correctly uses
serializeErrorfor error details, sets the critical flag, and re-throws to prevent further execution with an unsupported configuration.
763-764: Add return statement after critical error.After sending the critical error when
offeris null (line 763), the code continues to execute and callsdeliverSdpat line 767 with a null offer, which will cause issues downstream.Apply this diff to halt execution:
if (!offer) { this.sendError({ errorType: 'service', errorCode: 'implementation-error', negotiationId, critical: true }); + return; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/core-typings/src/mediaCalls/IMediaCallChannel.ts (1)
4-9: Consider adding errorDetails field for better debugging.The
ClientMediaSignalErrorsignal includes anerrorDetailsfield (seepackages/media-signaling/src/definition/signals/client/error.ts), which is logged inCallSignalProcessor.processError(line 149) but not persisted to the database. Storing this field would preserve diagnostic information that could be valuable when investigating call failures in MongoDB.Apply this diff to add the optional field:
export type MediaCallChannelError = { ts: Date; errorType: string; errorCode?: string; negotiationId?: string; + errorDetails?: string; };Then update the persistence logic in
CallSignalProcessor.ts(line 156-161) to include it:void MediaCallChannels.addErrorById(this.channel._id, { errorType, ts: new Date(), ...(errorCode && { errorCode }), ...(negotiationId && { negotiationId }), + ...(errorDetails && { errorDetails }), }).catch(() => null);ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts (1)
155-161: Consider logging when error persistence fails.The
.catch(() => null)silently swallows any failures when storing errors to the database. While this prevents blocking the error handler, it means storage failures go unnoticed, potentially losing valuable diagnostic information.Apply this diff to log persistence failures:
void MediaCallChannels.addErrorById(this.channel._id, { errorType, ts: new Date(), ...(errorCode && { errorCode }), ...(negotiationId && { negotiationId }), -}).catch(() => null); +}).catch((err) => { + logger.warn({ msg: 'Failed to persist error to channel', channelId: this.channel._id, err }); +});
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts(2 hunks)packages/core-typings/src/mediaCalls/IMediaCallChannel.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts (4)
packages/media-signaling/src/definition/signals/client/error.ts (1)
ClientMediaSignalError(4-14)packages/models/src/index.ts (1)
MediaCallChannels(185-185)packages/media-signaling/src/definition/call/IClientMediaCall.ts (1)
CallHangupReason(29-41)ee/packages/media-calls/src/server/CallDirector.ts (1)
mediaCallDirector(416-416)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: 📦 Build Packages
- GitHub Check: Builds matrix rust bindings against alpine
- GitHub Check: CodeQL-Build
- GitHub Check: CodeQL-Build
🔇 Additional comments (5)
packages/core-typings/src/mediaCalls/IMediaCallChannel.ts (1)
31-31: LGTM!The optional
errorsarray integrates cleanly with the existing interface structure and enables tracking multiple error events per channel.ee/packages/media-calls/src/internal/agents/CallSignalProcessor.ts (4)
99-99: LGTM!Passing the complete signal object simplifies the call site and enables
processErrorto access all error context fields.
138-143: LGTM!The default values (
errorType = 'other'andcritical = false) provide safe fallback behavior when clients send incomplete error signals.
144-153: LGTM!Comprehensive error logging with all relevant context will aid in debugging call failures.
163-182: LGTM: Critical-flag gating prevents unnecessary hangups.The refactored logic correctly implements the core requirement: only critical errors trigger call hangup. The decision tree properly handles:
- Non-critical errors: early return (no hangup)
- Service errors during active calls: early return (allows rollback on renegotiation)
- Critical signaling/service/other errors: hangup with appropriate reason
Proposed changes (including videos or screenshots)
Improves the handling of errors and states on webrtc calls:
Issue(s)
VAI-147
Steps to test or reproduce
To cause a non-critical error:
In this scenario, the client will potentially generate hundreds of ICE candidates and it's likely at least one of them will trigger some error.
Further comments
Summary by CodeRabbit