-
Notifications
You must be signed in to change notification settings - Fork 459
CDRIVER-6182 break resume loop after two timeouts #2194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDRIVER-6182 break resume loop after two timeouts #2194
Conversation
|
|
||
| ctx.is_command = is_command; | ||
| matches = match_bson_with_ctx(doc, pattern, &ctx); | ||
| bson_t empty = BSON_INITIALIZER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by fix to support NULL for doc and matches the comment above this function:
A NULL doc or NULL json_pattern means "{}".
9fd63a6 to
df213b8
Compare
eramongodb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestions; otherwise, LGTM.
|
|
||
| bson_error_t error; | ||
| bson_t error_doc; /* always initialized, and set with server errors. */ | ||
| bson_t error_doc; /* always initialized, and set with server errors. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| bson_t error_doc; /* always initialized, and set with server errors. */ | |
| bson_t error_doc; // always initialized, and set with server errors. |
Local comment style consistency.
| if (err_doc) { | ||
| resumable = _is_resumable_error(stream, err_doc); | ||
| if (stream->cursor->had_stream_timeout) { | ||
| iteration_timeout_count++; | ||
| } | ||
| } else { | ||
| resumable = false; | ||
| } | ||
|
|
||
| if (iteration_timeout_count >= 2) { | ||
| // CDRIVER-6182: Do not resume if two iteration timeouts occur. Intended to avoid a possible resume loop | ||
| // when `aggregate` succeeds but `getMore` consistently times out. | ||
| resumable = false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (err_doc) { | |
| resumable = _is_resumable_error(stream, err_doc); | |
| if (stream->cursor->had_stream_timeout) { | |
| iteration_timeout_count++; | |
| } | |
| } else { | |
| resumable = false; | |
| } | |
| if (iteration_timeout_count >= 2) { | |
| // CDRIVER-6182: Do not resume if two iteration timeouts occur. Intended to avoid a possible resume loop | |
| // when `aggregate` succeeds but `getMore` consistently times out. | |
| resumable = false; | |
| } | |
| if (err_doc) { | |
| if (stream->cursor->had_stream_timeout) { | |
| iteration_timeout_count++; | |
| } | |
| // CDRIVER-6182: Do not resume if two iteration timeouts occur. Intended to avoid a possible resume loop | |
| // when `aggregate` succeeds but `getMore` consistently times out. | |
| if (iteration_timeout_count >= 2) { | |
| resumable = false; | |
| } else { | |
| resumable = _is_resumable_error(stream, err_doc); | |
| } | |
| } else { | |
| resumable = false; | |
| } |
Suggest tweaking the layout of blocks as suggested to consistently ensure resumable is set once the if (err_doc) block is evaluated by moving each resumable = ... to the end of every possible branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more suggestion: perhaps the CDRIVER-6182 branch should include a MONGOC_WARNING to notify users when this scenario is triggered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest tweaking the layout
This led me to realize the if(err_doc) check was redundant. This is only entered if mongoc_cursor_error_document previously returned true. mongoc_cursor_error_document guarantees:
If the function returns true and
replyis not NULL, thenreplyis set to a pointer to a BSON document, which is either empty or the server’s error response.
Changed if (err_doc) to BSON_ASSERT (err_doc) and also applied the suggested tweaking to use if/else so resumable is only set once.
One more suggestion
I like that idea. Added a warning.
|
|
||
| ENTRY; | ||
|
|
||
| cursor->had_stream_timeout = false; // Reset before running next command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest moving further below just before the if (parts.assembled.session) block to group together with cursor-state-modifying operations, e.g. cursor->client_session = ...;, cursor->explicit_session = ...;, etc.
kevinAlbs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes applies skip_if_high_server_runtime_variance from #2195
| if (err_doc) { | ||
| resumable = _is_resumable_error(stream, err_doc); | ||
| if (stream->cursor->had_stream_timeout) { | ||
| iteration_timeout_count++; | ||
| } | ||
| } else { | ||
| resumable = false; | ||
| } | ||
|
|
||
| if (iteration_timeout_count >= 2) { | ||
| // CDRIVER-6182: Do not resume if two iteration timeouts occur. Intended to avoid a possible resume loop | ||
| // when `aggregate` succeeds but `getMore` consistently times out. | ||
| resumable = false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest tweaking the layout
This led me to realize the if(err_doc) check was redundant. This is only entered if mongoc_cursor_error_document previously returned true. mongoc_cursor_error_document guarantees:
If the function returns true and
replyis not NULL, thenreplyis set to a pointer to a BSON document, which is either empty or the server’s error response.
Changed if (err_doc) to BSON_ASSERT (err_doc) and also applied the suggested tweaking to use if/else so resumable is only set once.
One more suggestion
I like that idea. Added a warning.
Summary
Break the change stream resume loop after two socket timeouts.
Background & Motivation
DRIVERS-1404 describes the problem this PR intends to address:
This scenario has known impact (HELP-83560). SERVER-48526 may help address this in the server. This PR proposes a driver solution: break after two timeouts in the resume loop. This is similarly suggested in DRIVERS-1309:
To propagate a timeout from
mongoc_stream_t, a bool is set onmongoc_server_stream_t.mongoc_stream_tis destroyed after a network error in _handle_network_error, so it cannot later be checked withmongoc_stream_timed_out. This is similarly propagated inmongoc_cursor_t, which only retrieves amongoc_server_streamfor the duration of a command.Rejected alternative: bound resume loop by
socketTimeoutMSAnother idea considered was to bound the resume loop by the duration
socketTimeoutMS. Bounding bysocketTimeoutMSmay better align withtimeoutMS:However, I think it further removes
socketTimeoutMSfrom the documented behavior "to send or receive on a socket". This idea was rejected.