-
Notifications
You must be signed in to change notification settings - Fork 62
feat: Client side metrics support for mutateRows #1638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
fe5b621
f74cee8
7b45ceb
bf9a52f
20167af
21f96e9
689efad
9d40bf7
2bb1a59
079a607
168e132
cdbdc20
ee12cea
d56ddf3
9b6ed02
bc70978
76e50e7
5022a7a
9aa6ce1
04e4e77
524eb47
285651f
42366c9
1514d0c
c1f273f
3c05e26
e4d2c28
33f449a
926c480
a82647b
4d74ab4
a0de09b
bbea829
9b13fba
f83727b
1594239
eb3fefe
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -341,7 +341,33 @@ Please use the format 'prezzy' or '${instance.name}/tables/prezzy'.`); | |
| (a, b) => a.concat(b), | ||
| [], | ||
| ); | ||
| const collectMetricsCallback = ( | ||
| originalError: ServiceError | null, | ||
| err: ServiceError | PartialFailureError | null, | ||
| apiResponse?: google.protobuf.Empty, | ||
| ) => { | ||
| // originalError is the error that was sent from the gapic layer. The | ||
| // compiler guarantees that it contains a code which needs to be | ||
| // provided when an operation is marked complete. | ||
| // | ||
| // err is the error we intend to send back to the user. Often it is the | ||
| // same as originalError, but in one case we construct a | ||
| // PartialFailureError and send that back to the user instead. In this | ||
| // case, we still need to pass the originalError into the method | ||
| // because the PartialFailureError doesn't have a code, but we need to | ||
| // communicate a code to the metrics collector. | ||
| // | ||
| const code = originalError ? originalError.code : 0; | ||
| metricsCollector.onOperationComplete(code); | ||
| callback(err, apiResponse); | ||
| }; | ||
|
|
||
| const metricsCollector = | ||
| this.bigtable._metricsConfigManager.createOperation( | ||
| MethodName.MUTATE_ROWS, | ||
| StreamingState.STREAMING, | ||
| this, | ||
| ); | ||
| /* | ||
| The following line of code sets the timeout if it was provided while | ||
| creating the client. This will be used to determine if the client should | ||
|
|
@@ -387,18 +413,23 @@ Please use the format 'prezzy' or '${instance.name}/tables/prezzy'.`); | |
| const onBatchResponse = (err: ServiceError | null) => { | ||
| // Return if the error happened before a request was made | ||
| if (numRequestsMade === 0) { | ||
| callback(err); | ||
| collectMetricsCallback(err, err); | ||
| return; | ||
| } | ||
|
|
||
| const timeoutExceeded = !!( | ||
| timeout && timeout < new Date().getTime() - callTimeMillis | ||
| ); | ||
| if (isRetryable(err, timeoutExceeded)) { | ||
| // If the timeout or max retries is exceeded or if there are no | ||
| // pending indices left then the client doesn't retry. | ||
| // Otherwise, the client will retry if there is no error or if the | ||
| // error has a retryable status code. | ||
| const backOffSettings = | ||
| options.gaxOptions?.retry?.backoffSettings || | ||
| DEFAULT_BACKOFF_SETTINGS; | ||
| const nextDelay = getNextDelay(numRequestsMade, backOffSettings); | ||
| metricsCollector.onAttemptComplete(err ? err.code : 0); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should know that err exists at this point, right? It seems like error code 0 should never happen If we need a null check, I'd say we should put it beside
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No. In fact if the error doesn't exist then most of the time this if block will be entered because
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I know this is out of scope of this PR, but some comments here would be very useful. A docstring for isRetryable, and/or a comment at the start of this block describing what makes it retryable would help a lot There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The more important question though is: Do we record the attempt as successful if some mutations failed? or do we pass through the error code of a failed mutation? In Python, it looks like I was attaching the status code of the exception at the top of the error list. I can't remember if that came from the product team or not though. Have you discussed this with Mattie? Or looked at Java? We should verify this before moving forward
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gemini says we record OK for PartialFailureError results in java. I'll verify that in the code next and ask Bigtable, but I am not too familiar with that codebase so may take time.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And just to provide more background, I verified that stream ending (no error) corresponds to the case where we would expect a PartialFailureError this test. So in this case, if we are inside this if block and the err is null then we can conclude that we are retrying because some of the indicies are pending. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After discussing with Mattie, she said Java records a status of success when a mutation fails, but the rpc succeeds. So we can do the same here |
||
| setTimeout(makeNextBatchRequest, nextDelay); | ||
| return; | ||
| } | ||
|
|
@@ -411,7 +442,10 @@ Please use the format 'prezzy' or '${instance.name}/tables/prezzy'.`); | |
|
|
||
| const mutationErrors = Array.from(mutationErrorsByEntryIndex.values()); | ||
| if (mutationErrorsByEntryIndex.size !== 0) { | ||
| callback(new PartialFailureError(mutationErrors, err)); | ||
| collectMetricsCallback( | ||
| err, | ||
| new PartialFailureError(mutationErrors, err), | ||
| ); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it also seems weird to me to expect a null error here
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This err ? err.code : 0 code has now been moved inside collectMetricsCallback anyways to reduce the repetition. |
||
| return; | ||
| } | ||
| if (err) { | ||
|
|
@@ -425,13 +459,15 @@ Please use the format 'prezzy' or '${instance.name}/tables/prezzy'.`); | |
| .filter(index => !mutationErrorsByEntryIndex.has(index)) | ||
| .map(() => err), | ||
| ); | ||
| callback(err); | ||
| collectMetricsCallback(err, err); | ||
| return; | ||
| } | ||
| callback(err); | ||
| collectMetricsCallback(null, null); | ||
| }; | ||
|
|
||
| metricsCollector.onOperationStart(); | ||
| const makeNextBatchRequest = () => { | ||
| metricsCollector.onAttemptStart(); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. these scattered onAttemptStart and onAttemptComplete lines make me a bit nervous. Especially since we have to repeat them across rpcs. Is there a way to encapsulate these better? Maybe wrap the stream object itself? And wrap the callbacks? Let me know if this makes sense to you. I can try to come up with some ideas if needed
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some ideas:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, those changes look a lot better. I think there are still improvements to be had, but this is a great start |
||
| const entryBatch = entries.filter((entry: Entry, index: number) => { | ||
| return pendingEntryIndices.has(index); | ||
| }); | ||
|
|
@@ -469,14 +505,16 @@ Please use the format 'prezzy' or '${instance.name}/tables/prezzy'.`); | |
| options.gaxOptions, | ||
| ); | ||
|
|
||
| this.bigtable | ||
| .request<google.bigtable.v2.MutateRowsResponse>({ | ||
| const requestStream = | ||
| this.bigtable.request<google.bigtable.v2.MutateRowsResponse>({ | ||
| client: 'BigtableClient', | ||
| method: 'mutateRows', | ||
| reqOpts, | ||
| gaxOpts: options.gaxOptions, | ||
| retryOpts, | ||
| }) | ||
| }); | ||
| metricsCollector.wrapRequest(requestStream); | ||
| requestStream | ||
| .on('error', (err: ServiceError) => { | ||
| onBatchResponse(err); | ||
| }) | ||
|
|
||

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment describing the difference between originalError and err?
The refactor looks better. But I don't completely understand that part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I added comment fully explaining this.