[ML][Transform] reset failure count when a transform aggregation page is handled successfully #76355
Pinging @elastic/ml-core (Team:ML)
    if (bulkResponse.hasFailures() == false) {
        // We don't know the type of failures that have occurred (searching, processing, indexing, etc.),
        // but if we search, process and bulk index then we have
        // successfully processed an entire page of the transform and should reset the counter, even if we are in the middle
        // of a checkpoint
        context.resetReasonAndFailureCounter();
        nextPhase.onResponse(bulkResponse);
        return;
    }
OK, this diff is a pain. The only difference here is the resetReasonAndFailureCounter() call and the comments. I created a new method here so that this is testable.
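To make the testability point concrete, here is a minimal sketch of what extracting the bulk-response handling into its own method buys. It is not the Elasticsearch code: `FailureContext`, `BulkResult`, `BulkPageHandler`, and the `onFailure` callback are hypothetical stand-ins, and the failure branch is an assumption, since the diff above only shows the success path.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical handler: the extracted method can be called directly from a test.
final class BulkPageHandler {
    static void handleBulkResponse(BulkResult bulkResponse, FailureContext context,
                                   Consumer<BulkResult> nextPhase, Consumer<Exception> onFailure) {
        if (bulkResponse.hasFailures() == false) {
            // A whole page was searched, processed and indexed: clear earlier failures,
            // even if the checkpoint is not finished yet.
            context.resetReasonAndFailureCounter();
            nextPhase.accept(bulkResponse);
            return;
        }
        // Assumption: the real failure handling is more involved than this.
        onFailure.accept(new IllegalStateException("bulk indexing reported failures"));
    }

    public static void main(String[] args) {
        FailureContext context = new FailureContext();
        context.recordFailure("search timed out");
        handleBulkResponse(new BulkResult(false), context,
            r -> System.out.println("page handled"),
            e -> System.out.println("page failed: " + e.getMessage()));
        System.out.println("failure count: " + context.getFailureCount());
    }
}

// Hypothetical stand-in for the transform context: tracks failures and the last reason.
final class FailureContext {
    private final AtomicInteger failureCount = new AtomicInteger();
    private volatile String lastReason;

    void recordFailure(String reason) {
        lastReason = reason;
        failureCount.incrementAndGet();
    }

    void resetReasonAndFailureCounter() {
        lastReason = null;
        failureCount.set(0);
    }

    int getFailureCount() { return failureCount.get(); }
    String getLastReason() { return lastReason; }
}

// Hypothetical stand-in for a bulk response.
final class BulkResult {
    private final boolean failures;
    BulkResult(boolean failures) { this.failures = failures; }
    boolean hasFailures() { return failures; }
}
```

With the logic in a standalone method, a unit test can seed the context with failures, pass in a response without failures, and assert that the counter is back at zero without running a whole indexer.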
      this.auditor = transformServices.getAuditor();
      this.transformConfig = ExceptionsHelper.requireNonNull(transformConfig, "transformConfig");
    - this.progress = progress != null ? progress : new TransformProgress();
    + this.progress = transformProgress != null ? transformProgress : new TransformProgress();
OK, this seemed like a bug: progress was being assigned from itself, and it had never been set at that point. We should set it to the passed value.
That was broken in #75459, which is a 7.15 change, and explains why it hasn't caused a bug report from a user yet.
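For readers wondering how a self-assignment like that compiles at all, here is a small sketch of the pattern. The classes `Progress`, `BuggyIndexer`, and `FixedIndexer` are hypothetical; the sketch assumes the field is non-final and the constructor parameter has a different name than the field, so the bare name `progress` resolves to the (still null) field.

```java
// Hypothetical demo: compares the two constructors side by side.
final class ProgressAssignmentDemo {
    public static void main(String[] args) {
        Progress restored = new Progress(42);
        System.out.println("buggy keeps passed value: " + (new BuggyIndexer(restored).getProgress() == restored));
        System.out.println("fixed keeps passed value: " + (new FixedIndexer(restored).getProgress() == restored));
    }
}

// Hypothetical stand-in for a progress object.
final class Progress {
    final long docsProcessed;
    Progress(long docsProcessed) { this.docsProcessed = docsProcessed; }
}

final class BuggyIndexer {
    private Progress progress; // non-final field, so it silently defaults to null

    BuggyIndexer(Progress transformProgress) {
        // `progress` is the field here, not the parameter: it is always null at this point,
        // so the passed-in value is ignored and a fresh Progress is always created.
        this.progress = progress != null ? progress : new Progress(0);
    }

    Progress getProgress() { return progress; }
}

final class FixedIndexer {
    private Progress progress;

    FixedIndexer(Progress transformProgress) {
        // Use the constructor argument, falling back to a fresh Progress only when it is null.
        this.progress = transformProgress != null ? transformProgress : new Progress(0);
    }

    Progress getProgress() { return progress; }
}
```

Declaring the field `final` would likely have surfaced this at compile time, because reading a final field by its simple name before it is definitely assigned is rejected by the compiler.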
😌
droberts195 left a comment
LGTM
💔 Backport failed
To backport manually run:
… is handled successfully (elastic#76355)
Failure count should not only be reset at checkpoints. Checkpoints could have many pages of data. Consequently, we should reset the failure count once we handle a single composite aggregation page.
This way, the transform won't mark itself as failed erroneously when it has actually succeeded searches + indexing results within the same checkpoint.
closes elastic#76074
…n page is handled successfully (#76355) (#76365)
* [ML][Transform] reset failure count when a transform aggregation page is handled successfully (#76355)
Failure count should not only be reset at checkpoints. Checkpoints could have many pages of data. Consequently, we should reset the failure count once we handle a single composite aggregation page.
This way, the transform won't mark itself as failed erroneously when it has actually succeeded searches + indexing results within the same checkpoint.
closes #76074
* fixing compilation
…on page is handled successfully (#76355) (#76366)
* [ML][Transform] reset failure count when a transform aggregation page is handled successfully (#76355)
Failure count should not only be reset at checkpoints. Checkpoints could have many pages of data. Consequently, we should reset the failure count once we handle a single composite aggregation page.
This way, the transform won't mark itself as failed erroneously when it has actually succeeded searches + indexing results within the same checkpoint.
closes #76074
* fixing tests
* fixing tests
The failure count should not be reset only at checkpoints. A checkpoint can span many pages of data, so we should reset the failure count as soon as a single composite aggregation page is handled successfully.
This way, the transform won't erroneously mark itself as failed when it has actually completed searches and indexed results within the same checkpoint.
closes #76074
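
The effect of the change can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the transform indexer: `FailureCountSimulation`, `marksFailed`, and the `maxFailures` threshold are hypothetical, and each array entry stands for one page attempt inside a single checkpoint (true means the page was searched, processed and indexed successfully).

```java
// Hypothetical simulation of the two reset policies within one checkpoint.
final class FailureCountSimulation {

    // Returns true if the transform would be marked failed before the checkpoint completes.
    static boolean marksFailed(boolean[] pageOutcomes, int maxFailures, boolean resetPerPage) {
        int failureCount = 0;
        for (boolean pageSucceeded : pageOutcomes) {
            if (pageSucceeded) {
                if (resetPerPage) {
                    failureCount = 0; // new behaviour: a handled page clears earlier failures
                }
                // old behaviour: the counter is only cleared when the checkpoint completes
            } else {
                failureCount++;
                if (failureCount > maxFailures) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sporadic failures, each followed by a successfully handled page.
        boolean[] sporadic = { false, true, false, true, false, true };
        System.out.println("reset at checkpoint only: " + marksFailed(sporadic, 2, false));
        System.out.println("reset per page:           " + marksFailed(sporadic, 2, true));
    }
}
```

With a threshold of 2, three isolated failures spread across a long checkpoint trip the old policy, while the per-page reset only trips when failures are consecutive, with no successful page in between.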