[ML-DataFrame] Rewrite continuous logic to prevent terms count limit #44219
Pinging @elastic/ml-core
droberts195 left a comment
Looks like good progress.
These are just a few minor comments from an initial read-through.
...src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformState.java
Outdated
...el/src/main/java/org/elasticsearch/client/dataframe/transforms/DataFrameIndexerPosition.java
Outdated
...rc/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameIndexerPosition.java
Outdated
Branch updated from 9af4a41 to c10729d.
run elasticsearch-ci/1
        this.inProgressOrLastCheckpoint = inProgressOrLastCheckpoint;
        this.lastCheckpoint = lastCheckpoint;
        this.nextCheckpoint = nextCheckpoint;
    }
The constructor doesn't set runState, so it starts off null. This will cause an NPE if any switch (runState) is called while it's null.
Would it be appropriate to call runState = determineRunStateAtStart(); in the constructor? Or, if that would cause problems, consider changing:
switch (runState) {
    case x:
        return x();
    case y:
        return y();
    case z:
        return z();
    default:
        logger.warn("wrong");
        throw error;
}
to:
if (runState != null) {
    switch (runState) {
        case x:
            return x();
        case y:
            return y();
        case z:
            return z();
    }
}
logger.warn("wrong");
throw error;
This way a null runState reaches the warning instead of causing an NPE, which would give us more information if runState were still null at the time of a switch.
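A compilable version of the guarded switch suggested above might look like the following. The enum constants, the returned strings, and the use of IllegalStateException are illustrative assumptions standing in for the placeholders x, y, z and error; the actual RunState enum in the PR may differ.

```java
import java.util.logging.Logger;

public class RunStateDispatch {

    private static final Logger logger = Logger.getLogger(RunStateDispatch.class.getName());

    // Illustrative constants only; the real enum in the PR may differ.
    enum RunState { FULL_RUN, PARTIAL_RUN_IDENTIFY_CHANGES, PARTIAL_RUN_APPLY_CHANGES }

    static String dispatch(RunState runState) {
        // Guarding the switch means a null runState reaches the warning below
        // instead of throwing a NullPointerException from the switch itself.
        if (runState != null) {
            switch (runState) {
                case FULL_RUN:
                    return "full run";
                case PARTIAL_RUN_IDENTIFY_CHANGES:
                    return "identify changes";
                case PARTIAL_RUN_APPLY_CHANGES:
                    return "apply changes";
            }
        }
        // Reached for null, or for a constant added later but not handled above.
        String message = "Encountered unexpected run state [" + runState + "]";
        logger.warning(message);
        throw new IllegalStateException(message);
    }

    public static void main(String[] args) {
        System.out.println(dispatch(RunState.FULL_RUN));
        try {
            dispatch(null);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Leaving out the default branch also lets the fall-through path catch enum constants that are added later without a matching case.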
onStart is called before runState is ever read, so an NPE should not be possible.
Because determineRunStateAtStart() depends on earlier steps, it would not return the same value if moved to the constructor.
However, I will assign a default to runState in the constructor, to be on the safe side.
Good spot!
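A minimal sketch of the approach described in the reply: the constructor assigns a safe default, and onStart overwrites it once earlier steps have run. The RunState constants and the hasPreviousCheckpoint flag are hypothetical stand-ins for whatever state determineRunStateAtStart() actually inspects.

```java
public class IndexerRunStateSketch {

    // Illustrative constants only; the real enum in the PR may differ.
    enum RunState { FULL_RUN, PARTIAL_RUN_IDENTIFY_CHANGES, PARTIAL_RUN_APPLY_CHANGES }

    private RunState runState;

    IndexerRunStateSketch() {
        // Safe default so runState is never null, even if read before onStart().
        this.runState = RunState.FULL_RUN;
    }

    // Called before any read of runState. The real decision depends on earlier
    // steps, which is why it is made here and not in the constructor.
    void onStart(boolean hasPreviousCheckpoint) {
        runState = determineRunStateAtStart(hasPreviousCheckpoint);
    }

    // Hypothetical rule: without an earlier checkpoint there is nothing to diff against.
    private RunState determineRunStateAtStart(boolean hasPreviousCheckpoint) {
        return hasPreviousCheckpoint ? RunState.PARTIAL_RUN_IDENTIFY_CHANGES : RunState.FULL_RUN;
    }

    RunState getRunState() {
        return runState;
    }
}
```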
run elasticsearch-ci/packaging-sample
droberts195 left a comment
LGTM
…lastic#44219) Rewrites how continuous data frame transforms calculate and handle buckets that require an update. Instead of storing the whole set in memory, it pages through the updates using a second cursor. This lowers memory consumption and prevents problems with limits at query time (max_terms_count). The list of updates can be re-retrieved in a failure case (elastic#43662)
Rewrites how continuous data frame transforms calculate and handle buckets that require an update. Instead of storing the whole set in memory, it pages through the updates using a second cursor. This not only lowers memory consumption but also prevents problems with limits at query time (max_terms_count). In addition, the list of updates can be re-retrieved in a failure case (#43662)
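The paging idea can be illustrated in isolation. This is a sketch under stated assumptions, not the PR's implementation: changed bucket keys come from an in-memory sorted set (a stand-in for a query against the source index), and ChangedBucketPager, nextPage and processAll are hypothetical names. Keys are fetched one page at a time after a cursor, loosely modeled on how a composite aggregation pages with after_key, so each terms filter holds at most pageSize keys, which can be kept below index.max_terms_count (65,536 by default).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

public class ChangedBucketPager {

    // Stand-in for the source of changed bucket keys; a real transform would
    // query the source index for these instead of holding them in memory.
    private final NavigableSet<String> changedKeys;
    private final int pageSize;

    ChangedBucketPager(NavigableSet<String> changedKeys, int pageSize) {
        this.changedKeys = changedKeys;
        this.pageSize = pageSize;
    }

    // Returns up to pageSize keys strictly after the cursor (null means start).
    List<String> nextPage(String afterKey) {
        NavigableSet<String> tail = afterKey == null ? changedKeys : changedKeys.tailSet(afterKey, false);
        List<String> page = new ArrayList<>(pageSize);
        for (String key : tail) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Outer loop: each page becomes one bounded terms filter used to
    // recompute only those buckets; the cursor advances to the last key seen.
    List<List<String>> processAll() {
        List<List<String>> termsFilters = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = nextPage(cursor);
            if (page.isEmpty()) {
                break;
            }
            termsFilters.add(page); // a transform would run a query with terms(page) here
            cursor = page.get(page.size() - 1);
        }
        return termsFilters;
    }
}
```

Because each page is fetched from the source when needed rather than kept as one large set, memory use is bounded by the page size, and after a failure the same updates can be retrieved again by re-running the query.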
Todo: