Skip to content

Conversation

@hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Jul 11, 2019

Rewrites how continuous data frame transforms calculates and handles buckets that require an update. Instead of storing the whole set in memory it pages through the updates using a 2nd cursor. This lowers not only memory consumption but prevents problems with limits at query time (max_terms_count). Apart from that the list of updates can be re-retrieved in a failure case (#43662)

Todo:

  • disable/enable BWC, adapt versions
  • cleanup, re-visit naming
  • test correctness
  • check behavior for histogram only pivots

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like good progress.

These are just a few minor comments from an initial read through.

@hendrikmuhs hendrikmuhs force-pushed the df-continuous-terms-robustness branch from 9af4a41 to c10729d Compare July 11, 2019 18:55
@hendrikmuhs hendrikmuhs marked this pull request as ready for review July 12, 2019 10:16
@hendrikmuhs
Copy link
Author

run elasticsearch-ci/1

this.inProgressOrLastCheckpoint = inProgressOrLastCheckpoint;
this.lastCheckpoint = lastCheckpoint;
this.nextCheckpoint = nextCheckpoint;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor doesn't set runState, so it starts off null. This will cause an NPE if any switch (runState) is called while it's null.

Would it be appropriate to call runState = determineRunStateAtStart(); in the constructor? Or if that would cause problems then consider changing:

    switch (runState) {
        case x:
            return x();
        case y:
            return y();
        case z:
            return z();
        default:
            logger.warn("wrong");
            throw error;
    }

to:

    if (runState != null) {
        switch (runState) {
            case x:
                return x();
            case y:
                return y();
            case z:
                return z();
        }
    }
    logger.warn("wrong");
    throw error;

It would give us more information if runState still was null at the time of a switch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onStart gets called before any of the reads of runState happen, so an NPE should not be possible.

Because determineRunStateAtStart() depends on steps before, it would not return the same value if moved to the constructor.

However, I will assign a default to runState at the constructor, to be on the safe side.

Good spot!

@droberts195
Copy link

run elasticsearch-ci/packaging-sample

Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hendrikmuhs hendrikmuhs merged commit d4ec21b into elastic:master Jul 12, 2019
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Jul 12, 2019
…lastic#44219)

Rewrites how continuous data frame transforms calculates and handles buckets that require an update. Instead of storing the whole set in memory, it pages through the updates using a 2nd cursor. This lowers memory consumption and prevents problems with limits at query time (max_terms_count). The list of updates can be re-retrieved in a failure case (elastic#43662)
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Jul 12, 2019
…lastic#44219)

Rewrites how continuous data frame transforms calculates and handles buckets that require an update. Instead of storing the whole set in memory, it pages through the updates using a 2nd cursor. This lowers memory consumption and prevents problems with limits at query time (max_terms_count). The list of updates can be re-retrieved in a failure case (elastic#43662)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants