-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[ESQL] Add a BY clause to CHANGE_POINT command #145210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
darius-vil
wants to merge
30
commits into
elastic:main
Choose a base branch
from
darius-vil:changepoint-limit-by
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
30 commits
Select commit
Hold shift + click to select a range
0307f52
Add BY to grammar
darius-vil 081a557
Ad csv tests
darius-vil 769fc78
Implement CHANGE_POINT BY
darius-vil 7379341
Add column sortable check
darius-vil 16db485
Fix factory describe
darius-vil b2f0161
Apply spotless
darius-vil a70e5b9
Introduce nullable groupingChannel
darius-vil e14cccb
Refactor methods
darius-vil 294dec4
Fix edge case
darius-vil 057d781
Collect changepoints in a queue
darius-vil 254a3f6
Add tests around changepoints split across pages
darius-vil 59a30a3
Fix bug and add more unit tests
darius-vil bed2254
Merge branch 'main' into changepoint-limit-by
darius-vil f75395c
Add BY parser tests after merge
darius-vil 8233ba2
Fix empty input and nulls in the middle group
darius-vil decb5a9
Refactor unit test
darius-vil 6dbd9cd
Remove TODOs
darius-vil 8113031
[CI] Auto commit changes from spotless
a1d9975
Count value limit per group
darius-vil d4cdc6b
Update docs
darius-vil 1ac27e9
Minor fixes
darius-vil d66805c
Fix test checkstyle
darius-vil 862fbae
[CI] Auto commit changes from spotless
5d2337a
Fix typo in test
darius-vil 7a45883
Update docs/changelog/145210.yaml
darius-vil 583c302
Add versioning to docs
darius-vil f1a0a57
Merge branch 'main' into changepoint-limit-by
darius-vil f8f57b6
Add CHANGE_POINT BY generative test
darius-vil 2a15f9a
Merge branch 'main' into changepoint-limit-by
darius-vil fead1ff
[CI] Auto commit changes from spotless
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| area: ES|QL | ||
| issues: [] | ||
| pr: 145210 | ||
| summary: Add a BY clause to CHANGE_POINT command | ||
| type: feature |
16 changes: 16 additions & 0 deletions
16
.../_snippets/commands/examples/change_point.csv-spec/changePointForDocsByGroup.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| % This is generated by ESQL's CommandDocsTests. Do not edit it. See ../README.md for how to regenerate it. | ||
|
|
||
| ```esql | ||
| ROW k=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25], g=[0,1,2] | ||
| | MV_EXPAND k | ||
| | MV_EXPAND g | ||
| | EVAL value=CASE(k<13, 0, 42) | ||
| | CHANGE_POINT value ON k BY g | ||
| | WHERE type IS NOT NULL | ||
| ``` | ||
|
|
||
| | k:integer | g:integer | value:integer | type:keyword | pvalue:double | | ||
| | --- | --- | --- | --- | --- | | ||
| | 13 | 0 | 42 | step_change | 0.0 | | ||
| | 13 | 1 | 42 | step_change | 0.0 | | ||
| | 13 | 2 | 42 | step_change | 0.0 | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,6 +8,7 @@ | |
| package org.elasticsearch.compute.operator; | ||
|
|
||
| import org.apache.lucene.util.BytesRef; | ||
| import org.elasticsearch.common.Strings; | ||
| import org.elasticsearch.compute.data.Block; | ||
| import org.elasticsearch.compute.data.BlockFactory; | ||
| import org.elasticsearch.compute.data.BlockUtils; | ||
|
|
@@ -25,6 +26,7 @@ | |
| import java.util.ArrayList; | ||
| import java.util.Deque; | ||
| import java.util.List; | ||
| import java.util.Objects; | ||
|
|
||
| /** | ||
| * Find spikes, dips and change point in a list of values. | ||
|
|
@@ -37,29 +39,35 @@ public class ChangePointOperator extends CompleteInputCollectorOperator { | |
| private static final Logger logger = LogManager.getLogger(ChangePointOperator.class); | ||
| public static final int INPUT_VALUE_COUNT_LIMIT = 1000; | ||
|
|
||
| public record Factory(int channel, WarningSourceLocation source) implements OperatorFactory { | ||
| private record DetectedChangePoint(int index, ChangeType type) {} | ||
|
|
||
| public record Factory(int channel, Integer groupingChannel, WarningSourceLocation source) implements OperatorFactory { | ||
|
|
||
| @Override | ||
| public Operator get(DriverContext driverContext) { | ||
| return new ChangePointOperator(driverContext, channel, source); | ||
| return new ChangePointOperator(driverContext, channel, groupingChannel, source); | ||
| } | ||
|
|
||
| @Override | ||
| public String describe() { | ||
| return "ChangePointOperator[channel=" + channel + "]"; | ||
| return groupingChannel == null | ||
| ? Strings.format("ChangePointOperator[channel=%d]", channel) | ||
| : Strings.format("ChangePointOperator[channel=%d, groupingChannel=%d]", channel, groupingChannel); | ||
| } | ||
| } | ||
|
|
||
| private final DriverContext driverContext; | ||
| private final int channel; | ||
| private final Integer groupChannel; | ||
| private final WarningSourceLocation source; | ||
|
|
||
| private final Deque<Page> outputPages; | ||
| private Warnings warnings; | ||
|
|
||
| public ChangePointOperator(DriverContext driverContext, int channel, WarningSourceLocation source) { | ||
| public ChangePointOperator(DriverContext driverContext, int channel, Integer groupingChannel, WarningSourceLocation source) { | ||
| this.driverContext = driverContext; | ||
| this.channel = channel; | ||
| this.groupChannel = groupingChannel; | ||
| this.source = source; | ||
|
|
||
| outputPages = new ArrayDeque<>(); | ||
|
|
@@ -71,25 +79,67 @@ public boolean canProduceMoreDataWithoutExtraInput() { | |
| return outputPages.isEmpty() == false; | ||
| } | ||
|
|
||
| private void createOutputPages() { | ||
| int valuesCount = 0; | ||
| for (Page page : inputPages) { | ||
| valuesCount += page.getPositionCount(); | ||
| } | ||
| boolean tooManyValues = valuesCount > INPUT_VALUE_COUNT_LIMIT; | ||
| if (tooManyValues) { | ||
| valuesCount = INPUT_VALUE_COUNT_LIMIT; | ||
| } | ||
| private ChangeType detectChangePoint(List<Double> values, List<Integer> bucketIndexes) { | ||
| MlAggsHelper.DoubleBucketValues bucketValues = new MlAggsHelper.DoubleBucketValues( | ||
| null, | ||
| values.stream().mapToDouble(Double::doubleValue).toArray(), | ||
| bucketIndexes.stream().mapToInt(Integer::intValue).toArray() | ||
| ); | ||
| ChangeType changeType = ChangePointDetector.getChangeType(bucketValues); | ||
| return changeType; | ||
| } | ||
|
|
||
| List<Double> values = new ArrayList<>(valuesCount); | ||
| List<Integer> bucketIndexes = new ArrayList<>(valuesCount); | ||
| private void createOutputPages() { | ||
| List<Double> values = new ArrayList<>(); | ||
| List<Integer> bucketIndexes = new ArrayList<>(); | ||
| ArrayDeque<DetectedChangePoint> detectedChangePoints = new ArrayDeque<>(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see why we need this. The way I imagine this working:
|
||
| int valuesIndex = 0; | ||
| int currentGroupRowCount = 0; | ||
| Object previousGroupKey = groupChannel != null && inputPages.isEmpty() == false | ||
| ? BlockUtils.toJavaObject(inputPages.peek().getBlock(groupChannel), 0) | ||
| : null; | ||
|
|
||
| boolean hasNulls = false; | ||
| boolean hasMultivalued = false; | ||
| boolean hasIndeterminableChangePoint = false; | ||
| boolean tooManyValues = false; | ||
| boolean lastGroupHasRows = false; | ||
| String indeterminableChangePointReason = ""; | ||
| for (Page inputPage : inputPages) { | ||
| Block inputBlock = inputPage.getBlock(channel); | ||
| for (int i = 0; i < inputBlock.getPositionCount() && valuesIndex < valuesCount; i++) { | ||
| Block groupBlock = groupChannel != null ? inputPage.getBlock(groupChannel) : null; | ||
| for (int i = 0; i < inputBlock.getPositionCount(); i++) { | ||
| if (groupBlock != null) { | ||
| Object currentGroupKey = BlockUtils.toJavaObject(groupBlock, i); | ||
| if (Objects.equals(currentGroupKey, previousGroupKey) == false) { | ||
| if (values.isEmpty() == false || lastGroupHasRows) { | ||
| var changeType = detectChangePoint(values, bucketIndexes); | ||
| var changePointIndex = changeType.changePoint(); | ||
| if (changePointIndex >= 0) { | ||
| detectedChangePoints.offer(new DetectedChangePoint(changePointIndex, changeType)); | ||
| } | ||
| if (changeType instanceof ChangeType.Indeterminable indeterminable) { | ||
| hasIndeterminableChangePoint = true; | ||
| indeterminableChangePointReason = indeterminable.getReason(); | ||
| } | ||
| values.clear(); | ||
| bucketIndexes.clear(); | ||
| lastGroupHasRows = false; | ||
| } | ||
| previousGroupKey = currentGroupKey; | ||
| currentGroupRowCount = 0; | ||
| } | ||
| } | ||
|
|
||
| if (currentGroupRowCount >= INPUT_VALUE_COUNT_LIMIT) { | ||
| tooManyValues = true; | ||
| valuesIndex++; | ||
| continue; | ||
| } | ||
|
|
||
| Object value = BlockUtils.toJavaObject(inputBlock, i); | ||
| lastGroupHasRows = true; | ||
| currentGroupRowCount++; | ||
| if (value == null) { | ||
| hasNulls = true; | ||
| valuesIndex++; | ||
|
|
@@ -103,32 +153,47 @@ private void createOutputPages() { | |
| } | ||
| } | ||
|
|
||
| MlAggsHelper.DoubleBucketValues bucketValues = new MlAggsHelper.DoubleBucketValues( | ||
| null, | ||
| values.stream().mapToDouble(Double::doubleValue).toArray(), | ||
| bucketIndexes.stream().mapToInt(Integer::intValue).toArray() | ||
| ); | ||
| ChangeType changeType = ChangePointDetector.getChangeType(bucketValues); | ||
| int changePointIndex = changeType.changePoint(); | ||
| // flush last (or only) group; for "non-grouped" or "all-null" input this still | ||
| // runs to produce an "indeterminable" warning. | ||
| if (values.isEmpty() == false || groupChannel == null || lastGroupHasRows) { | ||
| var changeType = detectChangePoint(values, bucketIndexes); | ||
| var changePointIndex = changeType.changePoint(); | ||
| if (changePointIndex >= 0) { | ||
| detectedChangePoints.offer(new DetectedChangePoint(changePointIndex, changeType)); | ||
| } | ||
| if (changeType instanceof ChangeType.Indeterminable indeterminable) { | ||
| hasIndeterminableChangePoint = true; | ||
| indeterminableChangePointReason = indeterminable.getReason(); | ||
| } | ||
| } | ||
|
|
||
| buildOutputPages(detectedChangePoints); | ||
| emitWarnings(tooManyValues, hasNulls, hasMultivalued, hasIndeterminableChangePoint, indeterminableChangePointReason); | ||
| } | ||
|
|
||
| private void buildOutputPages(ArrayDeque<DetectedChangePoint> detectedChangePoints) { | ||
| BlockFactory blockFactory = driverContext.blockFactory(); | ||
| int pageStartIndex = 0; | ||
| while (inputPages.isEmpty() == false) { | ||
| Page inputPage = inputPages.peek(); | ||
| int pageEndIndex = pageStartIndex + inputPage.getPositionCount(); | ||
| Page outputPage; | ||
| Block changeTypeBlock = null; | ||
| Block changePvalueBlock = null; | ||
| boolean success = false; | ||
| try { | ||
| if (pageStartIndex <= changePointIndex && changePointIndex < pageStartIndex + inputPage.getPositionCount()) { | ||
| DetectedChangePoint head = detectedChangePoints.peek(); | ||
| if (head != null && head.index() < pageEndIndex) { | ||
| try ( | ||
| BytesRefBlock.Builder changeTypeBlockBuilder = blockFactory.newBytesRefBlockBuilder(inputPage.getPositionCount()); | ||
| DoubleBlock.Builder pvalueBlockBuilder = blockFactory.newDoubleBlockBuilder(inputPage.getPositionCount()) | ||
| ) { | ||
| for (int i = 0; i < inputPage.getPositionCount(); i++) { | ||
| if (pageStartIndex + i == changePointIndex) { | ||
| changeTypeBlockBuilder.appendBytesRef(new BytesRef(changeType.getWriteableName())); | ||
| pvalueBlockBuilder.appendDouble(changeType.pValue()); | ||
| if (head != null && pageStartIndex + i == head.index()) { | ||
| changeTypeBlockBuilder.appendBytesRef(new BytesRef(head.type().getWriteableName())); | ||
| pvalueBlockBuilder.appendDouble(head.type().pValue()); | ||
| detectedChangePoints.poll(); | ||
| head = detectedChangePoints.peek(); | ||
| } else { | ||
| changeTypeBlockBuilder.appendNull(); | ||
| pvalueBlockBuilder.appendNull(); | ||
|
|
@@ -152,33 +217,33 @@ private void createOutputPages() { | |
|
|
||
| inputPages.removeFirst(); | ||
| outputPages.add(outputPage); | ||
| pageStartIndex += inputPage.getPositionCount(); | ||
| pageStartIndex = pageEndIndex; | ||
| } | ||
| } | ||
|
|
||
| if (changeType instanceof ChangeType.Indeterminable indeterminable) { | ||
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Change point indeterminable: {}", indeterminable.getReason()); | ||
| } | ||
| warnings(false).registerException(new IllegalArgumentException(indeterminable.getReason())); | ||
| } | ||
| private void emitWarnings( | ||
| boolean tooManyValues, | ||
| boolean hasNulls, | ||
| boolean hasMultivalued, | ||
| boolean hasIndeterminableChangePoint, | ||
| String indeterminableReason | ||
| ) { | ||
| if (tooManyValues) { | ||
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Too many values: limit is {}, some values were ignored", INPUT_VALUE_COUNT_LIMIT); | ||
| } | ||
| logger.debug(() -> Strings.format("Too many values: limit is %d, some values were ignored", INPUT_VALUE_COUNT_LIMIT)); | ||
| warnings(true).registerException( | ||
| new IllegalArgumentException("too many values; keeping only first " + INPUT_VALUE_COUNT_LIMIT + " values") | ||
| ); | ||
| } | ||
| if (hasIndeterminableChangePoint) { | ||
| logger.debug(() -> Strings.format("Change point indeterminable: %s", indeterminableReason)); | ||
| warnings(false).registerException(new IllegalArgumentException(indeterminableReason)); | ||
| } | ||
| if (hasNulls) { | ||
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Values contain nulls; skipping them"); | ||
| } | ||
| logger.debug(() -> "Values contain nulls; skipping them"); | ||
| warnings(true).registerException(new IllegalArgumentException("values contain nulls; skipping them")); | ||
| } | ||
| if (hasMultivalued) { | ||
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Values contain multivalued entries; skipping them"); | ||
| } | ||
| logger.debug(() -> "Values contain multivalued entries; skipping them"); | ||
| warnings(true).registerException( | ||
| new IllegalArgumentException( | ||
| "values contains multivalued entries; skipping them (please consider reducing them with e.g. MV_AVG or MV_SUM)" | ||
|
|
@@ -209,7 +274,11 @@ protected void onClose() { | |
|
|
||
| @Override | ||
| public String toString() { | ||
| return "ChangePointOperator[channel=" + channel + "]"; | ||
| if (groupChannel == null) { | ||
| return "ChangePointOperator[channel=" + channel + "]"; | ||
| } else { | ||
| return "ChangePointOperator[channel=" + channel + ", groupChannel=" + groupChannel + "]"; | ||
| } | ||
| } | ||
|
|
||
| private Warnings warnings(boolean onlyWarnings) { | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@darius-vil we must keep version changes relevant for all users on 9.x
this change is only relevant to 9.4+ so we need to tag it appropriately with applies_to tags
See this comment for pointers on that : #144300 (comment)
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could keep the old one in a
9.whatever-9.footab and add a new one in a9.4+versioned tab (AKA applies switch)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out!
I meant to ask how do these work, but it completely slipped my mind in the end... I'll fix this in a sec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being, I marked all the new switches/sections/inline changes with
stack: preview 9.4andserverless: preview- I'm not sure what the correct values should be.Since CHANGE_POINT BY hinges on recently introduced LIMIT BY, I guess our release lifecycle should follow theirs?
I'll keep this comment open until I find the answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool you'll definitely need
stack: ga|preview 9.4but if it'sgayou won't need any serverless tags :)