
[ESQL] Add a BY clause to CHANGE_POINT command#145210

Open
darius-vil wants to merge 30 commits into elastic:main from darius-vil:changepoint-limit-by


darius-vil (Contributor) commented Mar 30, 2026

This introduces a BY clause to CHANGE_POINT: https://www.elastic.co/docs/reference/query-languages/esql/commands/change-point

The most complicated change is in ChangePointOperator, which now also has to keep track of group changes in the sorted input.

darius-vil changed the title from "Changepoint limit by" to "[ESQL] Add a BY clause to CHANGE_POINT command" on Mar 31, 2026
```java
// Group A: [0×15, 1×15] -> step at row 15 of page0
// Group B: [1×15, 0×15] -> step at row 15 of page1
// Group C: [0×15, 1×15] -> step at row 15 of page2
List<Long> valuesColumn = Stream.of(
```
Contributor Author

These are somewhat hard to read, but I really wanted some tests covering groups that span multiple pages...

```antlr
changePointConfiguration
    : ON key=qualifiedName
    | AS targetType=qualifiedName COMMA targetPvalue=qualifiedName
    | {this.isDevVersion()}? BY grouping=qualifiedName
```
Contributor Author

I assume {this.isDevVersion()}? is needed since LimitBy has it and we depend on it.

Contributor Author

Update: it looks like LimitBy is about to remove it (#145225).
Should we follow suit?


```java
/**
 * Enables the feature LIMIT n BY expr1, expr2 for retaining at most n docs per group.
 * The feature will not work if we had SORT | LIMIT n BY
```
Contributor Author

No longer relevant: #144279

darius-vil added the >enhancement and Team:Analytics labels on Mar 31, 2026
darius-vil added the Team:ML, >feature and :Analytics/ES|QL labels and removed the >enhancement label on Mar 31, 2026
darius-vil marked this pull request as ready for review on March 31, 2026 11:18
darius-vil requested a review from a team as a code owner on March 31, 2026 11:18
elasticsearchmachine removed the Team:ML label on Mar 31, 2026
elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine (Collaborator)

Hi @darius-vil, I've created a changelog YAML for you.


```esql
CHANGE_POINT value [ON key] [AS type_name, pvalue_name]
CHANGE_POINT value [ON key] [BY group] [AS type_name, pvalue_name]
```
Member

@darius-vil we must keep the docs relevant for all users on 9.x.

This change is only relevant to 9.4+, so we need to tag it appropriately with applies_to tags.

See this comment for pointers: #144300 (comment)

leemthompo (Member), Mar 31, 2026

You could keep the old syntax in a 9.whatever-9.foo tab and add the new one in a 9.4+ versioned tab (AKA an applies switch).

Contributor Author

Thanks for pointing this out!

I meant to ask how these work, but it completely slipped my mind in the end... I'll fix this in a sec.

Contributor Author

For the time being, I marked all the new switches, sections and inline changes with stack: preview 9.4 and serverless: preview; I'm not sure what the correct values should be.

Since CHANGE_POINT BY hinges on the recently introduced LIMIT BY, I guess our release lifecycle should follow theirs?

I'll keep this comment open until I find the answer.

Member

Cool, you'll definitely need stack: ga|preview 9.4, but if it's GA you won't need any serverless tags :)

```java
Attribute key,
Attribute targetType,
Attribute targetPvalue,
Attribute grouping
```
jan-elastic (Contributor), Apr 2, 2026

What about multiple groupings (CHANGE_POINT ... BY host, day, etc.) or general expressions (BY n+1, LENGTH(host))?

I think this should be a List<Expression>, similar to LimitBy.

Contributor Author

I never considered multiple groups; my original proposal was always CHANGE_POINT v [ON t] [AS k, l] [BY g] rather than CHANGE_POINT v [ON t] [AS k, l] [BY g1 [, g2 [, ... [, gN]]]].

I probably should have guessed, though 😄 Thanks for pointing this out!

Luckily, this doesn't complicate things much. It involves two major changes:

  1. the change in the grammar and the boilerplate around it
  2. tracking when change point detection needs to be invoked, which is the complicated part; reusing GroupKeyEncoder (public class GroupKeyEncoder implements Accountable, Releasable) makes this trivial
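The reason multiple BY columns don't complicate the group-change tracking much is that, once the grouping values of a row are encoded into a single key, "did the group change?" is one equality check no matter how many columns there are. The sketch below does not use the real GroupKeyEncoder (its API isn't shown in this thread); it assumes a hypothetical `CompositeKey` record wrapping one row's grouping values, and input in which rows of the same group are contiguous.

```java
import java.util.ArrayList;
import java.util.List;

class GroupBoundaries {
    /** Hypothetical stand-in for an encoded group key: all grouping values of one row. */
    record CompositeKey(List<Object> values) {}

    /**
     * Returns the row indexes at which a new group starts, assuming rows of
     * the same group are contiguous (e.g. the input is sorted by the groupings).
     */
    static List<Integer> groupStarts(List<List<Object>> groupingValuesByRow) {
        List<Integer> starts = new ArrayList<>();
        CompositeKey previous = null;
        for (int row = 0; row < groupingValuesByRow.size(); row++) {
            CompositeKey current = new CompositeKey(groupingValuesByRow.get(row));
            // Record equality delegates to List.equals, i.e. element-wise comparison
            if (previous == null || !previous.equals(current)) {
                starts.add(row);
            }
            previous = current;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Two grouping columns, e.g. BY host, day
        List<List<Object>> rows = List.of(
            List.of("host-a", 1), List.of("host-a", 1),
            List.of("host-a", 2),
            List.of("host-b", 2), List.of("host-b", 2));
        System.out.println(groupStarts(rows)); // [0, 2, 3]
    }
}
```

Adding a third or fourth BY column only makes each key longer; the boundary logic itself is unchanged.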

Regarding support for expressions, I got them working by following the same path as LimitBy, which has a custom optimizer rule that replaces expressions with attributes:

```java
public final class ReplaceLimitByExpressionWithEval extends OptimizerRules.OptimizerRule<LimitBy> {
```

I copied this approach and it works, but I'm not yet sure whether it's the only (or the correct) way to achieve this.
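The idea behind that kind of rule can be shown with a toy model: every BY entry that is not already a plain column reference is computed by an EVAL into a synthetic column, and the grouping then refers to that column, so the operator only ever sees attributes. This is a self-contained sketch (Java 17+); the `Expr`, `Attr` and `Call` types and the `$$group$N` naming are hypothetical and do not mirror the real ES|QL classes or ReplaceLimitByExpressionWithEval.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReplaceGroupingExpressions {
    /** Hypothetical, minimal expression tree. */
    sealed interface Expr permits Attr, Call {}
    record Attr(String name) implements Expr {}
    record Call(String function, List<Expr> args) implements Expr {}

    /** The EVAL assignments to add, plus the rewritten attribute-only groupings. */
    record Rewrite(Map<String, Expr> evals, List<Attr> groupings) {}

    static Rewrite rewrite(List<Expr> groupings) {
        Map<String, Expr> evals = new LinkedHashMap<>();
        List<Attr> attrs = new ArrayList<>();
        for (Expr e : groupings) {
            if (e instanceof Attr a) {
                attrs.add(a); // already a column reference: keep as-is
            } else {
                String synthetic = "$$group$" + evals.size(); // hypothetical naming scheme
                evals.put(synthetic, e);                     // i.e. EVAL synthetic = <expression>
                attrs.add(new Attr(synthetic));
            }
        }
        return new Rewrite(evals, attrs);
    }

    public static void main(String[] args) {
        // BY host, LENGTH(host)
        Rewrite r = rewrite(List.of(
            new Attr("host"),
            new Call("LENGTH", List.of(new Attr("host")))));
        System.out.println(r.groupings());      // [Attr[name=host], Attr[name=$$group$0]]
        System.out.println(r.evals().keySet()); // [$$group$0]
    }
}
```

The appeal of this design is that the execution side stays simple: it never has to evaluate arbitrary expressions, only read columns.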

```antlr
changePointConfiguration
    : ON key=qualifiedName
    | AS targetType=qualifiedName COMMA targetPvalue=qualifiedName
    | {this.isDevVersion()}? BY grouping=qualifiedName
```
Contributor

I think this should be (BY grouping=fields)? (see also StatsCommand)

Contributor Author

Hmm, not sure. Why do you think ()? is necessary here?

Notice that the actual changePointCommand is a few lines above:

```antlr
changePointCommand
    : CHANGE_POINT value=qualifiedName (changePointConfiguration)*
    ;
```

That's where changePointConfiguration is used, and it's wrapped in ()*, meaning the alternatives of changePointConfiguration can appear zero or more times, in any order. LogicalPlanBuilder later verifies that each changePointConfiguration alternative appears at most once.

May be helpful here: https://github.com/antlr/antlr4/blob/dev/doc/parser-rules.md#subrules
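The "zero or more in the grammar, at most once in the builder" split can be sketched in plain Java: the parser accepts any sequence of clauses, and a later pass rejects duplicates. The `Clause` enum and the error wording below are hypothetical; the real check lives in LogicalPlanBuilder and may be written differently.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class ChangePointClauses {
    /** Hypothetical tags for the alternatives of changePointConfiguration. */
    enum Clause { ON, AS, BY }

    /** Rejects any clause the parser accepted more than once; order is free. */
    static Set<Clause> validate(List<Clause> parsed) {
        Set<Clause> seen = EnumSet.noneOf(Clause.class);
        for (Clause c : parsed) {
            if (!seen.add(c)) { // add() returns false if c was already present
                throw new IllegalArgumentException("CHANGE_POINT: " + c + " clause specified more than once");
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(validate(List.of(Clause.ON, Clause.BY, Clause.AS))); // [ON, AS, BY]
        try {
            validate(List.of(Clause.BY, Clause.ON, Clause.BY));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // CHANGE_POINT: BY clause specified more than once
        }
    }
}
```

Because the grammar uses ()*, wrapping the BY alternative itself in ()? would be redundant: optionality already comes from the enclosing loop.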



detect step change by group not nulled
Contributor

Please add tests for:

  • multiple groupings (BY a, b, c)
  • expression groupings (BY a+1, "hi")

```java
// Grouping must be sortable
if (grouping != null) {
    type = grouping.dataType();
    if (DataType.isSortable(type) == false) {
```
Contributor

I don't see why this must be sortable.

Is that the case with LimitBy as well?

Contributor Author

My thinking at the time of writing was:

ChangePoint invokes LimitBy expecting it to sort our input by the grouping(s), which means the grouping(s) must be sortable, so isn't this the place to verify that?

I think if we don't check this here, it would simply fail in OrderBy with a similar cause, but maybe it's worth keeping this check in ChangePoint anyway for the more precise failure message: CHANGE_POINT grouping only supports sortable values, found expression

Contributor

LimitBy doesn't necessarily sort the input by the grouping(s) (as far as I know); it just groups by them. You can do that without sorting, e.g. by using a hash table.
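The hash-table approach mentioned above can be sketched as follows: "at most n rows per group" only needs a per-group counter, not a sort. This is a simplified illustration (Java 16+), not the actual LimitBy operator; rows and group keys are reduced to a hypothetical `Row` record with a string key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LimitPerGroup {
    record Row(String group, double value) {}

    /** Keeps at most n rows per group, preserving input order, without sorting. */
    static List<Row> limitBy(List<Row> input, int n) {
        Map<String, Integer> seenPerGroup = new HashMap<>();
        List<Row> out = new ArrayList<>();
        for (Row row : input) {
            // merge() returns the updated count for this row's group
            int seen = seenPerGroup.merge(row.group(), 1, Integer::sum);
            if (seen <= n) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("b", 1), new Row("a", 2), new Row("b", 3), new Row("a", 4), new Row("b", 5));
        System.out.println(limitBy(rows, 2).size()); // 4
    }
}
```

Note that the output still interleaves groups (b, a, b, a), so an operator that detects group changes by comparing consecutive rows could not rely on a hash-based LIMIT BY alone to make each group contiguous.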

```java
 * data that is passed to it, runs the change point detector on the data (which
 * is a compute-heavy process), and then outputs all data with the change points.
 */
public class ChangePointOperator extends CompleteInputCollectorOperator {
```
Contributor

This is not correct anymore:

ChangePointOperator used to be a CompleteInputCollectorOperator because it would collect at most 1001 rows.

The new CHANGE_POINT ... BY, on the other hand, can receive an unbounded amount of data, so it needs to be streaming.

```java
private void createOutputPages() {
    List<Double> values = new ArrayList<>();
    List<Integer> bucketIndexes = new ArrayList<>();
    ArrayDeque<DetectedChangePoint> detectedChangePoints = new ArrayDeque<>();
```
jan-elastic (Contributor), Apr 2, 2026

I don't see why we need this.

The way I imagine this working:

  • process addInput(Page page) calls until you hit a row whose group key differs from the previous row's group key
  • the different group key triggers change point detection for the previous group
  • create the output for that group
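The streaming approach described in these steps can be sketched as below, assuming each group's rows arrive contiguously (a group may span pages). Pages are simplified to lists of (key, value) rows, and `detect` is a hypothetical stand-in for the real change point detector (it picks the split with the largest difference between the means on either side), not the algorithm Elasticsearch uses. Only the current group is buffered, so memory is bounded by the largest group rather than by the whole input.

```java
import java.util.ArrayList;
import java.util.List;

class StreamingChangePoint {
    record Row(String key, double value) {}
    record GroupResult(String key, int changeIndex) {}

    private final List<GroupResult> output = new ArrayList<>();
    private final List<Double> buffer = new ArrayList<>();
    private String currentKey;

    /** Called once per page; flushes the buffered group whenever the key changes. */
    void addInput(List<Row> page) {
        for (Row row : page) {
            if (currentKey != null && !currentKey.equals(row.key())) {
                flush(); // group boundary: run detection for the finished group
            }
            currentKey = row.key();
            buffer.add(row.value());
        }
    }

    /** Called when input is exhausted: the last group has no following key to trigger it. */
    void finish() {
        if (currentKey != null) {
            flush();
        }
    }

    private void flush() {
        output.add(new GroupResult(currentKey, detect(buffer)));
        buffer.clear();
        currentKey = null;
    }

    /** Hypothetical detector: split index with the largest difference of means; -1 if none. */
    static int detect(List<Double> values) {
        int best = -1;
        double bestDiff = 0;
        for (int split = 1; split < values.size(); split++) {
            double diff = Math.abs(mean(values.subList(split, values.size())) - mean(values.subList(0, split)));
            if (diff > bestDiff) {
                bestDiff = diff;
                best = split;
            }
        }
        return best;
    }

    static double mean(List<Double> xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    List<GroupResult> output() { return output; }

    public static void main(String[] args) {
        StreamingChangePoint op = new StreamingChangePoint();
        // Group "a" spans two pages; group "b" starts mid-page.
        op.addInput(List.of(new Row("a", 0), new Row("a", 0)));
        op.addInput(List.of(new Row("a", 1), new Row("a", 1), new Row("b", 5), new Row("b", 0)));
        op.finish();
        System.out.println(op.output());
        // [GroupResult[key=a, changeIndex=2], GroupResult[key=b, changeIndex=1]]
    }
}
```

The explicit finish() step matters: without it, the last group would never see a "different key" and would never be flushed.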

jan-elastic (Contributor) left a comment

I gave this a first pass, see comments. I think there's some functionality missing.


Labels

:Analytics/ES|QL, >feature, Team:Analytics, v9.5.0


5 participants