[ESQL] Adds planner implementation for LIMIT BY#144069
[ESQL] Adds planner implementation for LIMIT BY#144069ncordon merged 32 commits intoelastic:mainfrom
Conversation
77aaad1 to
851d414
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
2487f15 to
5753f59
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces the initial end-to-end plumbing for the new ESQL command LIMIT N BY expr1, expr2, ..., adding parsing support, logical/physical plan nodes, optimizer rules, and compute-engine planning to execute grouped limiting (without supporting SORT directly before LIMIT BY yet).
Changes:
- Add new logical (
LimitBy) and physical (LimitByExec) plan nodes, including (de)serialization and planner wiring. - Extend the parser/grammar to recognize
LIMIT ... BY ...(dev-version gated) and add verification to rejectSORTdirectly beforeLIMIT BY. - Add logical/physical optimizer rules and tests to handle expression groupings, pruning foldable group keys, and combining/pushing down
LIMIT BY.
Reviewed changes
Copilot reviewed 40 out of 41 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.g4 | Grammar: adds optional BY ... group key to LIMIT command (dev build gated). |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java | Parser: builds LimitBy logical node when BY is present. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/LimitBy.java | New logical plan node representing grouped limit and its execution-location hints. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/LimitByExec.java | New physical plan node for grouped limit, including row-size estimation and serialization. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java | Planner mapping: wires LimitBy -> LimitByExec and injects local limits into fragments under LimitBy. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/LocalMapper.java | Local planner mapping for LimitBy -> LimitByExec. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java | Reduction planning: skips “local” pipeline breakers when selecting reduction boundaries. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java | Compute planning: adds GroupedLimitOperator planning for LimitByExec; refactors attribute-channel lookup. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java | Adds new LIMIT BY optimizer rules to the optimization pipeline. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceLimitByExpressionWithEval.java | New rule: extracts non-attribute groupings into synthetic Eval + wraps with Project. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneLiteralsInLimitBy.java | New rule: drops foldable grouping keys and degenerates to plain LIMIT when all keys are foldable. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineLimitBy.java | New rule: combines adjacent LimitBy, pushes/duplicates across certain unary/binary nodes. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java | Adds verification: blocks SORT directly before LIMIT BY (temporary restriction). |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/expression/Expressions.java | Adds listSemanticEquals utility for comparing grouping lists. |
| x-pack/plugin/esql/qa/testFixtures/src/main/resources/limit.csv-spec | QA coverage: adds LIMIT BY spec tests behind limit_by capability. |
| x-pack/plugin/esql/src/test/java/... | Adds extensive unit tests for parser, verifier, optimizer rules, and physical planning/serialization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...a/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceLimitByExpressionWithEval.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceLimitByExpressionWithEval.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneLiteralsInLimitBy.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/expression/Expressions.java
Show resolved
Hide resolved
...ava/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineLimitByTests.java
Outdated
Show resolved
Hide resolved
18378a7 to
41d3029
Compare
| FROM employees | ||
| | WHERE emp_no IN (10001, 10002, 10003, 10005, 10006) | ||
| | SORT emp_no | ||
| | LIMIT 1000 |
There was a problem hiding this comment.
Do we need this limit here and in other cases? I believe we have much fewer records in employees, especially above condition.
There was a problem hiding this comment.
Is it to avoid sorts over the entire result set?
There was a problem hiding this comment.
Technically no, as the default "limit 1000" automatically added. We can remove it
There was a problem hiding this comment.
Maybe it is to avoid warning about default limit?
I do not think it is suppressed by new limit by, is it?
There was a problem hiding this comment.
It's not, so yes, it could be for the warning 👀
There was a problem hiding this comment.
We want stable results so we have to use SORT, but we cannot use SORT + LIMIT BY yet because we need a different logical and physical plan for that, so it's disallowed. The intermediate LIMIT is just temporary. I've added removing it as a task to do in the future so we don't forget: https://github.com/elastic/esql-planning/issues/262
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Outdated
Show resolved
Hide resolved
| FROM employees | ||
| | WHERE emp_no IN (10001, 10002, 10003, 10005, 10006) | ||
| | SORT emp_no | ||
| | LIMIT 1000 |
There was a problem hiding this comment.
It's not, so yes, it could be for the warning 👀
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/expression/Expressions.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineLimitBy.java
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceLimitByExpressionWithEval.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceLimitByExpressionWithEval.java
Outdated
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
35d28e0 to
98d0f9a
Compare
We cannot remove the check in this PR yet. The planner to enable SORT before is the pr we should review after #144279. I've updated the description. |
b3dac22 to
31ffe90
Compare
## Summary https://github.com/user-attachments/assets/a2fc7656-935d-4b62-88cc-05418d321b6b - The BY part of the LIMIT command is currently flagged as development-only, so it needs to be hidden - A mysterious voice says that LIMIT BY cannot come before SORT, so we added a validation and temporarily do not suggest it elastic/elasticsearch#144069 --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Stratou <efstratia.kalafateli@elastic.co>
--------- Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com>
## Summary https://github.com/user-attachments/assets/a2fc7656-935d-4b62-88cc-05418d321b6b - The BY part of the LIMIT command is currently flagged as development-only, so it needs to be hidden - A mysterious voice says that LIMIT BY cannot come before SORT, so we added a validation and temporarily do not suggest it elastic/elasticsearch#144069 --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Stratou <efstratia.kalafateli@elastic.co>
Generative tests generators for LIMIT BY, which was added in elastic#144069 and elastic#144279
This is part of the new
LIMIT BYESQL command.LIMIT N BY expr1, expr2,...retains at most N documents per group, where groups are defined by one or more grouping key expressions.It introduces a new
LimitBylogical plan node andLimitByExecphysical plan node. It doesn't modify existingLimitandLimitExecnodes.This PR knits the parser changes and the planner changes for the compute side,
GroupedLimitOperatorthat was added in #143458.This means it enables queries like this:
but we have restricted using a
SORTbefore theLIMIT BYuntil #143476 and #144279 have been merged. We have also done this for cleanliness because otherwise this PR would be even bigger. This means queries like this one would fail:Part of #112918, https://github.com/elastic/esql-planning/issues/238