ESQL: Add generative tests for LIMIT BY#144238
Conversation
There was a problem hiding this comment.
Pull request overview
Adds generative test support for the new LIMIT BY command in ESQL, which retains at most N rows per group defined by grouping key expressions.
Changes:
- New
LimitByGeneratorthat generates randomLIMIT N BY expr1, expr2, ...commands and validates output row counts per group - Registers
LimitByGeneratorinEsqlQueryGenerator's pipe command list - Blocks full-text functions after
LIMIT BY(same as afterLIMIT)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| LimitByGenerator.java | New generator for LIMIT BY commands with per-group row count validation |
| EsqlQueryGenerator.java | Registers LimitByGenerator in the pipe commands list |
| FullTextFunctionGenerator.java | Disallows full-text functions after a LIMIT BY command |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
...ures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/LimitByGenerator.java
Show resolved
Hide resolved
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
| ) { | ||
| int limit = (int) commandDescription.context().get(LIMIT); | ||
|
|
||
| if (output.size() > previousOutput.size()) { |
There was a problem hiding this comment.
This check is too strict in general.
Consider a query like following:
FROM idx
| LIMIT 10
| WHERE foo > 5
| LIMIT BY bar
The result of the first LIMIT is completely non-deterministic, so the result of the WHERE filtering (also the number of returned records) could be different at every run.
It's not a problem with LIMIT BY, it's just that ES|QL without a sort is not fully deterministic.
There was a problem hiding this comment.
I believe the checks are correct as they're very lenient. They check that:
- LIMIT BY does not produce more rows than the previous one
- There's no more than rows per group
In neither case it's really checking values or strange expectations
There was a problem hiding this comment.
The problem is LIMIT BY does not produce more rows than the previous one.
I'll give you a practical example, bear with me:
dataset:
a | b
------------
foo | 1
foo | 2
bar | 3
bar | 4
previous:
FROM idx
| LIMIT 2
| WHERE b > 2
current:
FROM idx
| LIMIT 2
| WHERE b > 2
| LIMIT 2 BY a
Execution of previous
FROM idx:
a | b
------------
foo | 1
foo | 2
bar | 3
bar | 4
so
FROM idx | LIMIT 2:
a | b
------------
foo | 1
foo | 2
so
FROM idx | LIMIT 2 | WHERE b > 2:
a | b
------------
<----- no records
👆 this is previousOutput
Execution of current
FROM idx:
a | b
------------
bar | 3
bar | 4 <----- different execution, different order
foo | 1
foo | 2
so
FROM idx | LIMIT 2:
a | b
------------
bar | 3
bar | 4
so
FROM idx | LIMIT 2 | WHERE b > 2:
a | b
------------
bar | 3
bar | 4 <--------- we have two records now!
so
FROM idx | LIMIT 2 | WHERE b > 2 | LIMIT 2 BY a:
a | b
------------
bar | 3
bar | 4 <--------- we still have two records!
👆 this is output
Conclusion:
previousOutput.size()is 0output.size()is 2
but the results are formally correct.
There was a problem hiding this comment.
Oh my. Fixing it, thanks!
There was a problem hiding this comment.
Fixed! We'll ask for another review when the planning logic gets merged
| QuerySchema schema, | ||
| QueryExecutor executor | ||
| ) { | ||
| if (previousCommands.stream().anyMatch(cmd -> cmd.commandName().equals(SortGenerator.SORT))) { |
There was a problem hiding this comment.
Remove this after merging TOP-N BY
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Pull request overview
Adds generative-query support for the new ESQL LIMIT … BY … command in the QA random query generator, including validation that the output respects per-group limits, and updates a few generator internals to integrate the new command and normalize multi-word command rendering.
Changes:
- Add a new
LimitByGeneratorthat generatesLIMIT N BY …pipelines and validates per-group row counts. - Register
LimitByGeneratorin the pipeline command list and disallow full-text expressions afterLIMIT BY. - Normalize
StatsGeneratorcommand rendering for underscore-based command names and rename inline-stats’ command name constant.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/function/FullTextFunctionGenerator.java | Disallows full-text expressions after limit_by (and refactors comparisons to use command constants). |
| x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/EsqlQueryGenerator.java | Adds LimitByGenerator to the set of pipe commands used by the random pipeline generator. |
| x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/StatsGenerator.java | Renders underscore command names as spaced command text when building the query string. |
| x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/LimitByGenerator.java | New generator + validator for LIMIT … BY … behavior. |
| x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/InlineStatsGenerator.java | Changes inline-stats command identifier to inline_stats (with rendering handled by StatsGenerator). |
Comments suppressed due to low confidence (1)
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/InlineStatsGenerator.java:22
INLINE_STATSwas changed from "inline stats" to "inline_stats" (commandName), but there is existing QA logic that switches oncommand.commandName()using the spaced form (e.g.GenerativeRestTest.updateIndexMappedhascase "stats", "inline stats"). With this change, inline stats commands will no longer hit that branch and indexMapped propagation for inline stats will be wrong. Consider updating those consumers to accept "inline_stats" (or better, referenceInlineStatsGenerator.INLINE_STATS/ normalize by replacing '_' with ' ' consistently).
public class InlineStatsGenerator extends StatsGenerator {
public static final String INLINE_STATS = "inline_stats";
public static final CommandGenerator INSTANCE = new InlineStatsGenerator();
@Override
public String commandName() {
return INLINE_STATS;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...src/main/java/org/elasticsearch/xpack/esql/generator/function/FullTextFunctionGenerator.java
Show resolved
Hide resolved
...ures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/LimitByGenerator.java
Outdated
Show resolved
Hide resolved
…improve LimitBy validation of columns
ncordon
left a comment
There was a problem hiding this comment.
LGTM, left minor comments only
...ures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/LimitByGenerator.java
Outdated
Show resolved
Hide resolved
| int limit = (int) commandDescription.context().get(LIMIT_CONTEXT); | ||
|
|
||
| if (limit == 0 && output.isEmpty() == false) { | ||
| return new ValidationResult(false, "LIMIT 0 BY should return no rows, got [" + output.size() + "]"); |
There was a problem hiding this comment.
We are not checking expectSameColumns in this case. Should we?
There was a problem hiding this comment.
If any of them fail, we'll report a failure. We can check both things and make a composed message, but I think it's not worth it here (?)
* upstream/main: Mute org.elasticsearch.xpack.esql.expression.function.aggregate.FirstDocIdGroupingAggregatorFunctionTests testSimple elastic#145923 Reindex relocation: store source TaskResult at destination node (elastic#145488) Bump versions after 9.2.8 release [CI] DLMFrozenTransitionServiceTests testCheckForFrozenIndicesReturnsEarlyWhenCapacityExhausted failing [elastic#145778] (elastic#145906) Update branches.json for 9.2.8 release ESQL: Clarify inheriting from Attributes (elastic#145898) Bump versions after 9.3.3 release Update branches.json for 9.3.3 release Prune changelogs after 8.19.14 release Bump versions after 8.19.14 release Update branches.json for 8.19.14 release [ML] Call old inference API (elastic#145690) ESQL: Unmute CsvIT sumWithOverflowRow (elastic#145893) Index a document when testing runtime fields shadowing dimensions & metrics (elastic#145882) [TEST] Fix version check in testSequenceNumbersDisabled (elastic#145879) [ESQL] Per-file filter pushdown awareness (elastic#145755) Unmute testGetReindexFollowsRelocation (elastic#145841) Correctly ignore system indices when validating dot-prefixed indices (elastic#128868) [Transform] Remove tests for deleted code (elastic#145685) ESQL: Add generative tests for LIMIT BY (elastic#144238)
Generative tests generators for LIMIT BY, which was added in elastic#144069 and elastic#144279
Generative tests generators for LIMIT BY, which was added in #144069 and #144279